
Dissertations / Theses on the topic '2D/3D object discovery'

Consult the top 44 dissertations / theses for your research on the topic '2D/3D object discovery.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and the bibliographic reference to the chosen work will be generated automatically in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Kara, Sandra. "Unsupervised object discovery in images and video data." Electronic Thesis or Diss., université Paris-Saclay, 2025. http://www.theses.fr/2025UPASG019.

Full text
Abstract:
This thesis explores self-supervised learning methods for object localization, commonly known as Object Discovery. Object localization in images and videos is an essential component of computer vision tasks such as detection, re-identification, tracking etc. Current supervised algorithms can localize (and classify) objects accurately but are costly due to the need for annotated data. The process of labeling is typically repeated for each new data or category of interest, limiting their scalability. Additionally, the semantically specialized approaches require prior knowledge of the target classes, restricting their use to known objects. Object Discovery aims to address these limitations by being more generic. The first contribution of this thesis focused on the image modality, investigating how features from self-supervised vision transformers can serve as cues for multi-object discovery. To localize objects in their broadest definition, we extended our focus to video data, leveraging motion cues and targeting the localization of objects that can move. We introduced background modeling and knowledge distillation in object discovery to tackle the background over-segmentation issue in existing object discovery methods and to reintegrate static objects, significantly improving the signal-to-noise ratio in predictions. Recognizing the limitations of single-modality data, we incorporated 3D data through a cross-modal distillation framework. The knowledge exchange between 2D and 3D domains improved alignment on object regions between the two modalities, enabling the use of multi-modal consistency as a confidence criterion
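As a hedged illustration of the core idea in the first contribution (self-supervised ViT features as cues for multi-object discovery), the sketch below simply clusters patch descriptors into candidate regions. It is not the thesis's method; the feature source, grid size and cluster count are assumptions, and random features stand in for real ViT patch tokens.

```python
# Minimal sketch: group self-supervised ViT patch features into candidate
# regions by clustering. Assumes `patch_feats` has shape (H*W, D), e.g. patch
# tokens exported from a DINO-style backbone (feature extraction not shown).
import numpy as np
from sklearn.cluster import KMeans

def discover_regions(patch_feats, grid_hw, k=3):
    """Cluster L2-normalized patch descriptors and return an (H, W) label map."""
    feats = patch_feats / (np.linalg.norm(patch_feats, axis=1, keepdims=True) + 1e-8)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)
    return labels.reshape(grid_hw)

# Demo with random stand-in features (a real run would use ViT patch tokens).
rng = np.random.default_rng(0)
fake_feats = rng.normal(size=(14 * 14, 384))
region_map = discover_regions(fake_feats, (14, 14), k=3)
print(region_map.shape)  # (14, 14) label map; one cluster often covers background
```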
2

Shao, Zhimin. "3D/2D object recognition from surface patterns." Thesis, University of Surrey, 1997. http://epubs.surrey.ac.uk/844055/.

Full text
Abstract:
Attributed Relational Graph (ARG) is a powerful representation for model-based object recognition due to its inherent robustness in handling noisy and incomplete data. In the past few years, the availability of efficient ARG matching algorithms and their theoretical underpinnings have greatly contributed to many successful applications of ARG representation in tackling high-level vision problems. During the past three years of investigation into object recognition using ARG representation, we have developed a number of novel theories and techniques in the subject area. Some are image processing techniques which help to segment and generate primitive features for building ARG representation (Chapters 2 and 4). Some are about projective invariance in ARG representations (Chapters 3 and 5). Some are about new ARG matching algorithms (Chapter 6). This thesis serves as a summary document of these theories and techniques. The most important contributions of our work to the domain of computer vision, in my opinion, are in two areas: Firstly, in the area of projective invariant ARG representation for object recognition. Here, we demonstrated, for the first time, a way to systematically derive ARG representations for objects under complex projective transforms by exploiting the knowledge of invariance. The methodology developed by us is a sound strategy that generates ARG representations with a number of desirable and provable properties, amongst which the most important is the ability to capture the global transformation constraint using binary relations only. The approach significantly reduces the heuristic nature of designing relational measurements and paves the way for wider application of ARG representation in 2D and 3D object recognition. Secondly, in the area of ARG matching. A new mathematical framework for deterministic relaxation algorithms was developed to overcome a number of problems that appeared in the existing theories and practices of efficient ARG labelling. A novel labelling algorithm was proposed based on the new theoretical framework. The algorithm has a number of desirable properties compared to existing algorithms. In particular, the resulting algorithm delivers more consistent, faithful-to-observation results in the presence of ambiguities and multiple interpretations compared to other algorithms.
3

Sirtkaya, Salim. "Moving Object Detection in 2D and 3D Scenes." Master's thesis, METU, 2004. http://etd.lib.metu.edu.tr/upload/2/12605310/index.pdf.

Full text
Abstract:
This thesis describes the theoretical bases, development and testing of an integrated moving object detection framework in 2D and 3D scenes. The detection problem is analyzed in stationary and non-stationary camera sequences and different algorithms are developed for each case. Two methods are proposed for stationary camera sequences: background extraction followed by differencing and thresholding, and motion detection using the optical flow field calculated by the "Kanade-Lucas Feature Tracker". For non-stationary camera sequences, different algorithms are developed based on the scene structure and camera motion characteristics. In planar scenes where the scene is flat or distant from the camera and/or when the camera makes rotations only, a method is proposed that uses 2D parametric registration based on affine parameters of the dominant plane for independently moving object detection. A modified version of the 2D parametric registration approach is used when the scene is not planar but consists of a small number of planes at different depths, and the camera makes translational motion. Optical flow field segmentation and sequential registration are the key points for this case. For 3D scenes, where the depth variation within the scene is high, a parallax rigidity based approach is developed for moving object detection. All these algorithms are integrated to form a unified independently moving object detector that works in stationary and non-stationary camera sequences and with different scene and camera motion structures. Optical flow field estimation and segmentation is used for this purpose.
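The two stationary-camera ideas above (a background model with differencing and thresholding, and Kanade-Lucas feature tracking) can be prototyped with standard OpenCV calls. The sketch below is a generic illustration, not the thesis's implementation; thresholds and parameters are placeholders.

```python
# Sketch of the two stationary-camera ideas: (1) running-average background
# model + frame differencing + thresholding, (2) sparse KLT optical flow.
import cv2
import numpy as np

def moving_mask(frame_gray, background, alpha=0.02, thresh=25):
    """Update a running-average background and return a binary motion mask."""
    diff = cv2.absdiff(frame_gray, cv2.convertScaleAbs(background))
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    cv2.accumulateWeighted(frame_gray.astype(np.float32), background, alpha)
    return mask

def klt_flow(prev_gray, next_gray):
    """Track Shi-Tomasi corners with the Kanade-Lucas(-Tomasi) tracker."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=7)
    if pts is None:
        return None, None
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good = status.ravel() == 1
    return pts[good], nxt[good]

# Typical use: background = first_gray_frame.astype(np.float32); then per frame:
# mask = moving_mask(gray, background); old_pts, new_pts = klt_flow(prev_gray, gray)
```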
4

Toth, Levente. "3D object recognition based on constrained 2D views." Thesis, University of Plymouth, 1998. http://hdl.handle.net/10026.1/1808.

Full text
Abstract:
The aim of the present work was to build a novel 3D object recognition system capable of classifying man-made and natural objects based on single 2D views. The approach to this problem has been one motivated by recent theories on biological vision and multiresolution analysis. The project's objectives were the implementation of a system that is able to deal with simple 3D scenes and constitutes an engineering solution to the problem of 3D object recognition, allowing the proposed recognition system to operate in a practically acceptable time frame. The developed system takes further the work on automatic classification of marine phytoplankton, carried out at the Centre for Intelligent Systems, University of Plymouth. The thesis discusses the main theoretical issues that prompted the fundamental system design options. The principles and the implementation of the coarse data channels used in the system are described. A new multiresolution representation of 2D views is presented, which provides the classifier module of the system with coarse-coded descriptions of the scale-space distribution of potentially interesting features. A multiresolution analysis-based mechanism is proposed, which directs the system's attention towards potentially salient features. Unsupervised similarity-based feature grouping is introduced, which is used in coarse data channels to yield feature signatures that are not spatially coherent and provide the classifier module with salient descriptions of object views. A simple texture descriptor is described, which is based on properties of a special wavelet transform. The system has been tested on computer-generated and natural image data sets, in conditions where the inter-object similarity was monitored and quantitatively assessed by human subjects, or the analysed objects were very similar and their discrimination constituted a difficult task even for human experts. The validity of the above described approaches has been proven. The studies conducted with various statistical and artificial neural network-based classifiers have shown that the system is able to perform well in all of the above mentioned situations. These investigations also made it possible to take further and generalise a number of important conclusions drawn during previous work carried out in the field of 2D shape (plankton) recognition, regarding the behaviour of multiple coarse data channels-based pattern recognition systems and various classifier architectures. The system possesses the ability to deal with difficult field-collected images of objects and the techniques employed by its component modules make possible its extension to the domain of complex multiple-object 3D scene recognition. The system is expected to find immediate applicability in the field of marine biota classification.
5

Govender, Natasha. "Active object recognition for 2D and 3D applications." Doctoral thesis, University of Cape Town, 2015. http://hdl.handle.net/11427/16520.

Full text
Abstract:
Includes bibliographical references
Active object recognition provides a mechanism for selecting informative viewpoints to complete recognition tasks as quickly and accurately as possible. One can manipulate the position of the camera or the object of interest to obtain more useful information. This approach can improve the computational efficiency of the recognition task by only processing viewpoints selected based on the amount of relevant information they contain. Active object recognition methods are based around how to select the next best viewpoint and the integration of the extracted information. Most active recognition methods do not use local interest points, which have been shown to work well in other recognition tasks, and are tested on images containing a single object with no occlusions or clutter. In this thesis we investigate using local interest points (SIFT) in probabilistic and non-probabilistic settings for active single and multiple object and viewpoint/pose recognition. Test images used contain objects that are occluded and occur in significant clutter. Visually similar objects are also included in our dataset. Initially we introduce a non-probabilistic 3D active object recognition system which consists of a mechanism for selecting the next best viewpoint and an integration strategy to provide feedback to the system. A novel approach to weighting the uniqueness of features extracted is presented, using a vocabulary tree data structure. This process is then used to determine the next best viewpoint by selecting the one with the highest number of unique features. A Bayesian framework uses the modified statistics from the vocabulary structure to update the system's confidence in the identity of the object. New test images are only captured when the belief hypothesis is below a predefined threshold. This vocabulary tree method is tested against randomly selecting the next viewpoint and a state-of-the-art active object recognition method by Kootstra et al. Our approach outperforms both methods by correctly recognizing more objects with less computational expense. This vocabulary tree method is extended for use in a probabilistic setting to improve the object recognition accuracy. We introduce Bayesian approaches for object recognition and object and pose recognition. Three likelihood models are introduced which incorporate various parameters and levels of complexity. The occlusion model, which includes geometric information and variables that cater for the background distribution and occlusion, correctly recognizes all objects on our challenging database. This probabilistic approach is further extended for recognizing multiple objects and poses in test images. We show through experiments that this model can recognize multiple objects which occur in close proximity to distractor objects. Our viewpoint selection strategy is also extended to the multiple object application and performs well when compared to randomly selecting the next viewpoint, the activation model and mutual information. We also study the impact of using active vision for shape recognition. Fourier descriptors are used as input to our shape recognition system with mutual information as the active vision component. We build multinomial and Gaussian distributions using this information, which correctly recognizes a sequence of objects. We demonstrate the effectiveness of active vision in object recognition systems. We show that even in different recognition applications using different low-level inputs, incorporating active vision improves the overall accuracy and decreases the computational expense of object recognition systems.
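A toy numerical sketch of the active recognition loop described above: pick the viewpoint with the highest uniqueness score, update a Bayesian belief over object identities, and stop once the belief crosses a threshold. The scores and likelihoods are made-up stand-ins for the vocabulary-tree statistics.

```python
# Toy sketch of active recognition: choose the next viewpoint by a uniqueness
# score and update a belief over object identities with Bayes' rule.
import numpy as np

def active_recognition(uniqueness, likelihoods, prior, threshold=0.9):
    """uniqueness: (V,) score per viewpoint; likelihoods: (V, C) p(view obs | class)."""
    belief = prior.copy()
    visited = set()
    while belief.max() < threshold and len(visited) < len(uniqueness):
        scores = np.where([v in visited for v in range(len(uniqueness))], -np.inf, uniqueness)
        v = int(np.argmax(scores))          # next best view = most unique features
        visited.add(v)
        belief = belief * likelihoods[v]    # Bayesian update with the new observation
        belief /= belief.sum()
    return int(np.argmax(belief)), belief

rng = np.random.default_rng(1)
uniq = rng.random(5)
lik = rng.dirichlet(np.ones(4), size=5)     # 5 viewpoints, 4 candidate objects
obj, belief = active_recognition(uniq, lik, np.full(4, 0.25))
print(obj, belief.round(3))
```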
6

Noé, Estelle. "3D layered articulated object from a single 2D drawing." Thesis, KTH, Medieteknik och interaktionsdesign, MID, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-216943.

Full text
Abstract:
Modeling articulated objects made of rigid layered parts, used to populate 3D scenes in video games or movie production, is a complex and time-consuming task for digital artists. This work proposes a sketch-based approach to efficiently model 3D layered articulated objects, such as animals with rigid shells and armors, by annotating a single 2D photo manually, and eventually fabricate them from automatically computed 2D patterns. By considering symmetrical objects seen under a 3/4 view, and annotating salient features such as extremities of the rigid articulated parts as a mix of circular and Bézier curves, this approach is able to retrieve depth information, hidden parts, and the rotation-articulated structure. The resulting shape consists of a set of quadrangulated polygons that may be flattened in 2D. Details such as ears, tails, and legs were further modeled using dedicated annotations. The accuracy of the reconstruction has been validated on synthetic cylindrical examples, and its robustness in reconstructing a 3D model of an armor, an armadillo, and a shrimp. The latter was finally fabricated using paper.
7

Zhu, Yonggen. "Feature extraction and 2D/3D object recognition using geometric invariants." Thesis, King's College London (University of London), 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.362731.

Full text
8

Gamal, Eldin Ahmed. "Point process and graph cut applied to 2D and 3D object extraction." Nice, 2011. http://www.theses.fr/2011NICE4107.

Full text
Abstract:
The topic of this thesis is to develop a novel approach for 3D object detection from a 2D image. This approach takes into consideration the occlusions and the perspective effects. This work has been embedded in a marked point process framework, which has proved to be efficient for solving many challenging problems dealing with high resolution images. The work accomplished during the thesis can be presented in two parts. In the first part, we propose a novel probabilistic approach to handle occlusions and perspective effects. The proposed method is based on 3D scene simulation on the GPU using OpenGL. It is an object based method embedded in a marked point process framework. We apply it for the size estimation of a penguin colony, where we model a penguin colony as an unknown number of 3D objects. The main idea of the proposed approach is to sample some candidate configurations consisting of 3D objects lying on the real plane. A Gibbs energy is defined on the configuration space, which takes into account both prior and data information. The proposed configurations are projected onto the image plane, and the configurations are modified until convergence. To evaluate a proposed configuration, we measure the similarity between the projected image of the proposed configuration and the real image, by defining a data term and a prior term which penalizes object overlap. We introduced modifications to the optimization algorithm to take into account new dependencies that exist in our 3D model. In the second part, we propose a new optimization method which we call "Multiple Births and Cut" (MBC). It combines the recently developed optimization algorithm Multiple Births and Deaths (MBD) and the Graph-Cut. MBD and MBC optimization methods are applied for the optimization of a marked point process. We compared the MBC to the MBD algorithm, showing that the main advantages of our newly proposed algorithm are the reduction of the number of parameters, the speed of convergence and the quality of the obtained results. We validated our algorithm on the problem of counting flamingos in a colony.
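To make the energy-based formulation above concrete, the toy sketch below scores a configuration of disc-shaped "objects" with a data term (coverage of bright pixels) plus an overlap prior, and accepts random births only when the energy drops. This is a drastically simplified stand-in for the Multiple Births and Deaths/Cut optimizers, with invented terms and parameters.

```python
# Drastically simplified birth/selection loop in the spirit of a marked point
# process: objects are discs (x, y, r); the data term rewards discs that cover
# bright pixels and the prior penalizes pairwise overlap.
import numpy as np

def energy(config, image, overlap_weight=5.0):
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    data = 0.0
    for (x, y, r) in config:
        inside = (xx - x) ** 2 + (yy - y) ** 2 <= r ** 2
        data -= image[inside].sum()          # brighter coverage => lower energy
    prior = 0.0
    for i in range(len(config)):
        for j in range(i + 1, len(config)):
            d = np.hypot(config[i][0] - config[j][0], config[i][1] - config[j][1])
            prior += max(0.0, config[i][2] + config[j][2] - d)   # overlap penalty
    return data + overlap_weight * prior

def greedy_births(image, n_proposals=300, r=5, rng=np.random.default_rng(0)):
    config, e = [], energy([], image)
    for _ in range(n_proposals):
        cand = (rng.integers(0, image.shape[1]), rng.integers(0, image.shape[0]), r)
        e_new = energy(config + [cand], image)
        if e_new < e:                         # keep the birth only if energy drops
            config.append(cand)
            e = e_new
    return config

demo = np.zeros((60, 60)); demo[10:18, 10:18] = 1.0; demo[35:43, 40:48] = 1.0
print(greedy_births(demo))                    # discs gravitate to the bright blobs
```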
9

Gomez-Donoso, Francisco. "Contributions to 3D object recognition and 3D hand pose estimation using deep learning techniques." Doctoral thesis, Universidad de Alicante, 2020. http://hdl.handle.net/10045/110658.

Full text
Abstract:
In this thesis, a study of two blooming fields of artificial intelligence is carried out. The first part of the present document is about 3D object recognition methods. Object recognition in general is about providing the ability to understand what objects appear in the input data of an intelligent system. Any robot, from industrial robots to social robots, could benefit from such a capability to improve its performance and carry out high-level tasks. In fact, this topic has been largely studied and some object recognition methods present in the state of the art outperform humans in terms of accuracy. Nonetheless, these methods are image-based, namely, they focus on recognizing visual features. This could be a problem in some contexts, as there exist objects that look like some other, different objects. For instance, a social robot that recognizes a face in a picture, or an intelligent car that recognizes a pedestrian on a billboard. A potential solution for this issue would be involving tridimensional data so that the systems would not focus on visual features but on topological features. Thus, in this thesis, a study of 3D object recognition methods is carried out. The approaches proposed in this document, which take advantage of deep learning methods, take point clouds as input and are able to provide the correct category. We evaluated the proposals on a range of public challenges, datasets and real-life data with high success. The second part of the thesis is about hand pose estimation. This is also an interesting topic that focuses on providing the hand's kinematics. A range of systems, from human-computer interaction and virtual reality to social robots, could benefit from such a capability, for instance to interface with a computer and control it with seamless hand gestures, or to interact with a social robot that is able to understand human non-verbal communication methods. Thus, in the present document, hand pose estimation approaches are proposed. It is worth noting that the proposals take color images as input and are able to provide 2D and 3D hand poses in the image plane and Euclidean coordinate frames. Specifically, the hand poses are encoded as a collection of points that represent the joints of a hand, so that the full hand pose can be easily reconstructed. The methods are evaluated on custom and public datasets, and integrated with a robotic hand teleoperation application with great success.
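As a rough illustration of a deep network that consumes raw point clouds (in the spirit of this line of work, though not the thesis's architecture), the sketch below uses a shared per-point MLP followed by a max-pool, which makes the prediction independent of point ordering; layer sizes and the class count are arbitrary.

```python
# Minimal PointNet-style classifier sketch: a shared per-point MLP (1x1 convs)
# followed by a symmetric max-pool over points, then a linear classifier.
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.ReLU(),
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, 3, num_points); max over points gives a global descriptor
        feats = self.point_mlp(points)
        global_feat = feats.max(dim=2).values
        return self.head(global_feat)

logits = TinyPointNet()(torch.randn(2, 3, 1024))
print(logits.shape)  # torch.Size([2, 10])
```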
10

Sambra-Petre, Raluca-Diana. "2D/3D knowledge inference for intelligent access to enriched visual content." PhD thesis, Institut National des Télécommunications, 2013. http://tel.archives-ouvertes.fr/tel-00917972.

Full text
Abstract:
This Ph.D. thesis tackles the issue of still and video object categorization. The objective is to associate semantic labels to 2D objects present in natural images/videos. The principle of the proposed approach consists of exploiting categorized 3D model repositories in order to identify unknown 2D objects based on 2D/3D matching techniques. We propose here an object recognition framework designed to work for real-time applications. The similarity between classified 3D models and unknown 2D content is evaluated with the help of the 2D/3D description. A voting procedure is further employed in order to determine the most probable categories of the 2D object. A representative viewing angle selection strategy and a new contour-based descriptor (so-called AH) are proposed. The experimental evaluation proved that, by employing the intelligent selection of views, the number of projections can be decreased significantly (up to 5 times) while obtaining similar performance. The results have also shown the superiority of AH with respect to other state-of-the-art descriptors. An objective evaluation of the intra- and inter-class variability of the 3D model repositories involved in this work is also proposed, together with a comparative study of the retained indexing approaches. An interactive, scribble-based segmentation approach is also introduced. The proposed method is specifically designed to overcome compression artefacts such as those introduced by JPEG compression. We finally present an indexing/retrieval/classification Web platform, so-called Diana, which integrates the various methodologies employed in this thesis.
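A hedged sketch of the voting step described above: compare a 2D query descriptor against descriptors of the selected 2D views of categorized 3D models and let the nearest views vote for a category. The descriptor itself (e.g. the contour-based AH) is abstracted away; all data here are synthetic.

```python
# Sketch of 2D/3D category voting: each stored descriptor comes from a 2D view
# of a categorized 3D model; the k nearest views vote for the query's category.
import numpy as np
from collections import Counter

def vote_category(query_desc, view_descs, view_categories, k=7):
    """query_desc: (D,); view_descs: (N, D); view_categories: list of N labels."""
    dists = np.linalg.norm(view_descs - query_desc, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(view_categories[i] for i in nearest)
    return votes.most_common(1)[0][0], votes

rng = np.random.default_rng(2)
views = rng.normal(size=(300, 32))                   # 300 stored view descriptors
cats = [c for c in ("chair", "car", "mug") for _ in range(100)]
label, votes = vote_category(views[42] + 0.01 * rng.normal(size=32), views, cats)
print(label, votes)
```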
11

Madi, Kamel. "Inexact graph matching : application to 2D and 3D Pattern Recognition." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1315/document.

Full text
Abstract:
Graphs are powerful mathematical modeling tools used in various fields of computer science, in particular in Pattern Recognition. Graph matching is the main operation in Pattern Recognition using a graph-based approach. Finding solutions to the problem of graph matching that ensure optimality in terms of accuracy and time complexity is a difficult research challenge and a topical issue. In this thesis, we investigate the resolution of this problem in two fields: 2D and 3D Pattern Recognition. Firstly, we address the problem of geometric graph matching and its applications in 2D Pattern Recognition. Kite (archaeological structures) recognition in satellite images is the main application considered in this first part. We present a complete graph-based framework for Kite recognition on satellite images. We propose mainly two contributions. The first one is an automatic process transforming Kites from real images into graphs and a process for randomly generating synthetic Kite graphs. This allows the construction of a benchmark of Kite graphs (real and synthetic) structured in different levels of deformation. The second contribution in this part is the proposition of a new graph similarity measure adapted to geometric graphs and consequently to Kite graphs. The proposed approach combines graph invariants with a geometric graph edit distance computation. Secondly, we address the problem of deformable 3D object recognition, where objects are represented by graphs, i.e., triangular tessellations. We propose a new decomposition of triangular tessellations into a set of substructures that we call triangle-stars. Based on this new decomposition, we propose a new graph matching algorithm to measure the distance between triangular tessellations. The proposed algorithm offers a better measure by assuring a minimum number of triangle-stars covering a larger neighbourhood, and uses a set of descriptors which are invariant or at least oblivious under most common deformations. Finally, we propose a more general graph matching approach founded on a new formalization based on the stable marriage problem. The proposed approach is optimal in terms of execution time, i.e., the time complexity is quadratic, O(n²), and flexible in terms of applicability (2D and 3D). The analysis of the time complexity of the proposed algorithms and the extensive experiments conducted on Kite graph data sets (real and synthetic) and standard data sets (2D and 3D) attest to the effectiveness, high performance and accuracy of the proposed approaches and show that they are extensible and quite general.
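Since the final matching approach above builds on the stable marriage problem, a compact Gale-Shapley sketch is included as background. It matches abstract substructure indices from preference lists and is not the thesis's algorithm itself.

```python
# Classic Gale-Shapley stable matching between two equal-size sets of
# substructures, given preference rankings (lower rank = more preferred).
def gale_shapley(proposer_prefs, receiver_prefs):
    n = len(proposer_prefs)
    receiver_rank = [{p: r for r, p in enumerate(prefs)} for prefs in receiver_prefs]
    next_choice = [0] * n                 # next receiver each proposer will try
    engaged_to = [None] * n               # engaged_to[receiver] = proposer
    free = list(range(n))
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if engaged_to[r] is None:
            engaged_to[r] = p
        elif receiver_rank[r][p] < receiver_rank[r][engaged_to[r]]:
            free.append(engaged_to[r])    # receiver trades up, old partner is free again
            engaged_to[r] = p
        else:
            free.append(p)                # rejected, will propose to the next choice
    return {p: r for r, p in enumerate(engaged_to)}

print(gale_shapley([[0, 1, 2], [1, 0, 2], [0, 2, 1]],
                   [[2, 1, 0], [0, 2, 1], [2, 0, 1]]))
```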
12

Wu, Siju. "Study and design of interaction techniques to facilitate object selection and manipulation in virtual environments on mobile devices." Thesis, Université Paris-Saclay (ComUE), 2015. http://www.theses.fr/2015SACLE023/document.

Full text
Abstract:
The advances in the field of NUIs (Natural User Interfaces) can provide more and more guidelines for designers to develop efficient and easy-to-use techniques for 3D interaction. In this context, mobile devices attract much attention for the design of 3D interaction techniques for ubiquitous usage. Our research work focuses on proposing new techniques to facilitate object selection and manipulation in virtual environments on mobile devices. Indeed, the efficiency and accuracy of object selection are highly affected by the target size and the cluster density. To overcome the fingertip occlusion issue on Smartphones, we have designed two touch-based selection techniques. We have also designed two freehand hybrid techniques for selection of small objects displayed at a distance. To perform constrained manipulation on Tablet-PCs, we have proposed a bimanual technique based on the asymmetrical model. Both hands can be used in collaboration, in order to specify the constraint, determine the manipulation mode, and control the transformation. We have also proposed two other single-hand manipulation techniques using identified touch inputs. The evaluations of our techniques demonstrate that they can improve the users' interaction experience on mobile devices. Our results also provide some guidelines to improve the design of 3D interaction techniques on mobile devices.
13

Sankoh, Hiroshi. "Object Extraction for Virtual-viewpoint Video Synthesis." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/200465.

Full text
14

Qiu, Xuchong. "2D and 3D Geometric Attributes Estimation in Images via deep learning." Thesis, Marne-la-vallée, ENPC, 2021. http://www.theses.fr/2021ENPC0005.

Full text
Abstract:
The visual perception of 2D and 3D geometric attributes (e.g. translation, rotation, spatial size, etc.) is important in robotic applications. It helps a robotic system build knowledge about its surrounding environment and can serve as the input for downstream tasks such as motion planning and physical interaction with objects. The main goal of this thesis is to automatically detect positions and poses of objects of interest for robotic manipulation tasks. In particular, we are interested in the low-level task of estimating occlusion relationships to discriminate different objects and the high-level tasks of object visual tracking and object pose estimation. The first focus is to track the object of interest with correct locations and sizes in a given video. We first study systematically the tracking framework based on discriminative correlation filters (DCF) and propose to leverage semantic information in two tracking stages: the visual feature encoding stage and the target localization stage. Our experiments demonstrate that the involvement of semantics improves the performance of both localization and size estimation in our DCF-based tracking framework. We also make an analysis of failure cases. The second focus is using object shape information to improve the performance of object 6D pose estimation and to perform object pose refinement. We propose to estimate the 2D projections of object 3D surface points with deep models to recover object 6D poses. Our results show that the proposed method benefits from the large number of 3D-to-2D point correspondences and achieves better performance. As a second part, we study the constraints of existing object pose refinement methods and develop a pose refinement method for objects in the wild. Our experiments demonstrate that our models, trained on either real data or generated synthetic data, can refine pose estimates for objects in the wild, even though these objects are not seen during training. The third focus is studying geometric occlusion in single images to better discriminate objects in the scene. We first formalize the definition of geometric occlusion and propose a method to automatically generate high-quality occlusion annotations. Then we propose a new occlusion relationship formulation (i.e. abbnom) and the corresponding inference method. Experiments on occlusion reasoning benchmarks demonstrate the superiority of the proposed formulation and method. To recover accurate depth discontinuities, we also propose a depth map refinement method and a single-stage monocular depth estimation method. All the methods that we propose leverage the versatility and power of deep learning. This should facilitate their integration in the visual perception module of modern robotic systems. Besides the above methodological advances, we also made available software (for occlusion and pose estimation) and datasets (of high-quality occlusion information) as a contribution to the scientific community.
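Keypoint-style pose estimation of the kind described above (predict the 2D projections of known 3D surface points, then recover the 6D pose) typically ends in a PnP solve. The OpenCV sketch below shows only that final step, with synthetic correspondences and an assumed pinhole intrinsic matrix.

```python
# Sketch of the final step of keypoint-based 6D pose estimation: given 3D model
# points and their (predicted) 2D projections, recover rotation and translation
# with a RANSAC PnP solve.
import cv2
import numpy as np

object_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                       [1, 1, 0], [1, 0, 1]], dtype=np.float64)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

# Synthesize "predicted" 2D points from a ground-truth pose for the demo.
rvec_gt = np.array([[0.1], [0.2], [0.3]])
tvec_gt = np.array([[0.2], [-0.1], [5.0]])
image_pts, _ = cv2.projectPoints(object_pts, rvec_gt, tvec_gt, K, None)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, None)
print(ok, rvec.ravel().round(3), tvec.ravel().round(3))  # should match the ground truth
```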
15

Sambra-Petre, Raluca-Diana. "2D/3D knowledge inference for intelligent access to enriched visual content." Electronic Thesis or Diss., Evry, Institut national des télécommunications, 2013. http://www.theses.fr/2013TELE0012.

Full text
Abstract:
This Ph.D. thesis tackles the issue of still and video object categorization. The objective is to associate semantic labels to 2D objects present in natural images/videos. The principle of the proposed approach consists of exploiting categorized 3D model repositories in order to identify unknown 2D objects based on 2D/3D matching techniques. We propose here an object recognition framework designed to work for real-time applications. The similarity between classified 3D models and unknown 2D content is evaluated with the help of the 2D/3D description. A voting procedure is further employed in order to determine the most probable categories of the 2D object. A representative viewing angle selection strategy and a new contour-based descriptor (so-called AH) are proposed. The experimental evaluation proved that, by employing the intelligent selection of views, the number of projections can be decreased significantly (up to 5 times) while obtaining similar performance. The results have also shown the superiority of AH with respect to other state-of-the-art descriptors. An objective evaluation of the intra- and inter-class variability of the 3D model repositories involved in this work is also proposed, together with a comparative study of the retained indexing approaches. An interactive, scribble-based segmentation approach is also introduced. The proposed method is specifically designed to overcome compression artefacts such as those introduced by JPEG compression. We finally present an indexing/retrieval/classification Web platform, so-called Diana, which integrates the various methodologies employed in this thesis.
16

Sharma, Naresh. "Arbitrarily Shaped Virtual-Object Based Video Compression." Columbus, Ohio : Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1238165271.

Full text
17

Paternesi, Claudio. "Virtual Reality Labelling Tool for 3D Semantic Segmentation." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Find full text
Abstract:
In recent years, the field of Computer Vision has seen increasingly in-depth studies on 3D semantic segmentation; these works often require an enormous quantity of 3D models on which to run their processing. However, the available datasets do not always provide complete information concerning the segmentation of the 3D models. This thesis proposes a software tool with which, starting from a 3D model, its semantically segmented version can be created, so that complete datasets can be built for the training and testing phases of computational models. To ensure good usability and fully engage the user, the software was developed with virtual reality tools. The tool was finally validated through tests performed on existing datasets, with the goal of evaluating the efficiency and accuracy of the software itself.
18

Lengyel, Kristián. "Zobrazování medicínských dat v reálném čase." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-235544.

Full text
Abstract:
This thesis deals with the design and implementation of an application for medical data imaging in real time. The first part of the project is focused on methods for obtaining data in medical practice and on the visualization of large volume data on a computer using familiar rendering approaches. Similar applications are used outside of medicine in other fields, such as chemistry, to display molecular structures or microorganisms. Another part of the project focuses on the benefits of visualizing volumetric data using programmable hardware and on new methods of parallelizing algorithms on the graphics card using CUDA and OpenCL. The resulting application displays the medical data volume with a selected method accelerated by programmable shaders, and time-consuming operations are parallelized on the graphics card.
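As a tiny CPU-side reference point for the volume rendering discussed above (against which GPU/CUDA or shader implementations would be compared), the sketch below computes a maximum intensity projection of a synthetic volume; it is illustrative only.

```python
# Maximum intensity projection (MIP): a simple volume rendering operator that
# projects the brightest voxel along one axis onto an image plane.
import numpy as np

def mip(volume: np.ndarray, axis: int = 0) -> np.ndarray:
    return volume.max(axis=axis)

vol = np.zeros((64, 64, 64), dtype=np.float32)
vol[20:30, 25:35, 30:40] = 1.0            # a bright block standing in for anatomy
image = mip(vol, axis=2)
print(image.shape, image.max())           # (64, 64) 1.0
```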
19

Batmaz, Anil Ufuk. "Speed, precision and grip force analysis of human manual operations with and without direct visual input." Thesis, Strasbourg, 2018. http://www.theses.fr/2018STRAJ056/document.

Full text
Abstract:
The perceptual system of a surgeon must adapt to multisensorial constraints regarding the planning, control, and execution of image-guided surgical operations. Three experimental setups are designed to explore these visual and haptic constraints in image-guided training. Results show that subjects are faster and more precise with direct vision compared to image guidance. Stereoscopic 3D viewing does not represent a performance advantage for complete beginners. In virtual reality, variations in object length, width, position, and complexity affect motor performance. The grip force applied on a surgical robot system depends on the user's experience level. In conclusion, both time and precision matter critically, but ensuring that the trainee becomes as precise as possible before getting faster should be a priority. Study group homogeneity and background play a key role in surgical training research. The findings have direct implications for individual skill monitoring in image-guided applications.
20

"3D object reconstruction from 2D and 3D line drawings." 2008. http://library.cuhk.edu.hk/record=b5893538.

Full text
Abstract:
Chen, Yu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2008.
Includes bibliographical references (leaves 78-85).
Abstracts in English and Chinese.
Chapter 1 --- Introduction and Related Work --- p.1
Chapter 1.1 --- Reconstruction from 2D Line Drawings and the Applications --- p.2
Chapter 1.2 --- Previous Work on 3D Reconstruction from Single 2D Line Drawings --- p.4
Chapter 1.3 --- Other Related Work on Interpretation of 2D Line Drawings --- p.5
Chapter 1.3.1 --- Line Labeling and Superstrictness Problem --- p.6
Chapter 1.3.2 --- CAD Reconstruction --- p.6
Chapter 1.3.3 --- Modeling from Images --- p.6
Chapter 1.3.4 --- Identifying Faces in the Line Drawings --- p.7
Chapter 1.4 --- 3D Modeling Systems --- p.8
Chapter 1.5 --- Research Problems and Our Contributions --- p.10
Chapter 1.5.1 --- Recovering Complex Manifold Objects from Line Drawings --- p.10
Chapter 1.5.2 --- The Vision-based Sketching System --- p.11
Chapter 2 --- Reconstruction from Complex Line Drawings --- p.13
Chapter 2.1 --- Introduction --- p.13
Chapter 2.2 --- Assumptions and Terminology --- p.15
Chapter 2.3 --- Separation of a Line Drawing --- p.17
Chapter 2.3.1 --- Classification of Internal Faces --- p.18
Chapter 2.3.2 --- Separating a Line Drawing along Internal Faces of Type 1 --- p.19
Chapter 2.3.3 --- Detecting Internal Faces of Type 2 --- p.20
Chapter 2.3.4 --- Separating a Line Drawing along Internal Faces of Type 2 --- p.28
Chapter 2.4 --- 3D Reconstruction --- p.44
Chapter 2.4.1 --- 3D Reconstruction from a Line Drawing --- p.44
Chapter 2.4.2 --- Merging 3D Manifolds --- p.45
Chapter 2.4.3 --- The Complete 3D Reconstruction Algorithm --- p.47
Chapter 2.5 --- Experimental Results --- p.47
Chapter 2.6 --- Summary --- p.52
Chapter 3 --- A Vision-Based Sketching System for 3D Object Design --- p.54
Chapter 3.1 --- Introduction --- p.54
Chapter 3.2 --- The Sketching System --- p.55
Chapter 3.3 --- 3D Geometry of the System --- p.56
Chapter 3.3.1 --- Locating the Wand --- p.57
Chapter 3.3.2 --- Calibration --- p.59
Chapter 3.3.3 --- Working Space --- p.60
Chapter 3.4 --- Wireframe Input and Object Editing --- p.62
Chapter 3.5 --- Surface Generation --- p.63
Chapter 3.5.1 --- Face Identification --- p.64
Chapter 3.5.2 --- Planar Surface Generation --- p.65
Chapter 3.5.3 --- Smooth Curved Surface Generation --- p.67
Chapter 3.6 --- Experiments --- p.70
Chapter 3.7 --- Summary --- p.72
Chapter 4 --- Conclusion and Future Work --- p.74
Chapter 4.1 --- Conclusion --- p.74
Chapter 4.2 --- Future Work --- p.75
Chapter 4.2.1 --- Learning-Based Line Drawing Reconstruction --- p.75
Chapter 4.2.2 --- New Query Interface for 3D Object Retrieval --- p.75
Chapter 4.2.3 --- Curved Object Reconstruction --- p.76
Chapter 4.2.4 --- Improving the 3D Sketch System --- p.77
Chapter 4.2.5 --- Other Directions --- p.77
Bibliography --- p.78
21

Richards, Whitman, Jan J. Koenderink, and D. D. Hoffman. "Inferring 3D Shapes from 2D Codons." 1985. http://hdl.handle.net/1721.1/5613.

Full text
Abstract:
All plane curves can be described at an abstract level by a sequence of five primitive elemental shapes, called "codons", which capture the sequential relations between the singular points of curvature. The codon description provides a basis for enumerating all smooth 2D curves. Let each of these smooth plane curves be considered as the silhouette of an opaque 3D object. Clearly an infinity of 3D objects can generate any one of our "codon" silhouettes. How then can we predict which 3D object corresponds to a given 2D silhouette? To restrict the infinity of choices, we impose three mathematical properties of smooth surfaces plus one simple viewing constraint. The constraint is an extension of the notion of general position, and seems to drive our preferred inferences of 3D shapes, given only the 2D contour.
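A hedged numerical sketch of the curvature bookkeeping behind the codon description: sample a closed plane curve, compute its signed curvature, and locate the zero crossings that delimit codon-like segments. The toy curve and the discretization are assumptions, not the authors' construction.

```python
# Sketch: signed curvature of a sampled closed plane curve and the indices of
# its zero crossings, which bound codon-like segments between inflections.
import numpy as np

def signed_curvature(x, y):
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5

t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
x = np.cos(t) * (1.0 + 0.3 * np.cos(3 * t))     # a wavy closed curve
y = np.sin(t) * (1.0 + 0.3 * np.cos(3 * t))
kappa = signed_curvature(x, y)
inflections = np.where(np.diff(np.sign(kappa)) != 0)[0]
print(len(inflections), "curvature zero crossings")
```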
APA, Harvard, Vancouver, ISO, and other styles
22

Grimson, W. Eric, Daniel P. Huttenlocher, and T. D. Alter. "Recognizing 3D Objects from 2D Images: An Error Analysis." 1992. http://hdl.handle.net/1721.1/5959.

Full text
Abstract:
Many object recognition systems use a small number of pairings of data and model features to compute the 3D transformation from a model coordinate frame into the sensor coordinate system. With perfect image data, these systems work well. With uncertain image data, however, their performance is less clear. We examine the effects of 2D sensor uncertainty on the computation of 3D model transformations. We use this analysis to bound the uncertainty in the transformation parameters, and the uncertainty associated with transforming other model features into the image. We also examine the impact of such transformation uncertainty on recognition methods.
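The kind of question the memo studies can be probed numerically with a Monte Carlo sketch like the one below: perturb the 2D image measurements and watch how the recovered pose parameters spread. The camera intrinsics, model points, and 1-pixel noise level are illustrative assumptions, and OpenCV's solvePnP stands in for the recognition system's own alignment step.

import numpy as np
import cv2

# Known 3D model points (arbitrary box corners) and a pinhole camera.
model = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1]], np.float32)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], np.float32)
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.2, -0.1, 4.0])
img_pts, _ = cv2.projectPoints(model, rvec_true, tvec_true, K, None)

# Perturb the 2D measurements with 1-pixel Gaussian noise and re-estimate the
# pose many times; the spread of the recovered parameters approximates the
# transformation uncertainty induced by 2D sensor error.
poses = []
for _ in range(500):
    noisy = img_pts + np.random.normal(0, 1.0, img_pts.shape)
    ok, rvec, tvec = cv2.solvePnP(model, noisy, K, None)
    if ok:
        poses.append(np.hstack([rvec.ravel(), tvec.ravel()]))
print("per-parameter standard deviation:", np.std(poses, axis=0))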
APA, Harvard, Vancouver, ISO, and other styles
23

Mou, Chia-Chang, and 牟家昌. "Object Recognition Using 2D Image and 3D Point Clouds Data." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/35279695086781961762.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Electrical and Control Engineering
99
In recent years, research on three-dimensional object recognition in point cloud data has become more and more popular. Appearance-based features, such as object silhouettes, directly affect recognition performance when objects are viewed from different positions and angles. To tackle this problem, this thesis proposes a recognition system that integrates two features: the Fourier descriptor of the contour in a range image and a structure descriptor extracted from point clouds. The Fourier descriptor is used to identify an object at a far distance, and a view-angle interpolation method is proposed to increase the correct recognition rate. The structure descriptor is used when the sensor is close to the object, since contour information alone is then insufficient to describe it. Furthermore, a selection strategy is presented to choose the appropriate feature for object recognition. Ten different control towers are used to verify the performance of the proposed approach. The experimental results show that the proposed system performs better across the entire distance range than methods using only the range-image feature or only the point cloud feature.
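A minimal version of the contour Fourier descriptor used for far-range recognition might look like the following sketch; the normalization choices (dropping the DC term, dividing by the first harmonic) are common conventions and not necessarily the thesis's exact formulation.

import numpy as np

def fourier_descriptor(contour_xy, n_coeffs=16):
    # Translation- and scale-insensitive Fourier descriptor of a closed
    # contour given as an (N, 2) array of boundary points.
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]    # complex boundary signal
    spectrum = np.fft.fft(z)
    spectrum[0] = 0                                 # drop DC term: translation invariance
    mags = np.abs(spectrum)
    mags /= (mags[1] + 1e-12)                       # normalise by first harmonic: scale invariance
    return mags[1:n_coeffs + 1]                     # low-order harmonics describe gross shape

# Example: the descriptor of a circle differs clearly from that of a square.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.c_[np.cos(t), np.sin(t)]
print(fourier_descriptor(circle)[:5])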
APA, Harvard, Vancouver, ISO, and other styles
24

Xian, Xiaohua. "2D & 3D UML-based software visualization for object-oriented programs." Thesis, 2003. http://spectrum.library.concordia.ca/2345/1/MQ83923.pdf.

Full text
Abstract:
UML (Unified Modeling Language) is a successful example of two-dimensional software visualization that is widely used in both academic and enterprise environments for object-oriented software development. The presented work (UML3D), which is included in the CONCEPT (Comprehension Of Net-CEntered Programs and Techniques) framework, applies 3D visualization techniques to UML to take advantage of 3D space and the additional features it affords. The UML3D project also integrates a self-organizing layout algorithm for both traditional 2D UML and 3D UML diagrams. The use of layout algorithms can reduce the complexity of a graph and facilitate the task of program comprehension. Moreover, UML3D addresses some other shortcomings of UML by providing intuitive navigation and interactions with the diagrams. We also discuss the use of source code analyses such as program slicing and coupling to improve the scalability, usability and navigability of the visual representations. An initial usability study of UML3D based on the SUMI (Software Usability Measurement Inventory) questionnaire was performed to study the ease of use and to identify future research directions.
APA, Harvard, Vancouver, ISO, and other styles
25

Chen, Yi-Chun, and 陳奕均. "An Efficient 2D to 3D Image Conversion with Object-based Segmentation." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/85772474159077980474.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Electronics
99
3D image processing has become a major trend in the visual processing field, and many automatic 2D-to-3D conversion algorithms have been proposed to address the lack of 3D content. However, there is still no fast algorithm that converts single monocular images well. In this thesis, we propose a fast conversion algorithm that includes image segmentation, image classification, an object boundary tracing method, and 3D image generation. The image segmentation adopts the watershed method to collect depth-cue information easily; the image classification then recovers the scene geometry of the image. With the depth cues and geometry information, the proposed object boundary tracing method detects objects in the image efficiently. Finally, the detected objects are used to generate a depth map and a 3D anaglyph image. To evaluate the results, we compare the stereo images with those of other 2D-to-3D conversion systems. Experimental results show that the proposed algorithm outperforms comparable systems in depth accuracy and processing speed when converting monocular images.
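The watershed segmentation stage can be sketched with the standard OpenCV marker-controlled recipe below; the file name and thresholds are placeholders, and the later depth-assignment and anaglyph steps are not shown.

import cv2
import numpy as np

# Marker-controlled watershed segmentation of one frame.
img = cv2.imread("frame.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Sure background by dilation, sure foreground by distance transform.
kernel = np.ones((3, 3), np.uint8)
sure_bg = cv2.dilate(binary, kernel, iterations=3)
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = sure_fg.astype(np.uint8)
unknown = cv2.subtract(sure_bg, sure_fg)

# Label markers and flood: pixels labelled -1 afterwards are region borders.
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0
markers = cv2.watershed(img, markers)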
APA, Harvard, Vancouver, ISO, and other styles
26

Guo, Jiang-Yu, and 郭江禹. "Reconstruction of a 3D Object Model Using 2D Image Contours Data." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/r465mt.

Full text
Abstract:
Master's thesis
National Taiwan Normal University
Institute of Mechatronic Technology
97
This thesis proposes the reconstruction of a 3D object model from 2D image contour data, so that objects captured as two-dimensional contours can be displayed as a three-dimensional model; potential applications include medical systems such as magnetic resonance imaging and nuclear medicine, and the result can be combined with a robot to form an automatic system. The most common ways to generate a three-dimensional model are the following: first, and most directly, using 3D modeling software (such as 3ds Max); second, using a three-dimensional scanning measurement system to scan the object and build a computer model directly from the 3D data; and third, using cameras to obtain sets of two-dimensional digital images and, with digital image processing and suitable algorithms, building the three-dimensional model. This study adopts the third approach. First, following the principle of projection with two CCD cameras, the vertical and horizontal projection planes of the first quadrant, as in engineering drawing, are simulated, and the two projections are processed by the proposed algorithm to extract sequences of two-dimensional image coordinates. Through OpenGL functions, the three-dimensional coordinate points are linked to obtain a rough three-dimensional model; a mesh is then laid out, and with shading and lighting the final three-dimensional model is generated. The experimental results show that this research establishes an automatic 3D object model reconstruction system: the system builds a 3D object model in the computer using CCD cameras and multiple image processing techniques, and the model can be provided to a robot for use in the next stage.
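The core two-camera step, turning matched 2D contour points into 3D coordinates, can be illustrated with OpenCV triangulation as below; the projection matrices and point pairs are made up for the example and do not come from the thesis's calibration.

import numpy as np
import cv2

# Two calibrated views: P1, P2 are 3x4 projection matrices K[R|t].
K = np.array([[700, 0, 320], [0, 700, 240], [0, 0, 1]], float)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
R = cv2.Rodrigues(np.array([[0.0], [0.2], [0.0]]))[0]
t = np.array([[-0.1], [0.0], [0.0]])
P2 = K @ np.hstack([R, t])

# Matched contour points in each image as 2xN arrays (row 0 = x, row 1 = y).
pts1 = np.array([[300.0, 340.0], [250.0, 260.0]])
pts2 = np.array([[310.0, 352.0], [252.0, 263.0]])
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)    # homogeneous 4xN result
X = (X_h[:3] / X_h[3]).T                           # Euclidean 3D points
print(X)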
APA, Harvard, Vancouver, ISO, and other styles
27

Payet, Nadia. "From shape-based object recognition and discovery to 3D scene interpretation." Thesis, 2011. http://hdl.handle.net/1957/21316.

Full text
Abstract:
This dissertation addresses a number of inter-related and fundamental problems in computer vision. Specifically, we address object discovery, recognition, segmentation, and 3D pose estimation in images, as well as 3D scene reconstruction and scene interpretation. The key ideas behind our approaches include using shape as a basic object feature, and using structured prediction modeling paradigms for representing objects and scenes. In this work, we make a number of new contributions both in computer vision and machine learning. We address the vision problems of shape matching, shape-based mining of objects in arbitrary image collections, context-aware object recognition, monocular estimation of 3D object poses, and monocular 3D scene reconstruction using shape from texture. Our work on shape-based object discovery is the first to show that meaningful objects can be extracted from a collection of arbitrary images, without any human supervision, by shape matching. We also show that a spatial repetition of objects in images (e.g., windows on a building facade, or cars lined up along a street) can be used for 3D scene reconstruction from a single image. The aforementioned topics have never been addressed in the literature. The dissertation also presents new algorithms and object representations for the aforementioned vision problems. We fuse two traditionally different modeling paradigms Conditional Random Fields (CRF) and Random Forests (RF) into a unified framework, referred to as (RF)^2. We also derive theoretical error bounds of estimating distribution ratios by a two-class RF, which is then used to derive the theoretical performance bounds of a two-class (RF)^2. Thorough experimental evaluation of individual aspects of all our approaches is presented. In general, the experiments demonstrate that we outperform the state of the art on the benchmark datasets, without increasing complexity and supervision in training.
Graduation date: 2011
Access restricted to the OSU Community at author's request from May 12, 2011 - May 12, 2012
APA, Harvard, Vancouver, ISO, and other styles
28

Castelhano, João Miguel Seabra. "Neural substrates of 2D/3D object perception: a combined EEG/fMRI approach." Doctoral thesis, 2015. http://hdl.handle.net/10316/26307.

Full text
Abstract:
Doctoral thesis in Health Sciences, in the branch of Biomedical Sciences, presented to the Faculty of Medicine of the University of Coimbra
Perceptual decision making is defined as the choice of possible interpretations of the world based on the incoming sensory evidence. The role of temporal coding in this process and coherent perception, defined as hierarchical grouping of local elements, remains controversial. Oscillatory processes in the gamma frequency range (>30 Hz) have been proposed to play a role in signaling emerging object percepts in the brain. Studies using Electroencephalography and Magnetoencephalography (EEG and MEG) have suggested that gamma-band oscillations are related to the integration of information and the ability to form coherent gestalts as well as attention and working memory processes. It is accepted that gamma-band synchrony reflects binding of information across different brain regions leading to the emergence of a coherent percept. There are also reports that correlate gamma activity with many other cognitive processes. Hence, a wide variety of gamma-band patterns and sources were reported for different tasks. In this line, both animal and human studies have suggested that understanding oscillatory activity patterning can be important to understand normal and abnormal cognitive function. However, it remains unclear whether distinct patterns across the gamma frequency range related to different cognitive modules do coexist in the same task. We investigated visual perceptual recognition moments based on EEG analysis with ambiguous Mooney stimuli (black and white incomplete pictures). We departed from classical paradigms which are based on contrasts between stimuli conditions that are fixed in time, and adopted a paradigm whereby the moment of perception of an emergent global pattern was variable. Therefore we could directly compare perception vs. no perception states for the same stimuli and separate sensory and motor processing components. We found a direct link between gamma-band temporal patterns (in two distinct sub-bands: ~40 Hz and ~60 Hz) and the presence versus absence of emerging holistic perception of variable onset. These findings were confirmed in a data driven manner with a support vector machine classification approach based on time-frequency features. Unimodal studies do not have enough resolution to test for non-unitary sources of these sub-bands and to establish their spatial distribution. Using a simultaneous Electroencephalography and functional Magnetic Resonance Imaging (EEG/fMRI) approach we provided new evidence for separable gamma activity patterns reflecting holistic perception. We found that distinct gamma frequency sub-bands reflect different neural substrates and cognitive mechanisms when comparing object perception states vs. no categorical perception. Accordingly, at least two separate neural modules are involved in holistic perceptual decision, one in the visual cortex (~60 Hz) and the other in the anterior insula (~40 Hz). These findings showed that current neuronal models of gamma-band spatial distribution need to consider the duality by separating low and high sub-bands. This provides a step forward in understanding the functional specialization of decision-making networks and the role of gamma frequency range sub-bands in signaling their different neural and cognitive components. This may shed new light on the role of gamma-band response in normal cognition and in neuropsychiatric disorders such as autism and schizophrenia, where both visual and decision making circuits may be impaired. 
Importantly, it remains unclear whether oscillation amplitude is relevant for encoding global stimulus properties or, alternatively, it is neural synchrony that plays a pivotal role in gestalt formation. In this study, we addressed this question by studying Williams Syndrome (WS), a well characterized model of impaired central coherence, using EEG and a set of experimental tasks requiring visual integration. It has been hypothesized that neural synchrony underlies central coherence that is a well-known model for cognitive dysfunction in autistic spectrum disorders. WS patients show markedly disrupted visual perceptual coherence and holistic integration. Using this human model of loss of coherence, we showed for the first time that neuronal synchrony is reduced across stimulus conditions and this is associated with increased amplitude modulation at 25-45 Hz. This combination of a dramatic loss of synchrony despite increased oscillatory activity represents strong evidence that synchrony underlies central coherence. To directly identify the sources of those specific sub-bands within gamma range and clarify their roles, we used Electrocorticography (ECoG) with the added value of greater spatial and temporal resolution. We used the unique opportunity provided by functional mapping in epilepsy and tested an epileptic patient. Interestingly, we identified a stimulus dependent graded posteroanterior sharpening of frequency responses. Lower frequencies dominated in the anterior ventro-temporal areas and higher frequency modulations in occipital regions. In summary, this set of works addressed several critical points to understand the role of oscillatory activity in perceptual decision mechanisms. We conclude that separable gamma sub-bands reflect different cognitive mechanisms. A distinct spatial source map is present for different gamma sub-bands activity during visual holistic perception. Low gamma (40 Hz) activity is related to the decision making network and High gamma (60 Hz) is localized to early visual processing regions. Moreover, we showed that synchrony underlies central coherence. These demonstrations of a clear functional topography for distinct gamma sub-bands within the same task shows that distinct gamma-band modulations (amplitude and synchrony) underlie sensory processing and perceptual decision mechanisms. These results have potential implications for the development of new diagnostic biomarkers and therapeutic targets.
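Separating the two gamma sub-bands reported here (around 40 Hz and around 60 Hz) from an EEG channel is, at its simplest, a band-pass filtering problem; the SciPy sketch below shows one conventional way to do it, with a synthetic signal and an assumed 500 Hz sampling rate rather than the study's actual recordings or analysis pipeline.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_envelope(eeg, fs, lo, hi):
    # Instantaneous amplitude envelope of one EEG channel in [lo, hi] Hz.
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, eeg)
    return np.abs(hilbert(filtered))

fs = 500                                      # sampling rate (Hz), illustrative
eeg = np.random.randn(10 * fs)                # stand-in for a real recording
low_gamma = band_envelope(eeg, fs, 35, 45)    # ~40 Hz sub-band
high_gamma = band_envelope(eeg, fs, 55, 65)   # ~60 Hz sub-band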
FCT - SFRH/BD/65341/2009
APA, Harvard, Vancouver, ISO, and other styles
29

Chen, Yu-Ru, and 陳妤如. "Robot Arm Autonomous Object Grasping System Based on 2D and 3D Vision Techniques." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/84u98v.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Mechanical Engineering
107
This research develops an integrated 2D and 3D vision system that commands a six-axis robot arm to grasp objects fully autonomously. The techniques used include 2D object recognition with deep learning, Point Pair Features (PPF), Image-Based Visual Servoing (IBVS), and Perspective-n-Point (PnP). Given the variety of objects and scenes in a household, 3D object pose estimation is the core of the system. PPF is a very effective 6D object pose estimation technique, but it requires extensive sampling, which leads to a huge amount of computation, and the matching result may be wrong. Therefore, this study uses deep-learning object recognition to identify objects and find their 2D pixel positions from the RGB information. The 2D position is then converted into 3D coordinates in the RGB-D camera frame, and only the point cloud in the approximate region of the object is kept for matching. This removes unnecessary points, avoids a large number of sampling operations, saves considerable matching time, and increases the recall rate of PPF matching. After matching, the robot arm can be guided to the grasping position defined by the matched object template. To overcome various errors, the system uses artificial markers on the object to perform IBVS or PnP, moving the robot arm to a more accurate grasping position. Whereas IBVS can grasp moving objects but converges slowly, PnP can quickly move the robot arm to the precise grasping position of a stationary object. This study also includes several object-grasping experiments that confirm the practicality and time efficiency of the developed system.
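The step of keeping only the point cloud around the 2D detection before running PPF can be sketched as a simple projection test; the intrinsics, detection box, and random cloud below are placeholders for the real RGB-D data.

import numpy as np

def crop_cloud_to_box(cloud_xyz, K, box):
    # Keep only the 3D points whose projection falls inside a 2D detection
    # box (x1, y1, x2, y2); the cloud is in the RGB-D camera frame.
    cloud = cloud_xyz[cloud_xyz[:, 2] > 0]             # points in front of the camera
    u = K[0, 0] * cloud[:, 0] / cloud[:, 2] + K[0, 2]  # pinhole projection
    v = K[1, 1] * cloud[:, 1] / cloud[:, 2] + K[1, 2]
    x1, y1, x2, y2 = box
    inside = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return cloud[inside]

K = np.array([[615.0, 0, 320.0], [0, 615.0, 240.0], [0, 0, 1]])
cloud = np.random.rand(100000, 3) * 2.0                # placeholder point cloud
roi_cloud = crop_cloud_to_box(cloud, K, (200, 150, 420, 330))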
APA, Harvard, Vancouver, ISO, and other styles
30

Ghobadi, Seyed Eghbal [Verfasser]. "Real time object recognition and tracking using 2D/3D images / von Seyed Eghbal Ghobadi." 2010. http://d-nb.info/1009885472/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Hegazy, Doaa Abd al-Kareem Mohammed [Verfasser]. "Boosting for generic 2D/3D object recognition / von Doaa Abd Al-Kareem Mohammed Hegazy." 2010. http://d-nb.info/1001518209/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Jiang, Ci-syu, and 江麒旭. "A correction method of the multiple-object 3D model reconstruction based on 2D images." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/93624808001814518129.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Mechanical Engineering
97
A 3D digitizer can be used to perform reverse engineering: it takes photos of a workpiece layer by layer, and after the images are segmented they can be used to reconstruct a CAD model with computer software. Although a threshold method is useful for segmenting single-object images, the multi-threshold method does not work well on multiple-object images: after segmentation by the multi-threshold method, the images contain three kinds of error. The first is material transition, the second is the image profile's boundary, and the third is tooling marks. Both material transition and profile-boundary errors are caused by pixel gray levels that are too similar to the target, while tooling marks are caused randomly by the cutting tool. In this study, we used a linear regression method to correct the image error; in our cases, the material transition was between 4% and 10%. Image morphology was then used to correct the profile boundary and tooling marks. With the image processing developed in this study, images from 3D digitizers can be corrected, providing accurate images for our applications.
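The morphological clean-up of a segmented slice image can be sketched with OpenCV opening and closing as below; the file name, threshold, and kernel size are illustrative, and the linear-regression correction of the material transition is not shown.

import cv2

# Opening suppresses small tooling-mark speckles, closing repairs breaks in
# the profile boundary. Kernel size would be tuned to the slice resolution.
slice_img = cv2.imread("layer_042.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(slice_img, 128, 255, cv2.THRESH_BINARY)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)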
APA, Harvard, Vancouver, ISO, and other styles
33

Lin, Chin-Hsin, and 林進星. "Converting 2D Video Sequences Using Object Tracking and Depth-Maps for 3D Stereoscopic Display." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/62275354545388440493.

Full text
Abstract:
Master's thesis
Chung Yuan Christian University
Institute of Information Engineering
95
A computer framework for converting a 2D video sequence to 3D for stereoscopic display, based on the vanishing lines and vanishing point inside each image frame, is presented. Given a 2D video sequence of a single-view scene, the main processes automatically segment and track moving objects and generate a depth map with respect to the vanishing point of the scene. Depths along the motion path of each moving object can then be estimated accordingly. As a result, binocular-view images are generated and recombined for stereoscopic display. The experimental results are promising, providing a convincing virtual 3D experience because the moving objects draw the viewers' attention. In addition, special 3D effects can be created by superimposing the moving objects onto various backgrounds. In conclusion, our computer framework provides a systematic way of creating 3D video sequences for stereoscopic display, especially for the 3D experience of moving objects in various single-view scenes.
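A very coarse depth map keyed to a vanishing point, of the kind such pipelines start from, can be generated as in the sketch below; treating depth as a linear function of distance to the vanishing point is a simplification, not the thesis's exact model.

import numpy as np

def depth_from_vanishing_point(h, w, vp):
    # Coarse depth map for a single-view scene: pixels close to the vanishing
    # point are treated as far away, pixels far from it as near.
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(xs - vp[0], ys - vp[1])
    depth = 1.0 - dist / dist.max()        # 1.0 = farthest, 0.0 = nearest
    return (depth * 255).astype(np.uint8)

depth_map = depth_from_vanishing_point(480, 640, vp=(320, 180))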
APA, Harvard, Vancouver, ISO, and other styles
34

Huang, Ying-Yuan, and 黃盈源. "3D Object Model Recovery from 2D Images Utilizing Corner Detection and Virtual Mesh Grid." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/08443820117389508599.

Full text
Abstract:
Master's thesis
National Taiwan Normal University
Institute of Mechatronic Technology
99
This research proposes a new method to reconstruct a 3D object model from 2D images, using a non-contact scanning measurement based on a stereo vision algorithm. Stereo vision imitates the human eyes to capture the depth information of an object, so two CCD cameras are used to capture two images of the object. Matching points are then found between the two images; combining these matches with the parameters of the two cameras and the transform matrix between the world and camera coordinates yields the depth of each point in space, from which the 3D model of the object can be reconstructed. The key issue in stereo vision is how to find the matching points between the two images accurately. Many previous works solved this by projecting structured light onto the object's surfaces. The proposed system can also find matching points using structured light, but that approach is restricted by the color of the object surface. This research therefore proposes a method to reconstruct the 3D model without projecting structured light, using corner detection and a virtual mesh grid to reconstruct both simple geometric objects and curved surfaces. The feature points of a simple geometric object usually lie on the corners of its contour, so they can be found by corner detection; the system then calculates their depths and projects them into 3D space, from which the object's 3D model is reconstructed. A curved-surface object has no such visible feature points, so a virtual mesh grid is built on the left image, the corresponding points on the right image are estimated by epipolar geometry, and the 3D model is finally reconstructed from the stereo vision theorem and the virtual mesh grids of the two images.
APA, Harvard, Vancouver, ISO, and other styles
35

Kuan-JuLu and 呂冠儒. "3D Object Point Estimation by 2D Image Using Multi-Stage Deep Convolutional Neural Networks." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/f2pmma.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Borkowski, Maciej. "2D to 3D conversion with direct geometrical search and approximation spaces." 2007. http://hdl.handle.net/1993/2827.

Full text
Abstract:
This dissertation describes the design and implementation of a system that has been designed to extract 3D information from pairs of 2D images. System input consists of two images taken by an ordinary digital camera. System output is a full 3D model extracted from 2D images. There are no assumptions about the positions of the cameras during the time when the images are being taken, but the scene must not undergo any modifications. The process of extracting 3D information from 2D images consists of three basic steps. First, point matching is performed. The main contribution of this step is the introduction of an approach to matching image segments in the context of an approximation space. The second step copes with the problem of estimating external camera parameters. The proposed solution to this problem uses 3D geometry rather than the fundamental matrix widely used in 2D to 3D conversion. In the proposed approach (DirectGS), the distances between reprojected rays for all image points are minimised. The contribution of the approach considered in this step is a definition of an optimal search space for solving the 2D to 3D conversion problem and introduction of an efficient algorithm that minimises reprojection error. In the third step, the problem of dense matching is considered. The contribution of this step is the introduction of a proposed approach to dense matching of 3D object structures that utilises the presence of points on lines in 3D space. The theory and experiments developed for this dissertation demonstrate the usefulness of the proposed system in the process of digitizing 3D information. The main advantage of the proposed approach is its low cost, simplicity in use for an untrained user and the high precision of reconstructed objects.
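The quantity DirectGS drives toward zero, the gap between two back-projected rays, can be computed in closed form for a single correspondence as in the sketch below; the ray origins and directions are arbitrary examples, not data from the dissertation.

import numpy as np

def ray_midpoint(o1, d1, o2, d2):
    # Midpoint of the shortest segment between two back-projected camera rays
    # o + s*d; the segment length is the ray-distance error to be minimised.
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    # Solve for s, t minimising |(o1 + s*d1) - (o2 + t*d2)|^2.
    A = np.array([[d1 @ d1, -d1 @ d2], [d1 @ d2, -d2 @ d2]])
    b = np.array([(o2 - o1) @ d1, (o2 - o1) @ d2])
    s, t = np.linalg.solve(A, b)
    p1, p2 = o1 + s * d1, o2 + t * d2
    return (p1 + p2) / 2, np.linalg.norm(p1 - p2)   # 3D estimate, ray gap

point, gap = ray_midpoint(np.zeros(3), np.array([0.1, 0.0, 1.0]),
                          np.array([0.5, 0.0, 0.0]), np.array([-0.1, 0.0, 1.0]))
print(point, gap)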
October 2007
APA, Harvard, Vancouver, ISO, and other styles
37

Su, Tzung-Min, and 蘇宗敏. "Robust 3D Object Recognition using 2D Views via an Incremental Similarity-Based Aspect-Graph Approach." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/39874419206497670787.

Full text
Abstract:
Doctoral dissertation
National Chiao Tung University
Department of Electrical and Control Engineering
96
This work presents a framework for robustly recognizing 3D objects from 2D views. The proposed framework comprises two stages: a pre-processing stage and an incremental database construction stage. In the pre-processing stage, foreground objects are extracted from the 2D views and used both for building the 3D database and for recognition. In the incremental database construction stage, a 3D object database is built and updated using 2D views randomly sampled from a viewing sphere. A background subtraction scheme involving highlight and shadow removal (BSHSR) is proposed as the pre-processing stage of the framework; it extracts foreground regions precisely from 2D views despite illumination variations and dynamic background. The BSHSR comprises three models: a color-based probabilistic background model (CBM), a gradient-based version of that model (GBM), and a cone-shape illumination model (CSIM). A Gaussian mixture model (GMM) is applied to construct the CBM from pixel statistics; based on the CBM, a short-term color-based background model (STCBM) and a long-term color-based background model (LTCBM) are extracted and used to build the GBM. Furthermore, a new dynamic cone-shaped boundary in the RGB color space, the CSIM, is proposed to distinguish pixels among shadow, highlight, and foreground. An incremental database construction method based on a similarity-based aspect graph (ISAG) is proposed for building the 3D object database from 2D views; the similarity-based aspect graph, which contains a set of aspects and characteristic views for those aspects, represents the database of 3D objects. The core of the framework is an incremental construction method that maximizes the similarity of views within the same aspect and minimizes the similarity between prototypes. To imitate human cognition, 2D views randomly sampled from a viewing sphere are used to build and update the 3D object database. The effectiveness of the BSHSR is demonstrated in experiments with several video clips collected in a complex indoor environment, and it is applied in the proposed framework to extract foreground objects from 2D views. The framework is evaluated on various 3D object recognition problems, including rigid 3D object recognition, human posture recognition, and scene recognition; shape and color features are employed in the different applications to show the efficiency of the proposed method.
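A rough stand-in for the colour-based background model with shadow handling is OpenCV's Gaussian-mixture subtractor, shown below; it is not a reimplementation of the CBM/GBM/CSIM combination, and the video path is a placeholder.

import cv2

# Gaussian-mixture background model with built-in shadow marking.
cap = cv2.VideoCapture("indoor.avi")               # placeholder video path
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # MOG2 marks shadow pixels as 127; keep only confident foreground (255).
    foreground = (mask == 255).astype("uint8") * 255
cap.release()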
APA, Harvard, Vancouver, ISO, and other styles
38

Chan, Ya-Ping, and 詹雅評. "3D Video Conversion from 2D Video with Both Camera and Object Motion Using Bundle Adjustment." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/47194804895905129787.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
99
This thesis presents a system that converts 2D video to 3D video by reconstructing high-quality video disparity maps. First, the background disparity maps of videos containing both camera motion and object motion are estimated; we formulate this as an energy minimization problem that uses color and geometric constraints to recover the disparity maps. Because the goal is to estimate background disparity, information about moving objects (the foreground), such as color and segmentation, is discarded, and the disparity of pixels occluded by the foreground is inferred from the disparity of their neighbors. Given the background disparity maps, we recover the background images; with both, we synthesize a left-eye and right-eye view video pair using the depth image-based rendering (DIBR) method.
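Given a recovered frame and its depth map, the DIBR step that produces the stereo pair can be approximated with the naive pixel-shifting sketch below; real DIBR also fills the disocclusion holes that this version leaves black, and the disparity scaling is an arbitrary choice.

import numpy as np

def shift_view(image, depth, eye_sign, max_disp=24):
    # Naive DIBR: shift each pixel horizontally in proportion to its depth to
    # approximate one eye's view; holes are left black for clarity.
    h, w = depth.shape
    disparity = (depth.astype(np.float32) / 255.0 * max_disp).astype(int)
    out = np.zeros_like(image)
    xs = np.arange(w)
    for y in range(h):
        new_x = np.clip(xs + eye_sign * disparity[y], 0, w - 1)
        out[y, new_x] = image[y, xs]
    return out

rgb = np.zeros((480, 640, 3), np.uint8)     # placeholder recovered frame
depth = np.zeros((480, 640), np.uint8)      # placeholder depth map
left, right = shift_view(rgb, depth, +1), shift_view(rgb, depth, -1)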
APA, Harvard, Vancouver, ISO, and other styles
39

HSIEH, ZONG-YOU, and 謝宗佑. "3D Object Recognition System of Indoor Scene Based on Point Cloud And 2D SURF Feature Points." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/mk75r4.

Full text
Abstract:
Master's thesis
National Taipei University of Technology
Department of Computer Science and Information Engineering
107
Our previous work proposed a system for 3D point cloud object recognition, giving intelligent robots a deeper object-recognition capability. However, recognition in that system requires comparison against the objects in a database. Because there are many 3D objects in the world, the database grows as the number of recognizable objects increases, and without a similarity index, matching a test object against the database takes a long time and reduces the system's efficiency. This work proposes a recognition system based on 2D SURF feature points and 3D point clouds. Each 3D object to be recognized is divided into thirty-two orientations at intervals of 11.25 degrees. The SURF (Speeded-Up Robust Features) algorithm extracts keypoints from the image of the object at each orientation and stores them in the database; for the point cloud of the corresponding orientation, the information needed for recognition, including normals and 3D keypoints, is computed. Because 2D image matching is faster than 3D matching, the system first matches 2D SURF features between the test object and the database objects and keeps only the most similar candidates before confirming with 3D point cloud matching. This significantly reduces the number of objects that must be compared in 3D and the time spent on 3D matching, achieving multi-object identification without losing efficiency or accuracy. Experimental results on the test objects show that, on average, more than 88% of dissimilar objects are excluded from the 3D matching process, and the average object identification accuracy is 80% or higher.
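The 2D pre-filtering stage can be sketched with a SURF ratio-test score as below. SURF sits in the patented opencv-contrib xfeatures2d module, so this assumes a contrib build (ORB would be a free drop-in); only the top-scoring database objects would then go on to 3D point-cloud matching.

import cv2

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
bf = cv2.BFMatcher(cv2.NORM_L2)

def surf_score(query_img, db_img):
    # Number of ratio-test matches between a grayscale test view and one
    # grayscale database view; higher means more similar.
    _, q_desc = surf.detectAndCompute(query_img, None)
    _, d_desc = surf.detectAndCompute(db_img, None)
    if q_desc is None or d_desc is None:
        return 0
    matches = bf.knnMatch(q_desc, d_desc, k=2)
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance)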
APA, Harvard, Vancouver, ISO, and other styles
40

HUANG, WEI-SHIANG, and 黃暐翔. "Development of Software System for Autonomous Object Operations by 6-axis Robot Arm with 2D/3D Vison Capabilities." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/m95y7h.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Mechanical Engineering
107
This research proposes a software system architecture designed for extensibility and growth, for a fully autonomous robot-arm object operating system. Such an autonomous, intelligent object operating system must integrate various intelligent vision systems for identifying and locating objects and for controlling the movement of the robot arm. However, existing intelligent vision systems are numerous, each with its own scope of application, and new vision algorithms and robot object operation modules are continuously being developed, so a practical robot-arm operating software system must be able to update and add new technical modules. For this reason, this research designed a scalable, growth-oriented software architecture: by pre-planning specified working directories and subprogram specifications, the user can easily replace recognition code without rewriting the main program. This research combines robot arm control with a variety of 2D/3D vision systems to develop a smart, fully automatic object operating system that performs object recognition and 6D pose estimation in cluttered scenes with a 3D camera, moves the robot arm to the preset grasping point of the object, uses a 2D camera at the end of the arm for visual servo control based on a landmark on the object, and finally moves the arm to the precise grasping point and grasps the object. The system also registers the applicable environment and reliability of the different intelligent vision modules for each object in the data set, increasing the robustness of the robot-arm system to environmental variation. The capabilities of the system are examined through multiple experiments, and future directions for improvement are carefully explored.
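The "specified working directory plus subprogram specification" idea can be sketched as a small plugin registry like the one below; the vision_modules package name and the recognize(image) interface are assumptions made for illustration, not the thesis's actual API.

import importlib
import pkgutil

import vision_modules   # assumed package directory holding one module per recognizer

def load_recognizers():
    # Pick up every module in vision_modules/ that exposes recognize(image),
    # so new recognizers can be dropped in without touching the main program.
    recognizers = {}
    for info in pkgutil.iter_modules(vision_modules.__path__):
        module = importlib.import_module(f"vision_modules.{info.name}")
        if hasattr(module, "recognize"):
            recognizers[info.name] = module.recognize
    return recognizers

for name, fn in load_recognizers().items():
    print("loaded vision module:", name)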
APA, Harvard, Vancouver, ISO, and other styles
41

Yu, Ming-Jyun, and 余明駿. "Markers Based 3D Position Estimation for Rod Shaped Object Using 2D Image and Its Application In Endoscopic MIS Instrument Tracking." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/mwg4jq.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Department of Electrical Engineering
103
The aim of this research is to estimate the pose of a uniform circular rod-shaped object (such as an endoscopic surgical instrument) viewed by a single-lens camera. Two markers are attached to the rod; all markers have the same shape but different colors. Using digital image processing, these markers can be detected and the 2D information of the rod-shaped object extracted efficiently, and through the geometric relationship between the lens mapping and the imaging positions on the camera sensor, the 3D position information of the rod can be estimated quickly. The 3D position information consists of seven parameters: the 3D coordinates (X, Y, Z), the in-plane and out-of-plane angles (alpha, beta, gamma), and a rotation angle (theta), which we propose to estimate using a binary encoding.
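Detecting the two colour markers and the rod's in-plane direction can be sketched with HSV thresholding as below; the colour ranges and file name are placeholders, and the full 3D recovery relies on the geometric relations derived in the thesis rather than this 2D step alone.

import cv2
import numpy as np

frame = cv2.imread("endoscope_frame.png")
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

def marker_centroid(hsv_img, lo, hi):
    # Centroid of the pixels falling inside one marker's HSV range.
    mask = cv2.inRange(hsv_img, np.array(lo), np.array(hi))
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])

green = marker_centroid(hsv, (40, 80, 80), (80, 255, 255))
blue = marker_centroid(hsv, (100, 80, 80), (130, 255, 255))
if green and blue:
    # In-plane angle of the rod axis from the two marker centroids.
    angle = np.degrees(np.arctan2(blue[1] - green[1], blue[0] - green[0]))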
APA, Harvard, Vancouver, ISO, and other styles
42

Chen, Kuan-Chieh, and 陳冠傑. "A Study on Autonomous Vehicle Navigation by 2D Object Image Matching and 3D Computer Vision Analysis for Indoor Security Patrolling Applications." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/34337662999305229023.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Multimedia Engineering
96
A vision-based vehicle system for security patrolling in indoor environments using an autonomous vehicle is proposed. A small vehicle with wireless control and a web camera which has the capabilities of panning, tilting, and zooming is used as a test bed. At first, an easy-to-use learning technique is proposed, which has the capability of extracting specific features, including navigation path, floor color, monitored object, and vehicle location with respect to monitored objects. Next, a security patrolling method by vehicle navigation with obstacle avoidance and security monitoring capabilities is proposed. The vehicle navigates according to the node data of the path map which is created in the learning phase and monitors concerned objects by a simplified scale-invariant feature transform (simplified-SIFT) algorithm proposed in this study. Accordingly, we can extract the features of each monitored object from acquired images and match them with the corresponding learned data by the Hough transform. Furthermore, a vehicle location estimation technique for path correction utilizing the monitored object matching result is proposed. In addition, techniques for obstacle avoidance are also proposed, which can be used to find the clusters of floor colors, detect obstacles in environments with various floor colors, and integrate a technique of goal-directed minimum path following to guide the vehicle to avoid obstacles. Good experimental results show the flexibility and feasibility of the proposed methods for the application of security patrolling in indoor environments.
APA, Harvard, Vancouver, ISO, and other styles
43

SYU, JIA-CYUAN, and 許家銓. "Double-Rings Markers Based 3D Complete Eight Quadrants Position Estimation for Rod Shaped Object Using 2D Image and Its Application In Endoscopic MIS Instrument Tracking." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/92550082240203623869.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Department of Electrical Engineering
104
This thesis builds on the results of Ming-Jyun Yu's 2015 master's thesis from our laboratory, "Markers Based 3D Position Estimation for Rod Shaped Object Using 2D Image and Its Application In Endoscopic MIS Instrument Tracking", which features fast and accurate estimation of the six 3D position parameters from a single 2D image through a set of deterministic formulas. In this thesis, we complete four research goals. First, we extend the original formulas from one particular pose to any pose. Second, we select the colors of the two rings as well as the RGB thresholds for fast and accurate ring-shape extraction from the 2D image by analyzing a large number of laparoscopic images. Third, we propose an algorithm that takes a 2D laparoscopic image as input and outputs the corresponding six 3D pose parameters. We also verify the correctness of the proposed formulas and algorithm by conducting extensive experiments in which the six 3D parameters are measured by human observers and estimated by the proposed algorithm, and their differences are analyzed for various poses; the results of this analysis can guide further accuracy improvements of the proposed method. Finally, we demonstrate the possibility of synchronizing the motion of a real rod-shaped body and its Unity3D-based 3D model for further application in augmented reality: the six 3D pose parameters of the real rod-shaped body are estimated by the proposed system and transmitted to drive its remote 3D model. Compared with other existing MIS pose estimation methods, the proposed double-ring marker-based algorithm is accurate and computationally very efficient.
APA, Harvard, Vancouver, ISO, and other styles
44

Liu, Cheng Hsiung, and 劉政雄. "RECOGNITION OF 3D OBJECTS BY SINGLE CAMERA VIEWS USING CAMERA CALIBRATION, SURFACE BACKPROJECTION, AND 2D MODEL MATCHING TECHNIQUES BASED ON OBJECT SHAPE AND SURFACE PATTERN INFORMATION." Thesis, 1993. http://ndltd.ncl.edu.tw/handle/99374905602646662157.

Full text
Abstract:
Doctoral dissertation
National Chiao Tung University
Institute of Computer Science and Information Engineering
81
A new approach to the recognition of three classes of 3D objects from single camera views, using a combination of camera calibration, surface backprojection, and 2D model matching techniques, is proposed. The three classes of 3D objects are cuboids, cylinders, and regular prisms, which are commonly seen in commercial products and industrial parts. Not only the silhouette shape but also the surface pattern of the object is utilized in the recognition scheme, so for each class, objects of different sizes and different surface patterns can be recognized. To recognize an input object of each class, a new camera calibration technique is first employed to compute the camera parameters as well as the object dimension parameters analytically from a single camera view of the object. The availability of analytical solutions for the camera parameters makes the proposed technique faster than other camera calibration approaches that require iterative parameter computation; the calibration technique is based on the lines or curves formed by the intersections of the object surfaces. A surface backprojection technique is then adopted to reconstruct the pattern on each surface patch of the input object. This technique transforms the 3D surface data into a set of 2D surface patch patterns, which makes the subsequent model matching process 2D in nature. Finally, in the model matching process, each surface patch pattern is matched with those of each object model using a distance-weighted correlation measure. Experimental results show the feasibility of the proposed approach.
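Once a surface patch has been backprojected to a canonical 2D view, matching it against stored model patterns is a 2D correlation problem; the sketch below uses OpenCV's normalised cross-correlation as a stand-in for the dissertation's distance-weighted correlation measure, with placeholder file names.

import cv2

patch = cv2.imread("backprojected_patch.png", cv2.IMREAD_GRAYSCALE)
best_name, best_score = None, -1.0
for name in ["model_a.png", "model_b.png"]:        # placeholder model patterns
    model = cv2.imread(name, cv2.IMREAD_GRAYSCALE)
    model = cv2.resize(model, (patch.shape[1], patch.shape[0]))
    # Equal-size normalised cross-correlation yields a single similarity score.
    score = cv2.matchTemplate(patch, model, cv2.TM_CCOEFF_NORMED)[0, 0]
    if score > best_score:
        best_name, best_score = name, score
print("best matching model pattern:", best_name)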
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography