Academic literature on the topic '2D/3D object discovery'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic '2D/3D object discovery.'

Dissertations / Theses on the topic "2D/3D object discovery"

1

Kara, Sandra. "Unsupervised object discovery in images and video data." Electronic Thesis or Diss., Université Paris-Saclay, 2025. http://www.theses.fr/2025UPASG019.

Abstract:
This thesis explores self-supervised learning methods for object localization, commonly known as Object Discovery. Object localization in images and videos is an essential component of computer vision tasks such as detection, re-identification, and tracking. Current supervised algorithms can localize (and classify) objects accurately but are costly because they need annotated data. The labeling process is typically repeated for each new dataset or category of interest, which limits scalability. Additionally, semantically specialized approaches require prior knowledge of the target classes, restricting their use to known objects. Object Discovery aims to address these limitations by being more generic. The first contribution of this thesis focused on the image modality, investigating how features from self-supervised vision transformers can serve as cues for multi-object discovery. To localize objects in their broadest definition, we extended our focus to video data, leveraging motion cues and targeting the localization of objects that can move. We introduced background modeling and knowledge distillation in object discovery to tackle the background over-segmentation issue in existing object discovery methods and to reintegrate static objects, significantly improving the signal-to-noise ratio in predictions. Recognizing the limitations of single-modality data, we incorporated 3D data through a cross-modal distillation framework. The knowledge exchange between 2D and 3D domains improved alignment on object regions between the two modalities, enabling the use of multi-modal consistency as a confidence criterion.
2

Shao, Zhimin. "3D/2D object recognition from surface patterns." Thesis, University of Surrey, 1997. http://epubs.surrey.ac.uk/844055/.

Abstract:
Attributed Relational Graph (ARG) is a powerful representation for model-based object recognition due to its inherent robustness in handling noisy and incomplete data. In the past few years, the availability of efficient ARG matching algorithms and their theoretical underpinnings have greatly contributed to many successful applications of ARG representation in tackling high-level vision problems. During my three-year investigation into object recognition using ARG representation, we have developed a number of novel theories and techniques in the subject area. Some are image processing techniques which help to segment and generate primitive features for building ARG representations (Chapters 2 and 4). Some are about projective invariance in ARG representations (Chapters 3 and 5). Some are about new ARG matching algorithms (Chapter 6). This thesis serves as a summary document of these theories and techniques. The most important contributions of our work to the domain of computer vision, in my opinion, are in two areas. The first is projective invariant ARG representation for object recognition. Here, we demonstrated for the first time a way to systematically derive ARG representations for objects under complex projective transforms by exploiting the knowledge of invariance. The methodology we developed is a sound strategy that generates ARG representations with a number of desirable and provable properties, the most important of which is the ability to capture global transformation constraints using binary relations only. The approach significantly reduces the heuristic nature of designing relational measurements and paves the way for wider application of ARG representation in 2D and 3D object recognition. The second is ARG matching. A new mathematical framework for deterministic relaxation algorithms was developed to overcome a number of problems that appear in the existing theories and practices of efficient ARG labelling.
A novel labelling algorithm was proposed based on the new theoretical framework. The algorithm has a number of desirable properties compared to existing algorithms. In particular, the resulting algorithm delivers more consistent, faithful-to-observation results in the presence of ambiguities and multiple interpretations compared to other algorithms.
3

Sirtkaya, Salim. "Moving Object Detection in 2D and 3D Scenes." Master's thesis, METU, 2004. http://etd.lib.metu.edu.tr/upload/2/12605310/index.pdf.

Abstract:
This thesis describes the theoretical bases, development and testing of an integrated moving object detection framework in 2D and 3D scenes. The detection problem is analyzed in stationary and non-stationary camera sequences, and different algorithms are developed for each case. Two methods are proposed for stationary camera sequences: background extraction followed by differencing and thresholding, and motion detection using the optical flow field calculated by the “Kanade-Lucas Feature Tracker”. For non-stationary camera sequences, different algorithms are developed based on the scene structure and camera motion characteristics. In planar scenes, where the scene is flat or distant from the camera and/or the camera undergoes rotation only, a method is proposed that uses 2D parametric registration based on affine parameters of the dominant plane for independently moving object detection. A modified version of the 2D parametric registration approach is used when the scene is not planar but consists of a small number of planes at different depths and the camera undergoes translational motion. Optical flow field segmentation and sequential registration are the key points for this case. For 3D scenes, where the depth variation within the scene is high, a parallax rigidity based approach is developed for moving object detection. All these algorithms are integrated to form a unified independently moving object detector that works in stationary and non-stationary camera sequences and with different scene and camera motion structures. Optical flow field estimation and segmentation are used for this purpose.
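The stationary-camera method named in this abstract (background extraction followed by differencing and thresholding) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the per-pixel median background model, and the threshold value are assumptions chosen for illustration, and frames are represented as plain nested lists of grayscale values.

```python
from statistics import median

def estimate_background(frames):
    """Per-pixel median over equally sized grayscale frames (assumed background model)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(width)]
            for r in range(height)]

def detect_moving_pixels(frame, background, threshold=30):
    """Binary mask: 1 where |frame - background| exceeds the (hypothetical) threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

# Three tiny 2x3 frames; a bright object appears in the last one at row 0, column 1.
frames = [
    [[10, 10, 10], [10, 10, 10]],
    [[12, 10, 9], [10, 11, 10]],
    [[10, 200, 10], [10, 10, 10]],
]
background = estimate_background(frames)
mask = detect_moving_pixels(frames[2], background)
```

The median is used here only because it tolerates a transient object in a few frames; a running average or a more elaborate model would slot into `estimate_background` in the same way.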
4

Toth, Levente. "3D object recognition based on constrained 2D views." Thesis, University of Plymouth, 1998. http://hdl.handle.net/10026.1/1808.

Abstract:
The aim of the present work was to build a novel 3D object recognition system capable of classifying man-made and natural objects based on single 2D views. The approach to this problem has been one motivated by recent theories on biological vision and multiresolution analysis. The project's objectives were the implementation of a system that is able to deal with simple 3D scenes and constitutes an engineering solution to the problem of 3D object recognition, allowing the proposed recognition system to operate in a practically acceptable time frame. The developed system takes further the work on automatic classification of marine phytoplanktons, carried out at the Centre for Intelligent Systems, University of Plymouth. The thesis discusses the main theoretical issues that prompted the fundamental system design options. The principles and the implementation of the coarse data channels used in the system are described. A new multiresolution representation of 2D views is presented, which provides the classifier module of the system with coarse-coded descriptions of the scale-space distribution of potentially interesting features. A multiresolution analysis-based mechanism is proposed, which directs the system's attention towards potentially salient features. Unsupervised similarity-based feature grouping is introduced, which is used in coarse data channels to yield feature signatures that are not spatially coherent and provide the classifier module with salient descriptions of object views. A simple texture descriptor is described, which is based on properties of a special wavelet transform. The system has been tested on computer-generated and natural image data sets, in conditions where the inter-object similarity was monitored and quantitatively assessed by human subjects, or the analysed objects were very similar and their discrimination constituted a difficult task even for human experts. The validity of the above described approaches has been proven. 
The studies conducted with various statistical and artificial neural network-based classifiers have shown that the system is able to perform well in all of the above-mentioned situations. These investigations also made it possible to take further and generalise a number of important conclusions drawn during previous work carried out in the field of 2D shape (plankton) recognition, regarding the behaviour of pattern recognition systems based on multiple coarse data channels and of various classifier architectures. The system is able to deal with difficult field-collected images of objects, and the techniques employed by its component modules make possible its extension to the domain of complex multiple-object 3D scene recognition. The system is expected to find immediate applicability in the field of marine biota classification.
5

Govender, Natasha. "Active object recognition for 2D and 3D applications." Doctoral thesis, University of Cape Town, 2015. http://hdl.handle.net/11427/16520.

Abstract:
Active object recognition provides a mechanism for selecting informative viewpoints to complete recognition tasks as quickly and accurately as possible. One can manipulate the position of the camera or the object of interest to obtain more useful information. This approach can improve the computational efficiency of the recognition task by only processing viewpoints selected based on the amount of relevant information they contain. Active object recognition methods are based around how to select the next best viewpoint and the integration of the extracted information. Most active recognition methods do not use local interest points which have been shown to work well in other recognition tasks and are tested on images containing a single object with no occlusions or clutter. In this thesis we investigate using local interest points (SIFT) in probabilistic and non-probabilistic settings for active single and multiple object and viewpoint/pose recognition. Test images used contain objects that are occluded and occur in significant clutter. Visually similar objects are also included in our dataset. Initially we introduce a non-probabilistic 3D active object recognition system which consists of a mechanism for selecting the next best viewpoint and an integration strategy to provide feedback to the system. A novel approach to weighting the uniqueness of features extracted is presented, using a vocabulary tree data structure. This process is then used to determine the next best viewpoint by selecting the one with the highest number of unique features. A Bayesian framework uses the modified statistics from the vocabulary structure to update the system's confidence in the identity of the object. New test images are only captured when the belief hypothesis is below a predefined threshold.
This vocabulary tree method is tested against randomly selecting the next viewpoint and a state-of-the-art active object recognition method by Kootstra et al. Our approach outperforms both methods by correctly recognizing more objects with less computational expense. This vocabulary tree method is extended for use in a probabilistic setting to improve the object recognition accuracy. We introduce Bayesian approaches for object recognition and object and pose recognition. Three likelihood models are introduced which incorporate various parameters and levels of complexity. The occlusion model, which includes geometric information and variables that cater for the background distribution and occlusion, correctly recognizes all objects on our challenging database. This probabilistic approach is further extended for recognizing multiple objects and poses in a test image. We show through experiments that this model can recognize multiple objects which occur in close proximity to distractor objects. Our viewpoint selection strategy is also extended to the multiple object application and performs well when compared to randomly selecting the next viewpoint, the activation model and mutual information. We also study the impact of using active vision for shape recognition. Fourier descriptors are used as input to our shape recognition system with mutual information as the active vision component. We build multinomial and Gaussian distributions using this information, which correctly recognizes a sequence of objects. We demonstrate the effectiveness of active vision in object recognition systems. We show that even in different recognition applications using different low level inputs, incorporating active vision improves the overall accuracy and decreases the computational expense of object recognition systems.
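The idea of updating a belief over object identities and capturing a new view only while that belief stays below a threshold can be sketched as follows. This is a generic Bayes-rule illustration, not the thesis's likelihood models or its viewpoint selection strategy: the function names, the threshold of 0.9, and the likelihood values are all hypothetical, and views are simply consumed in the order given.

```python
def update_belief(prior, likelihoods):
    """Bayes rule: posterior(obj) is proportional to likelihood(view | obj) * prior(obj)."""
    unnormalized = {obj: prior[obj] * likelihoods[obj] for obj in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero likelihood under every hypothesis")
    return {obj: value / total for obj, value in unnormalized.items()}

def recognize(prior, observations, threshold=0.9):
    """Consume views one at a time until the top belief reaches the threshold."""
    belief = dict(prior)
    views_used = 0
    for likelihoods in observations:
        best = max(belief, key=belief.get)
        if belief[best] >= threshold:
            break  # confident enough: do not capture another view
        belief = update_belief(belief, likelihoods)
        views_used += 1
    best = max(belief, key=belief.get)
    return best, belief[best], views_used

# Hypothetical per-view likelihoods for two candidate objects.
prior = {"mug": 0.5, "bowl": 0.5}
observations = [
    {"mug": 0.8, "bowl": 0.2},
    {"mug": 0.9, "bowl": 0.3},
    {"mug": 0.7, "bowl": 0.1},
]
label, confidence, views_used = recognize(prior, observations)
```

In this toy run the third view is never processed, because the belief in the leading hypothesis already exceeds the threshold after two updates.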
6

Noé, Estelle. "3D layered articulated object from a single 2D drawing." Thesis, KTH, Medieteknik och interaktionsdesign, MID, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-216943.

Abstract:
Modeling articulated objects made of rigid layered parts, used to populate 3D scenes in video games or movie production, is a complex and time-consuming task for digital artists. This work proposes a sketch-based approach to efficiently model 3D layered articulated objects, such as animals with rigid shells and armor, by manually annotating a single 2D photo, and eventually to fabricate them from automatically computed 2D patterns. By considering symmetrical objects seen under a 3/4 view, and annotating salient features such as extremities of the rigid articulated parts as a mix of circular and Bézier curves, this approach is able to retrieve depth information, hidden parts, and rotation-articulated structure. The resulting shape consists of a set of quadrangulated polygons that may be flattened in 2D. Details such as ears, tails, and legs were further modeled using dedicated annotations. The accuracy of the reconstruction has been validated on synthetic cylindrical examples, and its robustness has been validated by reconstructing 3D models of a suit of armor, an armadillo, and a shrimp. The latter was finally fabricated using paper.
7

Zhu, Yonggen. "Feature extraction and 2D/3D object recognition using geometric invariants." Thesis, King's College London (University of London), 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.362731.

8

Gamal, Eldin Ahmed. "Point process and graph cut applied to 2D and 3D object extraction." Nice, 2011. http://www.theses.fr/2011NICE4107.

Abstract:
The topic of this thesis is to develop a novel approach for 3D object detection from a 2D image. This approach takes into consideration occlusions and perspective effects. This work has been embedded in a marked point process framework, which has proved efficient for solving many challenging problems dealing with high resolution images. The work accomplished during the thesis can be presented in two parts. In the first part, we propose a novel probabilistic approach to handle occlusions and perspective effects. The proposed method is based on 3D scene simulation on the GPU using OpenGL. It is an object-based method embedded in a marked point process framework. We apply it to the size estimation of a penguin colony, where we model a penguin colony as an unknown number of 3D objects. The main idea of the proposed approach is to sample some candidate configurations consisting of 3D objects lying on the real plane. A Gibbs energy is defined on the configuration space, which takes into account both prior and data information. The proposed configurations are projected onto the image plane, and the configurations are modified until convergence. To evaluate a proposed configuration, we measure the similarity between the projected image of the proposed configuration and the real image, by defining a data term and a prior term which penalizes overlapping objects. We introduced modifications to the optimization algorithm to take into account new dependencies that exist in our 3D model. In the second part, we propose a new optimization method which we call “Multiple Births and Cut” (MBC). It combines the recently developed optimization algorithm Multiple Births and Deaths (MBD) and the Graph-Cut. The MBD and MBC optimization methods are applied to the optimization of a marked point process. We compared the MBC and MBD algorithms, showing that the main advantages of our newly proposed algorithm are the reduction of the number of parameters, the speed of convergence and the quality of the obtained results. We validated our algorithm on the problem of counting flamingos in a colony.
9

Gomez-Donoso, Francisco. "Contributions to 3D object recognition and 3D hand pose estimation using deep learning techniques." Doctoral thesis, Universidad de Alicante, 2020. http://hdl.handle.net/10045/110658.

Abstract:
In this thesis, a study of two blooming fields in artificial intelligence is carried out. The first part of the present document is about 3D object recognition methods. Object recognition in general is about providing the ability to understand what objects appear in the input data of an intelligent system. Any robot, from industrial robots to social robots, could benefit from such a capability to improve its performance and carry out high-level tasks. In fact, this topic has been widely studied, and some state-of-the-art object recognition methods outperform humans in terms of accuracy. Nonetheless, these methods are image-based, namely, they focus on recognizing visual features. This could be a problem in some contexts, as some objects look like other, different objects. For instance, a social robot may recognize a face in a picture, or an intelligent car may recognize a pedestrian on a billboard. A potential solution for this issue would be to involve three-dimensional data so that the systems would not focus on visual features but on topological features. Thus, in this thesis, a study of 3D object recognition methods is carried out. The approaches proposed in this document, which take advantage of deep learning methods, take point clouds as input and are able to provide the correct category. We evaluated the proposals on a range of public challenges, datasets and real-life data with high success. The second part of the thesis is about hand pose estimation. This is also an interesting topic that focuses on providing the hand's kinematics. A range of systems, from human-computer interaction and virtual reality to social robots, could benefit from such a capability, for instance to interface with a computer and control it with seamless hand gestures, or to interact with a social robot that is able to understand human non-verbal communication. Thus, in the present document, hand pose estimation approaches are proposed. It is worth noting that the proposals take color images as input and are able to provide 2D and 3D hand poses in the image plane and Euclidean coordinate frames. Specifically, the hand poses are encoded in a collection of points that represent the joints in a hand, so that they can be easily reconstructed into the full hand pose. The methods are evaluated on custom and public datasets, and integrated with a robotic hand teleoperation application with great success.
10

Sambra-Petre, Raluca-Diana. "2D/3D knowledge inference for intelligent access to enriched visual content." Phd thesis, Institut National des Télécommunications, 2013. http://tel.archives-ouvertes.fr/tel-00917972.

Abstract:
This Ph.D. thesis tackles the issue of still image and video object categorization. The objective is to associate semantic labels with 2D objects present in natural images/videos. The principle of the proposed approach consists of exploiting categorized 3D model repositories in order to identify unknown 2D objects based on 2D/3D matching techniques. We propose here an object recognition framework designed to work for real-time applications. The similarity between classified 3D models and unknown 2D content is evaluated with the help of the 2D/3D description. A voting procedure is further employed in order to determine the most probable categories of the 2D object. A representative viewing angle selection strategy and a new contour-based descriptor (so-called AH) are proposed. The experimental evaluation proved that, by employing the intelligent selection of views, the number of projections can be decreased significantly (up to 5 times) while obtaining similar performance. The results have also shown the superiority of AH with respect to other state-of-the-art descriptors. An objective evaluation of the intra- and inter-class variability of the 3D model repositories involved in this work is also proposed, together with a comparative study of the retained indexing approaches. An interactive, scribble-based segmentation approach is also introduced. The proposed method is specifically designed to overcome compression artefacts such as those introduced by JPEG compression. We finally present an indexing/retrieval/classification Web platform, so-called Diana, which integrates the various methodologies employed in this thesis.
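The voting step mentioned in this abstract can be illustrated with a small sketch. The abstract does not specify the voting rule, so the k-nearest majority vote below, the function name, the parameters `k` and `top_n`, and the distance values are all assumptions made for illustration; each match pairs the category of a 3D model projection with its descriptor distance to the unknown 2D object.

```python
from collections import Counter

def vote_categories(matches, k=5, top_n=2):
    """Let the k closest 3D-model projections vote; return the top_n categories."""
    nearest = sorted(matches, key=lambda match: match[1])[:k]
    votes = Counter(category for category, _ in nearest)
    return [category for category, _ in votes.most_common(top_n)]

# Hypothetical (category, distance) pairs for one unknown 2D object.
matches = [
    ("car", 0.10),
    ("car", 0.15),
    ("truck", 0.20),
    ("car", 0.30),
    ("truck", 0.35),
    ("chair", 0.90),
]
ranked = vote_categories(matches)
```

Returning several ranked categories rather than a single label mirrors the abstract's wording about determining the most probable categories.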