Dissertations on the topic "Computer vision"

To see the other types of publications on this topic, follow the link: Computer vision.

Consult the top 50 dissertations for your research on the topic "Computer vision".

Next to every source in the list of references there is an "Add to bibliography" button. Press on it, and a bibliographic reference for the chosen work will be generated automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication in PDF format and read an online annotation of the work, if the respective parameters are available in the metadata.

Browse dissertations from a wide variety of disciplines and organise your bibliography correctly.

1

Revell, James Duncan. „Computer vision elastography“. Thesis, University of Bristol, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.412361.

2

Chiu, Kevin (Kevin Geeyoung). „Vision on tap : an online computer vision toolkit“. Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/67714.

Annotation:
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2011.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student submitted PDF version of thesis.
Includes bibliographical references (p. 60-64).
In this thesis, we present an online toolkit, based on a combination of a Scratch-based programming environment and computer vision libraries, manifested as blocks within the environment, integrated with a community platform for diffusing advances in computer vision to a general populace. We show that by providing these tools, non-developers are able to create and publish computer vision applications. The visual development environment includes a collection of algorithms that, despite being well known in the computer vision community, provide capabilities to commodity cameras that are not yet common knowledge. In support of this visual development environment, we also present an online community that allows users to share applications made in the environment, assisting the dissemination of both the knowledge of camera capabilities and advanced camera capabilities to users who have not yet been exposed to their existence or comfortable with their use. Initial evaluations consist of user studies that quantify the abilities afforded to the novice computer vision users by the toolkit, baselined against experienced computer vision users.
by Kevin Chiu.
S.M.
3

Rihan, Jonathan. „Computer vision based interfaces for computer games“. Thesis, Oxford Brookes University, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.579554.

Annotation:
Interacting with a computer game using only a simple web camera has seen a great deal of success in the computer games industry, as demonstrated by the numerous computer vision based games available for the Sony PlayStation 2 and PlayStation 3 game consoles. Computational efficiency is important for these human computer interaction applications, so for simple interactions a fast background subtraction approach is used that incorporates a new local descriptor which uses a novel temporal coding scheme that is much more robust to noise than the standard formulations. Results are presented that demonstrate the effect of using this method for code label stability.

Detecting local image changes is sufficient for basic interactions, but exploiting high-level information about the player's actions, such as detecting the location of the player's head, the player's body, or ideally the player's pose, could be used as a cue to provide more complex interactions. Following an object detection approach to this problem, a combined detection and segmentation approach is explored that uses a face detection algorithm to initialise simple shape priors to demonstrate that good real-time performance can be achieved for face texture segmentation. Ultimately, knowing the player's pose solves many of the problems encountered by simple local image feature based methods, but is a difficult and non-trivial problem. A detection approach is also taken to pose estimation: first as a binary class problem for human detection, and then as a multi-class problem for combined localisation and pose detection.

For human detection, a novel formulation of the standard chamfer matching algorithm as an SVM classifier is proposed that allows shape template weights to be learnt automatically. This allows templates to be learnt directly from training data even in the presence of background and without the need to pre-process the images to extract their silhouettes. Good results are achieved when compared to a state of the art human detection classifier. For combined pose detection and localisation, a novel and scalable method of exploiting the edge distribution in aligned training images is presented to select the most potentially discriminative locations for local descriptors that allows a much higher space of descriptor configurations to be utilised efficiently. Results are presented that show competitive performance when compared to other combined localisation and pose detection methods.
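For readers unfamiliar with the baseline being reformulated, a minimal sketch of plain chamfer matching may help; the thesis's contribution is to learn per-point template weights with an SVM instead of the uniform averaging shown here, and all names below are illustrative.

```python
# Hedged sketch of plain chamfer matching (uniform weights). Assumes a
# binary uint8 edge map (edges = 255) and template points that stay
# inside the image after shifting.
import cv2
import numpy as np

def chamfer_score(edge_image, template_points, offset):
    """Mean distance from each shifted template point to the nearest
    image edge; lower scores mean better matches."""
    # Distance transform of the inverted edge map: every pixel stores
    # its distance to the closest edge pixel.
    dist = cv2.distanceTransform(255 - edge_image, cv2.DIST_L2, 3)
    xs = template_points[:, 0] + offset[0]
    ys = template_points[:, 1] + offset[1]
    return float(np.mean(dist[ys, xs]))
```

In the SVM formulation the uniform mean is replaced by a learned weight per template point, trained on positive and negative windows.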
4

Klomark, Marcus. „Occupant Detection using Computer Vision“. Thesis, Linköping University, Linköping University, Computer Vision, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54363.

Annotation:

The purpose of this master's thesis was to study the possibility of using computer vision methods to detect and classify objects in the front passenger seat of a car. This work presents different approaches to the problem and evaluates the usefulness of each technique. The classification information should later be used to modulate the speed and force of the airbag, in order to provide each occupant with optimal protection and safety.

This work shows that computer vision has great potential to provide data which may be used to perform reliable occupant classification. The future choice of method depends on many factors, for example cost and the requirements placed on the system by legislation and car manufacturers. Further evaluation and testing of the methods in this thesis, of other methods, of the ABE approach, and of post-processing of the results should also be carried out before a reliable classification algorithm can be written.

5

Purdy, Eric. „Grammatical methods in computer vision“. Thesis, The University of Chicago, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3557428.

Annotation:

In computer vision, grammatical models are models that represent objects hierarchically as compositions of sub-objects. This allows us to specify rich object models in a standard Bayesian probabilistic framework. In this thesis, we formulate shape grammars, a probabilistic model of curve formation that allows for both continuous variation and structural variation. We derive an EM-based training algorithm for shape grammars. We demonstrate the effectiveness of shape grammars for modeling human silhouettes, and also demonstrate their effectiveness in classifying curves by shape. We also give a general method for heuristically speeding up a large class of dynamic programming algorithms. We provide a general framework for discussing coarse-to-fine search strategies, and provide proofs of correctness. Our method can also be used with inadmissible heuristics.

Finally, we give an algorithm for doing approximate context-free parsing of long strings in linear time. We define a notion of approximate parsing in terms of restricted families of decompositions, and construct small families which can approximate arbitrary parses.

6

Newman, Rhys A. „Automatic learning in computer vision“. Thesis, University of Oxford, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.390526.

7

Crossley, Simon. „Robust temporal stereo computer vision“. Thesis, University of Sheffield, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.327614.

8

Fletcher, Gordon James. „Geometrical problems in computer vision“. Thesis, University of Liverpool, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.337166.

9

Mirmehdi, Majid. „Transputer configurations for computer vision“. Thesis, City University London, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.292339.

10

Hovhannisyan, Vahan. „Multilevel optimisation for computer vision“. Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/55874.

Annotation:
The recent spark in machine learning and computer vision methods requiring increasingly larger datasets has motivated the introduction of optimisation algorithms specifically tailored to solve very large problems within practical time constraints. This demand challenges the practicability of state of the art methods, requiring new approaches that can take advantage not only of the problem's mathematical structure, but also of its data structure. Fortunately, such structure is present in many computer vision applications, where the problems can be modelled with varying degrees of fidelity. This structure suggests using multiscale models and thus multilevel algorithms. The objective of this thesis is to develop, implement and test provably convergent multilevel optimisation algorithms for convex composite optimisation problems in general and their applications in computer vision in particular. Our first multilevel algorithm solves convex composite optimisation problems and is particularly efficient for the robust facial recognition task. The method uses concepts from proximal gradient, mirror descent and multilevel optimisation algorithms, thus we call it the multilevel accelerated gradient mirror descent algorithm (MAGMA). We first show that MAGMA has the same theoretical convergence rate as state of the art first order methods and much lower per iteration complexity. Then we demonstrate its practical advantage on many facial recognition problems. The second part of the thesis introduces a new multilevel procedure most appropriate for robust PCA problems requiring iterative SVD computations. We propose to exploit the multiscale structure present in these problems by constructing lower dimensional matrices and using their singular values in each iteration of the optimisation procedure. We implement this approach in three different optimisation algorithms: inexact ALM, Frank-Wolfe thresholding and non-convex alternating projections. Here too we show that these multilevel algorithms converge (to an exact or approximate solution) with the same convergence rate as their standard counterparts, and we test all three methods on numerous synthetic and real life problems, demonstrating that the multilevel algorithms are not only much faster, but also solve problems that often cannot be solved by their standard counterparts.
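Since the abstract names proximal gradient methods as one ingredient, a sketch of the single-level baseline (ISTA) for the l1-regularised least-squares problem used in sparse-representation face recognition may fix ideas; MAGMA's multilevel corrections are not reproduced here, and this is only the baseline it accelerates.

```python
# Minimal ISTA sketch: minimise 0.5*||Ax - b||^2 + lam*||x||_1 by
# proximal gradient steps (names illustrative, not the thesis's code).
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, steps=200):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - b)           # gradient of the smooth part
        x = soft_threshold(x - grad / L, lam / L)  # proximal step
    return x
```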
11

Clayton, Tyler (Tyler T. ). „Motion tracking with computer vision“. Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/109687.

Annotation:
Thesis: S.B., Massachusetts Institute of Technology, Department of Mechanical Engineering, 2016.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 27).
In the Mechatronics laboratory, work is being done to develop methods for robot collision avoidance. A vital component of the project is motion detection and tracking. Currently, 3D imaging software and hardware are employed, but this technique carries the drawback of blind spots in the environment. Since the camera is placed directly above the robot, there are blind spots underneath the robot, which are a major problem. The idea is for the robot to work side by side with a human counterpart, allowing quicker assembly of parts, but with the current visual system the robot would be unable to detect limbs that may maneuver underneath its linkages. In this thesis, an automated rotary vision system attachable to each linkage of the robot is proposed. By attaching cameras directly to the robot, we have an increased ability to eliminate blind spots and detect objects in the environment. The proposed assembly involves a four-piece clamp-on shaft collar: two parts clamp to the linkages while the other two clamp around them, enabling free rotation. In testing, this proposed solution was able to track and detect, but it has the drawbacks of added weight on the linkages and slower image processing. Suggestions for improving the device are outlined. Overall, this device shows much promise for the Optical Assembly Station.
by Tyler Clayton.
S.B.
12

Christie, Gordon A. „Computer Vision for Quarry Applications“. Thesis, Virginia Tech, 2013. http://hdl.handle.net/10919/42762.

Annotation:
This thesis explores the use of computer vision to facilitate three different processes of a quarry's operation. The first is the blasting process, where operators determine where to drill in order to execute an efficient and safe blast. Having an operator manually determine the drilling angles and positions can lead to inefficient and dangerous blasts. By using two cameras, oriented vertically and separated by a fixed baseline, Structure from Motion techniques can be used to create a scaled 3D model of a bench. This can then be analyzed to provide operators with borehole locations and drilling angles in relation to fixed reference targets. The second process explored is the crushing process, where the rocks pass through different crushers that reduce them to smaller sizes. The crushed rocks are then dropped onto a moving conveyor belt. The maximum dimension of the rocks exiting the crushers should not exceed size thresholds that are specific to each crusher. This thesis presents a 2D vision system capable of estimating the size distribution of the rocks by segmenting the rocks in each image. The size distribution, based on the maximum dimension of each rock, is estimated by finding each maximum dimension in the image in pixels and converting it to inches. The third process explored is where the final product is piled up to form stockpiles. For inventory purposes, operators often estimate the size of a stockpile manually. This thesis presents a vision system capable of providing a more accurate estimate of the size of the stockpile by using Structure from Motion techniques to create a 3D reconstruction. User interaction helps to find the points in the resulting point cloud that are relevant to the stockpile, which are then used to estimate the volume.
Master of Science
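The pixels-to-inches step described in this abstract can be illustrated with a hedged OpenCV sketch; the thesis's actual segmentation is more sophisticated, and inches_per_pixel is an assumed calibration constant, not a value from the thesis.

```python
# Rough sketch: segment rocks, take each rock's maximum dimension in
# pixels, convert to inches with a known scale. Assumes `gray` is a
# uint8 grayscale image of the conveyor belt.
import cv2

def rock_sizes_inches(gray, inches_per_pixel, min_area=50):
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    sizes = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue                          # ignore specks and noise
        (_, _), (w, h), _ = cv2.minAreaRect(c)  # tightest rotated box
        sizes.append(max(w, h) * inches_per_pixel)
    return sizes
```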
13

Anani-Manyo, Nina K. „Computer Vision and Building Envelopes“. Kent State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=kent1619539038754026.

14

Chavali, Neelima. „Object Proposals in Computer Vision“. Thesis, Virginia Tech, 2015. http://hdl.handle.net/10919/56590.

Annotation:
Object recognition is a central problem in computer vision which deals with both localizing and identifying objects in images. Object proposals have recently become an important part of the object recognition process. Object proposals are algorithms used for localizing objects in images. This thesis is a study in object proposals and is composed of three parts. First, we present a new data-driven approach for generating object proposals. Second, we release a MATLAB library which can be used to generate object proposals using all the existing algorithms. The library can also be used for evaluating object proposals using the three most commonly used metrics. Finally, we identify previously unnoticed bias in the existing protocol for evaluating object proposals and propose ways to alleviate this bias.
Master of Science
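One of the commonly used evaluation metrics this abstract alludes to is recall at an intersection-over-union (IoU) threshold; a minimal reference implementation, with illustrative names, is sketched below.

```python
# Recall: fraction of ground-truth boxes covered by at least one
# proposal with IoU above a threshold. Boxes are (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def recall(gt_boxes, proposals, thresh=0.5):
    hits = sum(any(iou(g, p) >= thresh for p in proposals) for g in gt_boxes)
    return hits / len(gt_boxes)
```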
15

Kileel, Joseph David. „Algebraic Geometry for Computer Vision“. Thesis, University of California, Berkeley, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10282753.

Annotation:

This thesis uses tools from algebraic geometry to solve problems about three-dimensional scene reconstruction. 3D reconstruction is a fundamental task in multiview geometry, a field of computer vision. Given images of a world scene, taken by cameras in unknown positions, how can we best build a 3D model for the scene? Novel results are obtained for various challenging minimal problems, which are important algorithmic routines in Random Sample Consensus (RANSAC) pipelines for reconstruction. These routines reduce overfitting when outliers are present in image data.

Our approach throughout is to formulate inverse problems as structured systems of polynomial equations, and then to exploit underlying geometry. We apply numerical algebraic geometry, commutative algebra and tropical geometry, and we derive new mathematical results in these fields. We present simulations on image data as well as an implementation of general-purpose homotopy-continuation software for implicitization in computational algebraic geometry.

Chapter 1 introduces some relevant computer vision. Chapters 2 and 3 are devoted to the recovery of camera positions from images. We resolve an open problem concerning two calibrated cameras raised by Sameer Agarwal, a vision expert at Google Research, by using the algebraic theory of Ulrich sheaves. This gives a robust test for identifying outliers in terms of spectral gaps. Next, we quantify the algebraic complexity for notorious poorly understood cases for three calibrated cameras. This is achieved by formulating in terms of structured linear sections of an explicit moduli space and then computing via homotopy-continuation. In Chapter 4, a new framework for modeling image distortion is proposed, based on lifting algebraic varieties in projective space to varieties in other toric varieties. We check that our formulation leads to faster and more stable solvers than the state of the art. Lastly, Chapter 5 concludes by studying possible pictures of simple objects, as varieties inside products of projective planes. In particular, this dissertation exhibits that algebro-geometric methods can actually be useful in practical settings.

16

Reading, Ivan Alaric Derrick. „Pedestrian detection by computer vision“. Thesis, Edinburgh Napier University, 1999. http://researchrepository.napier.ac.uk/Output/6915.

Annotation:
This document describes work aimed at determining whether the detection, by computer vision, of pedestrians waiting at signal-controlled road crossings could be made sufficiently reliable and affordable, using currently available technology, to be suitable for widespread use in traffic control systems. The work starts by examining the need for pedestrian detection in traffic control systems and then goes on to look at the specific problems of applying a vision system to the detection task. The most important distinctive features of the pedestrian detection task addressed in this work are:
• The operating conditions are an outdoor environment with no constraints on factors such as variation in illumination, presence of shadows and the effects of adverse weather.
• Pedestrians may be moving or static and are not limited to certain orientations or to movement in a single direction.
• The number of pedestrians to be monitored is not restricted, so the vision system must cope with monitoring multiple targets concurrently.
• The background scene is complex and so contains image features that tend to distract a vision system from the successful detection of pedestrians.
• Pedestrian attire is unconstrained, so detection must occur even when details of pedestrian shape are hidden by items such as coats and hats.
• The camera's position is such that assumptions commonly used by vision systems to avoid the effects of occlusion, perspective and viewpoint variation are not valid.
• The implementation cost of the system, in moderate volumes, must be realistic for widespread installation.
A review of relevant prior art in computer vision with respect to the above demands is presented. Thereafter, techniques developed by the author to overcome these difficulties are developed and evaluated over an extensive test set of image sequences representative of the range of conditions found in the real world. The work has resulted in the development of a vision system which has been shown to attain a useful level of performance under a wide range of environmental and transportation conditions. This was achieved, in real time, using low-cost processing and sensor components, demonstrating the viability of developing the results of this work into a practical detector.
17

Ryan, David Andrew. „Crowd monitoring using computer vision“. Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/65652/1/David_Ryan_Thesis.pdf.

Annotation:
Novel computer vision techniques have been developed for automatic monitoring of crowded environments such as airports, railway stations and shopping malls. Using video feeds from multiple cameras, the techniques enable crowd counting, crowd flow monitoring, queue monitoring and abnormal event detection. The outcome of the research is useful for surveillance applications and for obtaining operational metrics to improve business efficiency.
18

AHMED, WAQAR. „Collaborative Learning in Computer Vision“. Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1069010.

Annotation:
The science of designing machines to extract meaningful information from digital images, videos, and other visual inputs is known as Computer Vision (CV). Deep learning algorithms cope with CV problems by automatically learning task-specific features. In particular, Deep Neural Networks (DNNs) have become an essential component of CV solutions due to their ability to encode large amounts of data and their capacity to manipulate billions of model parameters. Unlike machines, humans learn by rapidly constructing abstract models. This is undoubtedly because good teachers supply their students with much more than just the correct answer; they also provide intuitive comments, comparisons, and explanations. In deep learning, the availability of such auxiliary information at training time (but not at test time) is referred to as learning by Privileged Information (PI). Typically, predictions (e.g., soft labels) produced by a bigger and better teacher network are used as structured knowledge to supervise the training of a smaller student network, helping the student network generalize better than one trained from scratch. This dissertation focuses on the category of deep learning systems known as Collaborative Learning, where one DNN model helps other models, or several models help each other, during training to achieve strong generalization and thus high performance. The question we address is thus the following: how can we take advantage of PI for training a deep learning model, knowing that, at test time, such PI might be missing? In this context, we introduce new methods to tackle several challenging real-world computer vision problems. First, we propose a method for model compression that leverages PI in a teacher-student framework along with customizable block-wise optimization for learning a target-specific lightweight structure of the neural network. In particular, the proposed resource-aware optimization is employed on suitable parts of the student network while respecting the expected resource budget (e.g., floating-point operations per inference and model parameters). In addition, soft predictions produced by the teacher network are leveraged as a source of PI, forcing the student to preserve baseline performance during network structure optimization. Second, we propose a multiple-model learning method for action recognition, specifically devised for challenging video footage in which actions are not explicitly visualized but, rather, only implicitly referred to. We use such videos as stimuli and involve a large sample of subjects to collect a high-definition EEG and video dataset. Next, we employ collaborative learning in a multi-modal setting, i.e., the EEG (teacher) model helps the video (student) model by distilling knowledge (the implicit meaning of visual stimuli) to it, sharply boosting the recognition performance. The goal of Unsupervised Domain Adaptation (UDA) methods is to use the labeled source domain data together with the unlabeled target domain data to train a model that generalizes well on the target domain. In contrast, we cast UDA as a pseudo-label refinery problem in the challenging source-free scenario, i.e., in cases where the source domain data is inaccessible during training. We propose the Negative Ensemble Learning (NEL) technique, a unified method for adaptive noise filtering and progressive pseudo-label refinement.
In particular, the ensemble members collaboratively learn with a Disjoint Set of Residual Labels, an outcome of the output prediction consensus, to refine the challenging noise associated with the inferred pseudo-labels. A single model trained with the refined pseudo-labels leads to superior performance on the target domain, without using source data samples at all. We conclude this dissertation with a method extending our previous study by incorporating Continual Learning in the source-free UDA setting. Our new method comprises two stages: a source-free UDA pipeline based on pseudo-label refinement, and a procedure for extracting class-conditioned source-style images by leveraging the pre-trained source model. While stage 1 retains the same collaborative peculiarities, in stage 2 the collaboration exists in an indirect manner, i.e., it is the source model that provides the only possibility to generate source-style synthetic images, which eventually helps the final model preserve good performance on both source and target domains. In each study, we consider heterogeneous CV tasks. Nevertheless, with an extensive pool of experiments on various benchmarks carrying diverse complexities and challenges, we show that the collaborative learning framework outperforms the related state-of-the-art methods by a considerable margin.
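The soft-label teacher-student supervision mentioned above is, in its classic form (Hinton et al.'s knowledge distillation), a temperature-scaled KL term; a sketch follows, with the caveat that the dissertation's exact losses may differ.

```python
# Classic knowledge-distillation loss: temperature-softened teacher
# probabilities supervise the student alongside the hard labels.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                              # rescale gradient magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```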
19

Douillard, Arthur. „Continual Learning for Computer Vision“. Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS165.

Annotation:
Since the early 2010s, machine learning research has turned its attention to highly effective deep neural networks. In particular, all computer vision tasks now use convolutional networks. These models learn to detect patterns that are simple at first (edges, textures) and then increasingly complex, up to the concept of specific objects. Despite the great advances in deep neural networks, an important problem remains: how can a growing number of concepts be learned, the way a pupil learns throughout school, without forgetting previous knowledge? This continual-learning problem is hard: if left untreated, neural networks forget catastrophically. The objective of this thesis was therefore to address this problem. I first developed several methods that enforce similar behaviour between the version of a model that has learned new concepts and its previous iteration. In contrast to the rest of the literature, which imposed constraints on the model's final outputs, I focused on the internal representations. Second, I considered continual learning for the semantic segmentation task. This complex task raises problems, beyond catastrophic forgetting, that are new in a continual setting. I proposed several complementary approaches to solve them: a new constraint method, a pseudo-labelling technique, and an efficient way of rehearsing objects. Finally, I turned to dynamic neural networks, which can create new neurons over their lifetime to solve a growing number of tasks. Previous methods grow with little control, resulting in extremely heavy, and often slow, models. Inspired by recent transformers, I therefore designed a dynamic strategy with practically zero cost that nevertheless reaches state-of-the-art performance.
I first review the existing methods based on regularization for continual learning. While regularizing a model's probabilities is very efficient for reducing forgetting on large-scale datasets, few works consider constraints on intermediate features. I cover in this chapter two contributions aiming to regularize the latent space of a ConvNet directly. The first one, PODNet, aims to reduce the drift of spatial statistics between the old and new model, which in effect drastically reduces forgetting of old classes while enabling efficient learning of new classes. In a second part I show a complementary method in which we avoid pre-emptive forgetting by allocating locations in the latent space for yet unseen future classes. Then, I describe a recent application of class-incremental learning to semantic segmentation. I show that the very nature of continual semantic segmentation (CSS) offers new specific challenges, namely forgetting on large images and a background shift. We tackle the first problem by extending the distillation loss introduced in the previous chapter to multiple scales. The second problem is solved by an efficient pseudo-labeling strategy. Finally, we consider common rehearsal learning, applied this time to CSS. I show that it cannot be used naively because of memory complexity, and design a lightweight rehearsal that is even more efficient. Finally, I consider a completely different approach to continual learning: dynamic networks, where the parameters are extended during training to adapt to new tasks. Previous works in this domain are hard to train and often suffer from parameter-count explosion. For the first time in continual computer vision, we propose to use the Transformer architecture: the model dimension is mostly fixed and shared across tasks, except for an expansion of learned task tokens. With an encoder/decoder strategy in which the decoder forward pass is specialized by a task token, we show state-of-the-art robustness to forgetting while our memory and computational complexities barely grow.
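Since PODNet's spatial-statistics distillation is named above, a rough sketch of such a term may help; the pooling and normalisation details follow the published PODNet paper rather than this abstract, and the layer lists are assumed inputs.

```python
# PODNet-style spatial distillation sketch: pool squared activations
# along each spatial axis and penalise drift between old and new model.
# Assumes old-model features are detached (no gradient flows into them).
import torch
import torch.nn.functional as F

def pod_spatial_loss(feats_old, feats_new):
    """feats_*: lists of (B, C, H, W) activations from matching layers."""
    loss = torch.tensor(0.0)
    for fo, fn in zip(feats_old, feats_new):
        fo, fn = fo.pow(2), fn.pow(2)
        for dim in (2, 3):                   # pool over height, then width
            po = F.normalize(fo.sum(dim=dim).flatten(1), dim=1)
            pn = F.normalize(fn.sum(dim=dim).flatten(1), dim=1)
            loss = loss + (po - pn).norm(p=2, dim=1).mean()
    return loss / len(feats_old)
```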
20

Riba, Pi Edgar. „Geometric Computer Vision Techniques for Scene Reconstruction“. Doctoral thesis, Universitat Autònoma de Barcelona, 2021. http://hdl.handle.net/10803/671624.

Annotation:
From the early stages of Computer Vision, scene reconstruction has been one of the most studied topics, leading to a wide variety of new discoveries and applications. Object grasping and manipulation, localization and mapping, or even visual effect generation are different examples of applications in which scene reconstruction has taken an important role in industries such as robotics, factory automation, or audiovisual production. However, scene reconstruction is an extensive topic that can be approached in many different ways, with already existing solutions that work effectively in controlled environments. Formally, the problem of scene reconstruction can be formulated as a sequence of independent processes which compose a pipeline. In this thesis, we analyse some parts of the reconstruction pipeline and contribute novel methods using Convolutional Neural Networks (CNNs), proposing innovative solutions that consider the optimisation of the methods in an end-to-end fashion. First, we review the state of the art of classical local feature detectors and descriptors and contribute two novel methods that inherently improve pre-existing solutions in the scene reconstruction pipeline. It is a fact that computer science and software engineering are two fields that usually go hand in hand and evolve according to mutual needs, making the design of complex and efficient algorithms easier. For this reason, we contribute Kornia, a library specifically designed to work with classical computer vision techniques along with deep neural networks. In essence, we created a framework that eases the design of complex pipelines for computer vision algorithms so that they can be included within neural networks and used to backpropagate gradients through a common optimisation framework. Finally, in the last chapter of this thesis we develop the aforementioned concept of designing end-to-end systems with classical projective geometry. Thus, we contribute a solution to the problem of synthetic view generation by hallucinating novel views of highly deformable cloth objects using a geometry-aware end-to-end system. To summarize, in this thesis we demonstrate that a proper design combining classical geometric computer vision methods with deep learning techniques can lead to improved solutions for the problem of scene reconstruction.
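Since Kornia is a publicly available library, a two-line example conveys its design goal of differentiable classical operators; the call below follows current Kornia releases and may differ from the version described in the thesis.

```python
# Gradients flow through a classical operator (Gaussian blur) exactly
# as through any neural-network layer.
import torch
import kornia

image = torch.rand(1, 3, 64, 64, requires_grad=True)
blurred = kornia.filters.gaussian_blur2d(image, (5, 5), (1.5, 1.5))
blurred.mean().backward()       # gradients reach the input image
print(image.grad.shape)         # torch.Size([1, 3, 64, 64])
```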
21

Watiti, Tom Wanjala. „Vision-based virtual mouse system“. To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2009. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.

22

Wakefield, Jonathan P. „A framework for generic computer vision“. Thesis, University of Huddersfield, 1994. http://eprints.hud.ac.uk/id/eprint/4003/.

Annotation:
This thesis presents a highly flexible framework for generic computer vision. The framework is implemented as an essentially object-oriented blackboard system and can easily be modified for new application domains. This has been achieved by allowing application-specific knowledge representation and data representation to be defined in terms of generic system prototypes. Using the object-oriented programming/frames paradigm allows application-specific elements of the system to inherit interpretation strategies for finding objects, and methods for calculating measurements of their features. Furthermore, the compositional structure of objects and their inter-relationships can be represented. The system automatically generates control strategies for the current domain. Interpretation of an object consists of executing a number of interpretation strategies for that object, which may be interspersed amongst other interpretation tasks and thus termed dynamic interpretation strategies. Confidence ratings for object hypotheses, created by the interpretation strategies, are evaluated and combined consistently. The 'best' hypotheses are stored on the blackboard and used to guide subsequent processing. The division of an object's interpretation into stages facilitates the early posting of tentative hypotheses on the blackboard and the system concurrently considers alternative competing hypotheses. The developed system currently performs region-based image analysis, although the framework can be extended to incorporate edge-based and motion-based analysis. A uniform and consistent approach has been adopted to all objects, including object-parts, and all application specific knowledge is made explicit. New interpretation strategies can easily be incorporated. A review of related research and background theory is included. Results of example interpretation experiments, covering various applications, are provided for an implementation of the framework on both real and simulated images.
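As a toy illustration of the blackboard pattern this abstract describes (a shared store of competing, confidence-rated object hypotheses), the sketch below uses invented class names, not the thesis's actual design.

```python
# Interpretation strategies post scored hypotheses to a shared
# blackboard, which keeps the strongest competing hypothesis per region.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    label: str          # e.g. "road", "vehicle"
    region: tuple       # image region the hypothesis refers to
    confidence: float   # combined confidence rating

class Blackboard:
    def __init__(self):
        self.best = {}  # region -> best hypothesis so far

    def post(self, hyp: Hypothesis):
        cur = self.best.get(hyp.region)
        if cur is None or hyp.confidence > cur.confidence:
            self.best[hyp.region] = hyp
```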
23

Lankton, Shawn M. „Localized statistical models in computer vision“. Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/31644.

Annotation:
Thesis (Ph.D)--Electrical and Computer Engineering, Georgia Institute of Technology, 2010.
Committee Chair: Tannenbaum, Allen; Committee Member: Al Regib, Ghassan; Committee Member: Niethammer, Marc; Committee Member: Shamma, Jeff; Committee Member: Stillman, Arthur; Committee Member: Yezzi, Anthony. Part of the SMARTech Electronic Thesis and Dissertation Collection.
24

Barngrover, Christopher M. „Computer vision techniques for underwater navigation“. Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/fullcit?p1477884.

Annotation:
Thesis (M.S.)--University of California, San Diego, 2010.
Title from first page of PDF file (viewed July 10, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (leaf 59).
25

Zandifar, Ali. „Computer vision for scene text analysis“. College Park, Md. : University of Maryland, 2004. http://hdl.handle.net/1903/1767.

Annotation:
Thesis (Ph. D.) -- University of Maryland, College Park, 2004.
Thesis research directed by: Electrical Engineering. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
26

Johansson, Björn. „Multiscale Curvature Detection in Computer Vision“. Licentiate thesis, Linköping University, Linköping University, Computer Vision, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54966.

Annotation:

This thesis presents a new method for detection of complex curvatures such as corners, circles, and star patterns. The method is based on a second degree local polynomial model applied to a local orientation description in double angle representation. The theory of rotational symmetries is used to compute curvature responses from the parameters of the polynomial model. The responses are made more selective using a scheme of inhibition between different symmetry models. These symmetries can serve as feature points at a high abstraction level for use in hierarchical matching structures for 3D estimation, object recognition, image database search, etc.

A very efficient approximative algorithm for single and multiscale polynomial expansion is developed, which is used for detection of the complex curvatures in one or several scales. The algorithm is based on the simple observation that polynomial functions multiplied with a Gaussian function can be described in terms of partial derivatives of the Gaussian. The approximative polynomial expansion algorithm is evaluated in an experiment to estimate local orientation on 3D data, and the performance is comparable to previously tested algorithms which are more computationally expensive.
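The observation can be stated concretely: writing the Gaussian as g(x) = e^{-x²/(2σ²)}, differentiation shows that polynomials times g are linear combinations of derivatives of g.

```latex
% g'(x) = -(x/\sigma^2)\, g(x), hence
\[
x\, g(x) = -\sigma^{2} g'(x),
\qquad
x^{2} g(x) = \sigma^{4} g''(x) + \sigma^{2} g(x),
\]
% so polynomial-times-Gaussian filters can be built from Gaussian
% derivative filters, which is the simple observation named above.
```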

The curvature algorithm is demonstrated on natural images and in an object recognition experiment. Phase histograms based on the curvature features are developed and shown to be useful as an alternative compact image representation.

The importance of curvature is furthermore motivated by reviewing examples from biological and perceptual studies. The usefulness of local orientation information to detect curvature is also motivated by an experiment about learning a corner detector.

27

Bårman, Håkan. „Hierarchical curvature estimation in computer vision“. Doctoral thesis, Linköpings universitet, Bildbehandling, 1991. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54887.

Annotation:
This thesis concerns the estimation and description of curvature for computer vision applications. Different types of multi-dimensional data are considered: images (2D); volumes (3D); time sequences of images (3D); and time sequences of volumes (4D). The methods are based on local Fourier domain models and use local operations such as filtering. A hierarchical approach is used. Firstly, the local orientation is estimated and represented with a vector field equivalent description. Secondly, the local curvature is estimated from the orientation description. The curvature algorithms are closely related to the orientation estimation algorithms and the methods as a whole give a unified approach to the estimation and description of orientation and curvature. In addition, the methodology avoids thresholding and premature decision making. Results on both synthetic and real world data are presented to illustrate the algorithms performance with respect to accuracy and noise insensitivity. Examples illustrating the use of the curvature estimates for tasks such as image enhancement are also included.
28

Safari-Foroushani, Ramin. „Form registration, a computer vision approach“. Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0012/NQ52413.pdf.

29

Yuan, Dan. „Environmental exploration via computer vision techniques /“. Diss., Digital Dissertations Database. Restricted to UC campuses, 2007. http://uclibs.org/PID/11984.

30

Phillips, Walter. „VHDL design of computer vision tasks“. Honors in the Major Thesis, University of Central Florida, 2001. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/240.

Annotation:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Engineering
Computer Science
31

Marshall, Christopher. „Robot trajectory generation using computer vision“. Thesis, University of Newcastle Upon Tyne, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.443107.

32

Robles-Kelly, Antonio A. „Graph-spectral methods for computer vision“. Thesis, University of York, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.399252.

33

Ali, Abdulamer T. „Computer vision aided road traffic analysis“. Thesis, University of Bristol, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.333953.

34

Sofeikov, Konstantin Igorevich. „Measure concentration in computer vision applications“. Thesis, University of Leicester, 2018. http://hdl.handle.net/2381/42791.

Annotation:
We live in the Information Age. In this age the technological industry allows individuals to explore their personalized needs, thereby simplifying the procedure of making decisions. It also allows big global market players to leverage the amounts of information they collect over time in order to excel in the markets they operate in. The huge and often incomprehensible volumes of information collected to date constitute the phenomenon of Big Data. Big Data is a term used to describe datasets that are not suitable for processing by traditional software. To date, the commonly used way to get value out of Big Data is to employ a wide range of machine learning techniques. Machine learning is genuinely data-driven: the more data are available the better, from a statistical point of view. This enables the creation of a wide range of applications for a broad spectrum of modeling and predictive tasks. Traditional methods of machine learning (e.g. linear models) are easy to implement and give computationally cheap solutions. These solutions, however, are not always capable of capturing the underlying complexity of Big Data. More sophisticated approaches (e.g. Convolutional Neural Networks in computer vision) have been shown empirically to be reliable, but this reliability bears high computational costs. A natural way to overcome this obstacle appears to be a reduction of data volume (the number of factors, attributes and records). Doing so, however, is an extremely tedious and non-trivial task in itself. In this thesis we show that, thanks to the well-known concentration of measure effect, it is often beneficial to keep the dimensionality of the problem high and use it to one's own advantage. The measure concentration effect is a phenomenon that can only be found in high-dimensional spaces. One of the theoretical findings of this thesis is that the measure concentration effect allows one to correct individual mistakes of Artificial Intelligence (AI) systems in a cheap and non-intrusive way. Specifically, we show how to correct the errors of AI systems with a linear functional while not changing their inner decision-making processes. As an illustration of how one can benefit from this we have developed a Knowledge Transfer framework for legacy AI systems. The development of this framework also answers a fundamental question: how can a legacy "student" AI system learn from a "teacher" AI system without complete retraining? The theoretical findings are illustrated with several case studies in the area of computer vision.
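The correction idea can be made concrete with a small sketch: in a high-dimensional feature space, a single mistake can typically be separated from the correctly handled samples by one hyperplane, so a legacy system can be patched by a linear functional without retraining. This is a hypothetical illustration, not code from the thesis.

```python
# Fit a linear "corrector" l(x) = w @ x - b that fires on one known
# error and route the flagged input to a fallback, leaving the legacy
# model untouched. All names are illustrative.
import numpy as np

def fit_corrector(error_feature, correct_features):
    w = error_feature - correct_features.mean(axis=0)
    w /= np.linalg.norm(w)
    # threshold halfway between the error's projection and the nearest
    # correct sample's projection along w
    b = 0.5 * (w @ error_feature + (correct_features @ w).max())
    return w, b

def corrected_predict(predict, x, feature, w, b, fallback_label):
    return fallback_label if feature @ w - b > 0 else predict(x)
```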
35

Viloria, John A. (John Alexander) 1978. „Optimizing clustering algorithms for computer vision“. Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/86847.

36

Panish, Robert Martin. „Vehicle egomotion estimation using computer vision“. Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/46370.

Annotation:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2008.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 107-108).
A vision-based navigation filter is developed for application on UAVs and tested in simulation. This filter is meant to allow the UAV to navigate in GPS-denied environments using measurements from a suite of cameras. The extended Kalman filter integrates measurements from multiple non-overlapping cameras as well as an IMU and occasional GPS. Simulations are conducted to evaluate the performance of the filter in a variety of flight regimes, as well as to assess the value of using multiple cameras. Simulations demonstrate the value of using multiple cameras for egomotion estimation. Multiple non-overlapping cameras are useful for resolving motion in an unobservable direction that manifests as an ambiguity between translation and rotation. Additionally, multiple cameras are extremely useful when flying in an environment such as an urban canyon, where features remain in the fields of view for a very short period of time.
by Robert Martin Panish.
S.M.
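The abstract describes an extended Kalman filter fusing IMU propagation with camera and GPS updates; a generic predict/update step looks like the sketch below, where the motion model f, measurement model h, and their Jacobians F and H are placeholders for the thesis's vehicle and camera models.

```python
# One generic EKF step: propagate with the IMU-driven motion model,
# then update with a single camera (or GPS) measurement z.
import numpy as np

def ekf_step(x, P, u, z, f, F, h, H, Q, R):
    x_pred = f(x, u)                           # predicted state
    F_k = F(x, u)
    P_pred = F_k @ P @ F_k.T + Q               # predicted covariance
    H_k = H(x_pred)
    S = H_k @ P_pred @ H_k.T + R               # innovation covariance
    K = P_pred @ H_k.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H_k) @ P_pred
    return x_new, P_new
```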
37

Churcher, Stephen. „VLSI neural networks for computer vision“. Thesis, University of Edinburgh, 1993. http://hdl.handle.net/1842/13397.

Annotation:
Recent years have seen the rise to prominence of a powerful new computational paradigm - the so-called artificial neural network. Loosely based on the microstructure of the central nervous system, neural networks are massively parallel arrangements of simple processing elements (neurons) which communicate with each other through variable strength connections (synapses). The simplicity of such a description belies the complexity of calculations which neural networks are able to perform. Allied to this, the emergent properties of noise resistance, fault tolerance, and large data bandwidths (all arising from the parallel architecture) mean that neural networks, when appropriately implemented, represent a powerful tool for solving many problems which require the processing of real-world data. A computer vision task (viz. the classification of regions in images of segmented natural scenes) is presented, as a problem in which large numbers of data need to be processed quickly and accurately, whilst, in certain circumstances, being disambiguated. Of the classifiers tried, the neural network (a multi-layer perceptron) was found to provide the best overall solution, to the task of distinguishing between regions which were 'roads', and those which were 'not roads'. In order that best use might be made of the parallel processing abilities of neural networks, a variety of special purpose hardware implementations are discussed, before two different analogue VLSI designs are presented, complete with characterisation and test results. The latter of these chips (the EPSILON device) is used as the basis for a practical neuro-computing system. The results of experimentation with different applications are presented. Comparisons with computer simulations demonstrate the accuracy of the chips, and their ability to support learning algorithms, thereby proving the viability of the use of pulsed analogue VLSI techniques for the implementation of artificial neural networks.
38

Farajidavar, Nazli. „Transductive transfer learning for computer vision“. Thesis, University of Surrey, 2015. http://epubs.surrey.ac.uk/807998/.

Annotation:
Artificial intelligence and machine learning technologies have already achieved significant success in classification, regression and clustering. However, many machine learning methods work well only under a common assumption: that training and test data are drawn from the same feature space and the same distribution. A real-world example is sports footage, where an intelligent system has been designed and trained to detect score-changing events in a tennis singles match and we are interested in transferring this learning to tennis doubles, or even to a more challenging domain such as badminton. Under such distribution changes, most statistical models need to be rebuilt using newly collected training data. In many real-world applications, it is expensive or even impossible to collect the required training data and rebuild the models. One of the ultimate goals of open-ended learning systems is to take advantage of previous experience and knowledge when dealing with similar future problems. Two levels of learning can be identified in such scenarios. One draws on the data, capturing the patterns and regularities that enable reliable predictions on new samples. The other starts from an acquired source of knowledge and focuses on how to generalise it to a new target concept; this is known as transfer learning and is the main focus of this thesis. This work is devoted to the second level of learning, focusing on how to transfer information from previous learning and exploit it on a new learning problem with no supervisory information available for the new target data. We propose several solutions to such tasks by leveraging prior models or features. In the first part of the thesis we show how to estimate reliable transformations from the source domain to the target domain with the aim of reducing the dissimilarities between the source class-conditional distribution and a new unlabelled target distribution. We then present a fully automated transfer learning framework which approaches the problem by combining four types of adaptation: a projection to a lower-dimensional space that is shared between the two domains, a set of local transformations to further increase the domain similarity, a classifier parameter adaptation method which modifies the learner for the new domain, and a set of class-conditional transformations aiming to increase the similarity between the posterior probabilities of samples in the source and target sets. We conduct experiments on a wide range of image and video classification tasks, testing our proposed methods and showing that, in all cases, leveraging knowledge from a related domain can improve performance when there are no labels available for direct training on the new target data.
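One simple way to picture the "transformations from the source domain to the target domain" mentioned above is correlation alignment of feature statistics. The sketch below whitens source features and re-colours them with the target covariance, a CORAL-style step used here purely as an illustration; it is not claimed to be the thesis's exact method.

```python
# Align second-order statistics of labelled source features to an
# unlabelled target domain, then train any classifier on the result.
import numpy as np

def align_covariances(Xs, Xt, eps=1e-3):
    """Whiten source features, then re-colour them with the target covariance."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    def mat_pow(C, p):                          # matrix power via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(np.maximum(w, eps) ** p) @ V.T
    return (Xs - Xs.mean(0)) @ mat_pow(Cs, -0.5) @ mat_pow(Ct, 0.5) + Xt.mean(0)

rng = np.random.default_rng(1)
Xs = rng.normal(size=(200, 5)) * 2.0            # source features (labelled)
Xt = rng.normal(size=(300, 5)) + 1.0            # target features (unlabelled)
Xs_aligned = align_covariances(Xs, Xt)          # domain-aligned source features
```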
39

Matuszewski, Damian Janusz. „Computer vision for continuous plankton monitoring“. Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-24042014-150825/.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
Plankton microorganisms constitute the base of the marine food web and play a great role in global atmospheric carbon dioxide drawdown. Moreover, because they are very sensitive to environmental changes, they allow such changes to be noticed (and potentially counteracted) faster than by any other means. As such, they not only influence the fishery industry but are also frequently used to analyze changes in exploited coastal areas and the influence of these interferences on the local environment and climate. As a consequence, there is a strong need for highly efficient systems allowing long-term, large-volume observation of plankton communities. This would provide us with a better understanding of the role of plankton in the global climate, as well as help maintain the fragile environmental equilibrium. The adopted sensors typically provide huge amounts of data that must be processed efficiently without intensive manual work by specialists. A new system for general-purpose particle analysis in large volumes is presented. It has been designed and optimized for the continuous plankton monitoring problem; however, it can easily be applied as a versatile moving-fluid analysis tool or in any other application in which the targets to be detected and identified move in a unidirectional flux. The proposed system is composed of three stages: data acquisition, target detection and target identification. Dedicated optical hardware is used to record images of small particles immersed in the water flux. Target detection is performed using a Visual Rhythm-based method, which greatly accelerates the processing time and allows higher volume throughput. The proposed method detects, counts and measures organisms present in the water flux passing in front of the camera. Moreover, the developed software saves cropped plankton images, which not only greatly reduces the required storage space but also constitutes the input for automatic identification. To ensure maximal performance (up to 720 MB/s), the algorithm was implemented using CUDA for GPGPU. The method was tested on a large dataset and compared with an alternative frame-by-frame approach. The obtained plankton images were used to build a classifier that is applied to automatically identify organisms in plankton analysis experiments. For this purpose, dedicated feature-extraction software was developed. Various subsets of the 55 shape characteristics were tested with different off-the-shelf learning models. The best accuracy, approximately 92%, was obtained with Support Vector Machines; this result is comparable to average expert manual identification performance. This work was developed under joint supervision with Professor Rubens Lopes (IO-USP).
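The identification stage can be pictured as an SVM trained on per-organism shape descriptors. The 55-feature count and the ~92% figure come from the abstract; everything else in this sketch (data, labels, kernel settings) is an invented placeholder.

```python
# Placeholder SVM over shape features extracted from cropped organism images.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 55))            # 600 cropped organisms x 55 shape features
w = rng.normal(size=55)
y = (X @ w > 0).astype(int)               # stand-in binary taxon labels

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X[:500], y[:500])
print("accuracy on held-out crops:", clf.score(X[500:], y[500:]))
```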
40

Zhan, Beibei. „Learning crowd dynamics using computer vision“. Thesis, Kingston University, 2008. http://eprints.kingston.ac.uk/20302/.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
An increase in violence in public spaces has prompted the introduction of more sophisticated technology to improve the safety and security of very crowded environments. Research disciplines such as civil engineering and sociology have studied the crowd phenomenon for years, employing human visual observation to estimate the characteristics of a crowd. Computer vision researchers have increasingly been involved in the study and development of methods for the automatic analysis of the crowd phenomenon. Until recently, most existing methods in computer vision focussed on extracting a limited number of features in controlled environments, with limited clutter and numbers of people. The main goal of this thesis is to advance the state of the art in computer vision methods for use in very crowded and cluttered environments. One of the aims is to devise a method that in the near future would be of help in other disciplines, such as socio-dynamics and computer animation, where models of crowded scenes are built manually through painstaking visual observation. A series of novel methods is presented here that can learn crowd dynamics automatically, by extracting different kinds of crowd information from real-world crowded scenes and modelling crowd dynamics using computer vision. The developed methods include an individual behaviour classifier, a scene cluttering level estimator, two people-counting schemes based on colour modelling and tracking, two algorithms for measuring crowd motion by matching local descriptors, and two dynamics modelling methods: one based on statistical techniques and the other on a neural network. The proposed information extraction methods are able to gather both macro information, which represents the properties of the whole crowd, and micro information, which differs from individual to individual (or location to location). The statistically-based dynamics modelling models the scene implicitly; furthermore, a method for discovering the main paths of the crowded scene is developed on top of it. The Self-Organizing Map (SOM) is chosen for the neural network approach to modelling dynamics; the resulting SOMs are shown to capture the main dynamics of the crowded scene.
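The SOM-based dynamics modelling mentioned above can be sketched in a few lines: each map unit learns a prototype of local crowd motion, and the winning unit's neighbourhood is pulled towards every new sample. The grid size, decay schedules, and the 2-D "motion vectors" below are illustrative assumptions, not the thesis's configuration.

```python
# Minimal self-organizing map over stand-in local crowd-motion vectors.
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(1000, 2))             # stand-in local motion vectors
grid = rng.normal(size=(6, 6, 2))             # 6x6 map of prototype vectors

for t, x in enumerate(data):
    lr = 0.5 * (1 - t / len(data))            # decaying learning rate
    sigma = 2.0 * (1 - t / len(data)) + 0.5   # decaying neighbourhood width
    d = np.linalg.norm(grid - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(d), d.shape)   # best-matching unit
    ii, jj = np.meshgrid(np.arange(6), np.arange(6), indexing="ij")
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    grid += lr * h[..., None] * (x - grid)    # pull neighbourhood towards sample
```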
41

Rubio, Romano Antonio. „Fashion discovery : a computer vision approach“. Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
Performing semantic interpretation of fashion images is undeniably one of the most challenging domains for computer vision. Subtle variations in color and shape might confer different meanings or interpretations on an image. It is a domain tightly coupled not only with human understanding but also with scene interpretation and context. Being able to extract fashion-specific information from images and interpret it properly can be useful in many situations and helps in understanding the underlying information in an image. Fashion is also one of the most important businesses around the world, with an estimated value of 3 trillion dollars and a constantly growing online market, which increases the utility of image-based algorithms to search, classify or recommend garments. This doctoral thesis aims to solve specific problems related to the treatment of fashion e-commerce data, from low-level pure pixel information to high-level abstract conclusions about the garments appearing in an image, taking advantage of the multi-modality of the available data in developing some of the solutions. The contributions include:
- A new superpixel extraction method focused on improving the annotation process for clothing images.
- The construction of a joint image and text embedding for fashion data.
- The application of this embedding space to the task of retrieving the main product in an image showing a complete outfit.
In summary, fashion is a complex computer vision and machine learning problem at many levels, and developing specific algorithms able to capture essential information from pictures and text is not trivial. In order to solve some of the challenges it poses, and taking into account that this is an industrial Ph.D., we contribute a variety of solutions that can boost the performance of many tasks useful to the fashion e-commerce industry.
Automàtica, robòtica i visió
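The image-and-text embedding contribution above can be pictured as two projections into a shared space, with retrieval by cosine similarity. In this sketch the projection matrices are random placeholders, where the thesis learns them from fashion data; the feature dimensions are likewise assumptions.

```python
# Rank candidate products (text side) against an outfit image (image side)
# in a shared embedding space; all features and weights are placeholders.
import numpy as np

rng = np.random.default_rng(4)
W_img = rng.normal(size=(2048, 128))          # image features -> shared space
W_txt = rng.normal(size=(300, 128))           # text features  -> shared space

def embed(x, W):
    z = x @ W
    return z / np.linalg.norm(z)              # unit-norm so dot product = cosine

outfit = embed(rng.normal(size=2048), W_img)
products = [embed(rng.normal(size=300), W_txt) for _ in range(5)]
scores = [float(outfit @ p) for p in products]
print("best-matching product index:", int(np.argmax(scores)))
```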
42

Pizenberg, Matthieu. „Interactive computer vision through the Web“. Thesis, Toulouse, INPT, 2020. http://www.theses.fr/2020INPT0023.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
Computer vision is the computational science aiming at reproducing and improving the ability of human vision to understand its environment. In this thesis, we focus on two fields of computer vision, namely image segmentation and visual odometry, and we show the positive impact that interactive Web applications provide on each. The first part of this thesis focuses on image annotation and segmentation. We introduce the image annotation problem and the challenges it brings for large, crowdsourced datasets. Many interactions have been explored in the literature to help segmentation algorithms; the most common consist in designating contours, drawing bounding boxes around objects, or marking interior and exterior scribbles. When crowdsourcing, annotation tasks are delegated to a non-expert public, sometimes on cheaper devices such as tablets. In this context, we conducted a user study showing the advantages of the outlining interaction over scribbles and bounding boxes. Another challenge of crowdsourcing is the distribution medium: while evaluating an interaction in a small user study does not require a complex setup, distributing an annotation campaign to thousands of potential users does. We therefore describe how the Elm programming language helped us build a reliable image annotation Web application. A highlights tour of its functionalities and architecture is provided, as well as a guide on how to deploy it to crowdsourcing services such as Amazon Mechanical Turk. The application is completely open-source and available online. In the second part of this thesis we present our open-source direct visual odometry library. In that endeavor, we provide an evaluation of other open-source RGB-D camera tracking algorithms and show that our approach performs as well as the currently available alternatives. The visual odometry problem relies on geometry tools and optimization techniques traditionally requiring much processing power to perform at real-time frame rates. Since we aspire to run those algorithms directly in the browser, we review past and present technologies enabling high-performance computation on the Web. In particular, we detail how to target the new WebAssembly standard from the C++ and Rust programming languages. Our library was written from scratch in the Rust programming language, which then allowed us to easily port it to WebAssembly. Thanks to this property, we are able to showcase a visual odometry Web application with multiple types of interaction available. A timeline enables one-dimensional navigation along the video sequence. Pairs of image points can be picked on two 2D thumbnails of the image sequence to realign cameras and correct drift. Colors are also used to identify parts of the 3D point cloud, selectable to reinitialize camera positions. Combining those interactions enables improvements in the tracking and 3D point reconstruction results.
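As a toy illustration of the direct-alignment idea underlying such a visual odometry library, the sketch below minimizes a photometric error between two images over a warp. A pure 2-D pixel translation stands in for the full 6-DoF RGB-D tracking, so this is a didactic reduction, not the library's algorithm.

```python
# Recover a small image shift by brute-force minimization of photometric error.
import numpy as np

def photometric_error(ref, cur, tx, ty):
    """Sum of squared intensity differences after shifting `cur` by (tx, ty).
    np.roll wraps around at the borders, which is acceptable for this toy."""
    shifted = np.roll(np.roll(cur, ty, axis=0), tx, axis=1)
    return float(((ref - shifted) ** 2).sum())

rng = np.random.default_rng(5)
ref = rng.random((64, 64))
cur = np.roll(ref, 3, axis=1)                 # "camera" moved 3 px horizontally
best = min((photometric_error(ref, cur, -tx, 0), tx) for tx in range(-5, 6))
print("recovered shift:", best[1])            # expect 3
```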
43

Burns, James Ian. „Agricultural Crop Monitoring with Computer Vision“. Thesis, Virginia Tech, 2014. http://hdl.handle.net/10919/52563.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
Precision agriculture allows farmers to use their resources efficiently with site-specific applications. The current work looks to computer vision for the data collection method necessary for such a smart field, including cameras sensitive to visual (430-650 nm), near-infrared (NIR, 750-900 nm), shortwave infrared (SWIR, 950-1700 nm), and longwave infrared (LWIR, 7500-16000 nm) light. Three areas are considered in the study: image segmentation, multispectral image registration, and feature tracking of a stressed plant. The accuracies of several image segmentation methods are compared. Basic thresholding on pixel intensities and vegetation indices results in accuracies below 75%. Neural networks (NNs) and support vector machines (SVMs) label correctly at 89% and 79%, respectively, when given only visual information, and reach final accuracies of 97% when the near infrared is added. The point matching methods of the Scale Invariant Feature Transform (SIFT) and Edge Orientation Histograms (EOH) are compared for accuracy; EOH improves the matching accuracy, but ultimately not enough for the current work. In order to track the image features of a stressed plant, a set of basil and catmint seedlings was grown and placed under drought and hypoxia conditions. Trends are shown in the average pixel values over the lives of the plants and in the vegetation indices, especially those of Marchant and NIR. Lastly, trends are seen in the image textures of the plants through the use of textons.
Master of Science
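A vegetation-index baseline of the kind compared in the abstract above can be written in a few lines: compute NDVI from red and near-infrared bands and threshold it into a plant mask. The 0.3 threshold and the random "bands" below are illustrative; the abstract reports such index baselines staying below 75% accuracy versus 97% for learned classifiers with NIR.

```python
# NDVI thresholding as a crude vegetation segmentation baseline.
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index, in [-1, 1]."""
    return (nir - red) / (nir + red + eps)

rng = np.random.default_rng(6)
red = rng.random((100, 100)).astype(np.float32)   # stand-in red band
nir = rng.random((100, 100)).astype(np.float32)   # stand-in NIR band
mask = ndvi(nir, red) > 0.3                       # crude "vegetation" mask
print("vegetation fraction:", mask.mean())
```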
44

Millman, Michael Peter. „Computer vision for yarn quality inspection“. Thesis, Loughborough University, 2000. https://dspace.lboro.ac.uk/2134/34196.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
Structural parameters that determine yarn quality include evenness, hairiness and twist. This thesis applies machine vision techniques to yarn inspection to determine these parameters in a non-contact manner. Because such a solution costs more than conventional sensors, the thesis takes a wide look at, and where necessary develops, the potential uses of machine vision for several key aspects of yarn inspection in both low- and high-speed configurations. Initially, the optimum optical and imaging conditions for yarn imaging are determined by investigating the various factors which degrade a yarn image. The depth-of-field requirement for imaging yarns is analysed, and various solutions are discussed critically, including apodisation, wavefront encoding and mechanical guidance. A solution using glass plate guides is proposed and tested in prototype. The plates enable the correct hair lengths to be seen in the image for long hairs, and also prevent damaging effects on the hairiness definition due to yarn vibration and yarn rotation. The optical system parameters and resolution limits of the yarn image when using guide plates are derived and optimised. The thesis then looks at methods of enhancing the yarn image using various illumination methods and incoherent and coherent dark-field imaging.
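For intuition about the depth-of-field requirement analysed above, the standard close-up approximation DOF ≈ 2·N·c·(m+1)/m² (f-number N, circle of confusion c, magnification m) can be evaluated directly; the numbers below are illustrative assumptions, not the thesis's actual optics.

```python
# Back-of-envelope depth-of-field estimate for close-up yarn imaging.
def depth_of_field(N, c_mm, m):
    """Total DOF in mm for f-number N, circle of confusion c_mm, magnification m."""
    return 2 * N * c_mm * (m + 1) / m**2

# e.g. f/8, 0.01 mm circle of confusion, 1:1 magnification of the yarn
print(f"DOF ~ {depth_of_field(8, 0.01, 1.0):.2f} mm")
```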
45

Pellegrini, Lorenzo <1993&gt. „Continual learning for computer vision applications“. Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amsdottorato.unibo.it/10401/1/Lorenzo%20Pellegrini%20-%20PhD%20Thesis.pdf.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
One of the most visionary goals of Artificial Intelligence is to create a system able to mimic and eventually surpass the intelligence observed in biological systems including, ambitiously, the one observed in humans. The main distinctive strength of humans is their ability to build a deep understanding of the world by learning continuously and drawing from their experiences. This ability, which is found in various degrees in all intelligent biological beings, allows them to adapt and properly react to changes by incrementally expanding and refining their knowledge. Arguably, achieving this ability is one of the main goals of Artificial Intelligence and a cornerstone towards the creation of intelligent artificial agents. Modern Deep Learning approaches allowed researchers and industries to achieve great advancements towards the resolution of many long-standing problems in areas like Computer Vision and Natural Language Processing. However, while this current age of renewed interest in AI allowed for the creation of extremely useful applications, a concerningly limited effort is being directed towards the design of systems able to learn continuously. The biggest problem that hinders an AI system from learning incrementally is the catastrophic forgetting phenomenon. This phenomenon, which was discovered in the 90s, naturally occurs in Deep Learning architectures where classic learning paradigms are applied when learning incrementally from a stream of experiences. This dissertation revolves around the Continual Learning field, a sub-field of Machine Learning research that has recently made a comeback following the renewed interest in Deep Learning approaches. This work will focus on a comprehensive view of continual learning by considering algorithmic, benchmarking, and applicative aspects of this field. This dissertation will also touch on community aspects such as the design and creation of research tools aimed at supporting Continual Learning research, and the theoretical and practical aspects concerning public competitions in this field.
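One classic antidote to the catastrophic forgetting phenomenon described above is rehearsal, or replay: keep a small reservoir of past samples and mix them into each new training batch. The sketch below shows only this one idea, under invented sizes; the dissertation itself covers a far broader set of strategies, benchmarks, and tools.

```python
# Reservoir-sampled replay buffer mixed into a stream of experiences.
import random

class ReplayBuffer:
    def __init__(self, capacity=200):
        self.capacity, self.seen, self.buffer = capacity, 0, []

    def add(self, sample):
        """Reservoir sampling: every sample seen so far has equal keep-probability."""
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def mix_with(self, new_batch):
        """Return the new batch augmented with up to len(new_batch) old samples."""
        k = min(len(self.buffer), len(new_batch))
        return new_batch + random.sample(self.buffer, k)

buf = ReplayBuffer()
for exp_id in range(5):                        # stream of experiences
    batch = [(exp_id, i) for i in range(100)]
    train_batch = buf.mix_with(batch)          # train the model on old + new samples
    for s in batch:
        buf.add(s)
```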
46

Bristow, Hilton K. „Registration and representation in computer vision“. Thesis, Queensland University of Technology, 2016. https://eprints.qut.edu.au/99587/1/Hilton_Bristow_Thesis.pdf.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
Given two animals of the same species, could you recognise common anatomical features between them, even if they appeared in different poses? This thesis studies the representation of photometric and geometric uncertainty in such fine-grained object recognition tasks. The problem is difficult, in part, because image appearance can vary wildly with even small changes in object pose. To constrain this inherently ill-posed problem, we develop methods for aligning novel images based on their semantic content, by efficiently leveraging priors over the statistics of natural images.
47

Raufdeen, Ramzi A. „SE4S toolkit extension project vision diagramming tool build your vision“. Thesis, California State University, Long Beach, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10147325.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:

Sustainability is an important topic when developing software because it helps produce eco-friendly programs. Software can contribute towards sustainability by supporting sustainability goals, and these goals can be addressed efficiently if requirements engineers consider them early in a project. This project helps requirements engineers make that contribution through the SE4S toolkit extension project, a vision diagramming tool that supports sustainability. The interactive tool is developed using HTML, SVG, and the JointJS library. The vision diagramming tool is an open-source project that runs in any browser, allowing requirements engineers to bring their visions to life while keeping sustainability in mind. With help from this tool, requirements engineers can easily demonstrate their sustainability vision to their stakeholders and pass it on to the rest of the development team.

48

TRUYENQUE, MICHEL ALAIN QUINTANA. „A COMPUTER VISION APPLICATION FOR HAND-GESTURES HUMAN COMPUTER INTERACTION“. PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2005. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=6585@1.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
Computer Vision can be used to capture gestures and create more intuitive and faster devices for interacting with computers. Current commercial gesture-based interaction devices use expensive equipment (tracking devices, gloves, special cameras, etc.) and special environments that make their dissemination to the general public difficult. This work presents a study on the feasibility of using Web cameras as interaction devices based on hand gestures. In our study, we consider that the hand is bare, that is, it carries no mechanical, magnetic or optical device. We also consider that the environment where the interaction takes place has the characteristics of a normal working place, that is, without special lights or backgrounds. In order to evaluate the feasibility of such an interaction mechanism, we developed some prototype interaction devices in which hand gestures and finger positions are used to simulate mouse and keyboard functions, such as selecting states and objects and defining directions and positions. Based on these prototypes, we present some conclusions and suggestions for future work.
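A webcam hand-tracking prototype of this kind is often built from skin-tone segmentation plus a largest-contour search. The sketch below (OpenCV 4, assuming a default webcam is available) uses a common YCrCb skin range as a rule-of-thumb threshold; it is not the dissertation's calibrated pipeline.

```python
# Grab one webcam frame, segment skin tones, and report the hand bounding box.
import cv2

cap = cv2.VideoCapture(0)                     # default webcam
ok, frame = cap.read()
if ok:
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))   # rough skin range
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)              # largest blob = hand
        x, y, w, h = cv2.boundingRect(hand)   # hand position could drive a cursor
        print("hand bounding box:", x, y, w, h)
cap.release()
```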
49

Cheda, Diego. „Monocular Depth Cues in Computer Vision Applications“. Doctoral thesis, Universitat Autònoma de Barcelona, 2012. http://hdl.handle.net/10803/121644.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:
Depth perception is a key aspect of human vision, a routine and essential visual task that humans perform effortlessly in many daily activities. It has often been associated with stereo vision, but humans have an amazing ability to perceive depth relations even from a single image by using several monocular cues. In the computer vision field, if image depth information were available, many tasks could be posed from a different perspective for the sake of higher performance and robustness. Nevertheless, given a single image, this possibility is usually discarded, since obtaining depth information has frequently required three-dimensional reconstruction techniques that need two or more images of the same scene taken from different viewpoints. Recently, some proposals have shown the feasibility of computing depth information from single images. In essence, the idea is to take advantage of a priori knowledge of the acquisition conditions and the observed scene to estimate depth from monocular pictorial cues. These approaches try to estimate the scene depth map precisely by employing computationally demanding techniques. However, to assist many computer vision algorithms, it is not really necessary to compute a costly and detailed depth map of the image; indeed, just a rough depth description can be very valuable in many problems. In this thesis, we demonstrate how coarse depth information can be integrated into different tasks, following holistic and alternative strategies, to obtain more precise and robust results. In that sense, we propose a simple but sufficiently reliable technique whereby image scene regions are categorized into discrete depth ranges to build a coarse depth map. Based on this representation, we explore the potential usefulness of our method in three application domains from novel viewpoints: camera rotation estimation, background estimation, and pedestrian candidate generation. In the first case, we compute the rotation of a camera mounted on a moving vehicle using two novel methods that identify distant elements in the image, for which the translation component of the image flow field is negligible. In background estimation, we propose a novel method to reconstruct the background by penalizing close regions in a cost function which integrates color, motion, and depth terms. Finally, we benefit from the geometric and depth information available in single images for pedestrian candidate generation, significantly reducing the number of generated windows to be further processed by a pedestrian classifier. In all cases, the results show that our depth-based approaches contribute to better performance in the studied applications.
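The coarse depth representation proposed above can be pictured as an ordinary classification problem: regions described by monocular cues are assigned to a handful of discrete depth ranges rather than regressed to metric depth. The features, labels, and random-forest classifier below are placeholders for the thesis's actual method.

```python
# Classify per-region monocular cues into 4 discrete depth ranges.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(800, 12))                     # per-region monocular cues (placeholder)
y = np.clip((X[:, 0] * 2 + 2).astype(int), 0, 3)   # stand-in labels: 4 depth ranges

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:600], y[:600])
coarse_map = clf.predict(X[600:])                  # one depth-range label per region
print("range accuracy:", (coarse_map == y[600:]).mean())
```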
50

Moe, Anders. „Passive Aircraft Altitude Estimation using Computer Vision“. Licentiate thesis, Linköping University, Computer Vision, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-53415.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
Annotation:

This thesis presents a number of methods to estimate 3D structures with a single translating camera. The camera is assumed to be calibrated and to have a known translation and rotation.

Applications for aircraft altitude estimation and for estimating the ground structure ahead of the aircraft are discussed. The idea is to mount a camera on the aircraft and use the motion estimates obtained from the inertial navigation system. One reason for this arrangement is to make the aircraft more passive, in comparison to conventional radar-based altitude estimation.

Two groups of methods are considered, optical flow based and region tracking based. Both groups have advantages and drawbacks.

Two methods to estimate the optical flow are presented. The accuracy of the estimated ground structure is increased by varying the temporal distance between the frames used in the optical flow estimation algorithms.

Four region tracking algorithms are presented. Two of them use canonical correlation, and the other two are based on the sum of squared differences and on complex correlation, respectively.

The depth estimates are then temporally filtered using weighted least squares or a Kalman filter.

A simple estimation of the computational complexity and memory requirements for the algorithms is presented to aid estimation of the hardware requirements.

Tests on real flight sequences are performed, showing that the aircraft altitude can be estimated with good accuracy.
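The temporal filtering step mentioned in the abstract above can be illustrated with a one-dimensional Kalman filter smoothing noisy per-frame altitude estimates; the random-walk model and noise parameters below are assumptions made for the sake of the sketch.

```python
# Smooth a sequence of noisy altitude estimates with a scalar Kalman filter.
import numpy as np

def kalman_1d(measurements, q=0.01, r=1.0):
    x, p = measurements[0], 1.0               # state estimate and its variance
    out = [x]
    for z in measurements[1:]:
        p += q                                # predict (random-walk model)
        k = p / (p + r)                       # Kalman gain
        x += k * (z - x)                      # update with the new measurement
        p *= (1 - k)
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(8)
true_alt = np.linspace(100, 120, 50)          # slowly climbing aircraft
noisy = true_alt + rng.normal(scale=2.0, size=50)
print("filtered altitude (last frame):", kalman_1d(noisy)[-1])
```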
