Doctoral dissertations on the topic "Visual tracking"
Create a correct reference in APA, MLA, Chicago, Harvard, and many other styles
Consult the 50 best doctoral dissertations on the topic "Visual tracking".
An "Add to bibliography" button is available next to each work in the list. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a ".pdf" file and read its abstract online, whenever these are available in the record's metadata.
Browse doctoral dissertations from a wide range of disciplines and compile an accurate bibliography.
Danelljan, Martin. "Visual Tracking". Thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-105659.
Wessler, Mike. "A modular visual tracking system". Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/11459.
Klein, Georg. "Visual tracking for augmented reality". Thesis, University of Cambridge, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.614262.
Salama, Gouda Ismail Mohamed. "Monocular and Binocular Visual Tracking". Diss., Virginia Tech, 1999. http://hdl.handle.net/10919/37179.
Dehlin, Carl. "Visual Tracking Using Stereo Images". Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-153776.
Salti, Samuele <1982>. "On-line adaptive visual tracking". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2011. http://amsdottorato.unibo.it/3735/1/samuele_salti_tesi.pdf.
Salti, Samuele <1982>. "On-line adaptive visual tracking". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2011. http://amsdottorato.unibo.it/3735/.
Delabarre, Bertrand. "Contributions to dense visual tracking and visual servoing using robust similarity criteria". Thesis, Rennes 1, 2014. http://www.theses.fr/2014REN1S124/document.
In this thesis, we address the visual tracking and visual servoing problems, which are crucial topics in computer and robot vision. Most existing techniques use geometrical primitives extracted from the images in order to estimate motion from an image sequence. But using geometrical features means having to extract and match them at each new image before performing the tracking or servoing step. To avoid this algorithmic step, recent approaches have proposed to use the information provided by the whole image directly instead of extracting geometrical primitives. Most of these algorithms, referred to as direct techniques, are based on the luminance values of every pixel in the image. But this strategy limits their use, since the criterion is very sensitive to scene perturbations such as illumination shifts or occlusions. To overcome this problem, we propose to use robust similarity measures, the sum of conditional variance and mutual information, in order to perform robust direct visual tracking and visual servoing. Several algorithms based on these criteria are then proposed in order to be robust to scene perturbations. These methods are tested and analysed in several setups where perturbations occur, which demonstrates their efficiency.
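The sum of conditional variance and mutual information mentioned above are similarity criteria computed directly on pixel intensities rather than on extracted features. As a rough illustration of the idea only (not code from the thesis; the function name and toy patches are invented), a histogram-based mutual information score between a template and a candidate patch can be written as follows:

```python
import numpy as np

def mutual_information(patch_a, patch_b, bins=32):
    """Histogram estimate of the mutual information between two patches.

    Direct trackers score a warped candidate region against a reference
    template with such criteria because they tolerate illumination changes
    far better than a plain sum of squared pixel differences.
    """
    a = np.ravel(patch_a).astype(np.float64)
    b = np.ravel(patch_b).astype(np.float64)
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()            # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)  # marginal of the template
    py = pxy.sum(axis=0, keepdims=True)  # marginal of the candidate
    nz = pxy > 0                         # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Toy check: an illumination-shifted copy still scores higher than noise.
template = np.random.rand(40, 40)
print(mutual_information(template, 0.7 * template + 0.1),
      mutual_information(template, np.random.rand(40, 40)))
```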
Arslan, Ali Erkin. "Visual Tracking With Group Motion Approach". Master's thesis, METU, 2003. http://etd.lib.metu.edu.tr/upload/4/1056100/index.pdf.
Zhu, Biwen. "Visual Tracking with Deep Learning : Automatic tracking of farm animals". Thesis, KTH, Radio Systems Laboratory (RS Lab), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-240086.
Automated tracking in farm surveillance footage can help support farm management. In this project, an automated system for detecting sows in surveillance video is designed using deep learning and computer vision methods. Because of storage, time, and bandwidth constraints, and to enable real-time scenarios over a network in the future, tracking in compressed video streams is essential. The proposed system uses a discriminative correlation filter (DCF) as a classifier to detect the target, and the tracking model is updated by training the classifier with online learning methods. Compression encodes the video data and reduces the bitrate at which video signals are transmitted, which helps video transmission adapt to constrained networks; however, it can also degrade image quality and reduce the accuracy of the tracker. We therefore evaluate the performance of existing visual tracking algorithms on compressed video sequences. The ultimate goal is to build a tracking system with the same performance but lower network-resource requirements. The proposed algorithm successfully tracks each sow across consecutive frames in most cases, and its performance was compared with two state-of-the-art trackers: Siamese Fully-Convolutional (FC) and Efficient Convolution Operators (ECO). The evaluation shows that the proposed tracker achieves performance similar to Siamese FC and ECO. Compared with tracking on the original video, the proposed tracker achieved similar accuracy while requiring much less storage and producing a lower bitrate when the video was compressed with suitable parameters. The system is much slower than required for real-time tracking due to its high computational complexity; more efficient methods for updating the tracking model are therefore needed to achieve real-time tracking.
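The discriminative correlation filter (DCF) referred to here is learned and applied in the Fourier domain, which is what makes this family of trackers fast. The MOSSE-style sketch below is a minimal, illustrative NumPy version only; it omits the preprocessing, multi-channel features, and elaborate online updates used by trackers such as ECO, and all names and the toy example are assumptions for illustration:

```python
import numpy as np

def train_filter(patches, sigma=2.0, reg=1e-3):
    """Fit a MOSSE-style correlation filter to centred training patches."""
    h, w = patches[0].shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2 * sigma ** 2))
    G = np.fft.fft2(g)                       # desired Gaussian response
    A = np.zeros((h, w), dtype=complex)
    B = np.zeros((h, w), dtype=complex)
    for p in patches:
        F = np.fft.fft2(p)
        A += G * np.conj(F)
        B += F * np.conj(F)
    return A / (B + reg)                     # filter in the Fourier domain

def detect(H, patch):
    """Correlate the filter with a search patch; return the target offset."""
    response = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return dy - patch.shape[0] // 2, dx - patch.shape[1] // 2

# Toy usage: a random patch stands in for a cropped image of one animal.
rng = np.random.default_rng(0)
target = rng.random((64, 64))
H = train_filter([target])
print(detect(H, np.roll(target, (3, -5), axis=(0, 1))))   # approx. (3, -5)
```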
White, Jacob Harley. "Real-Time Visual Multi-Target Tracking in Realistic Tracking Environments". BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7486.
Niethammer, Marc. "Dynamic Level Sets for Visual Tracking". Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/7606.
Ndiour, Ibrahima Jacques. "Dynamic curve estimation for visual tracking". Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37283.
Laberge, Dominic. "Visual tracking for human-computer interaction". Thesis, University of Ottawa (Canada), 2003. http://hdl.handle.net/10393/26504.
Khan, Muhammad Haris. "Visual tracking over multiple temporal scales". Thesis, University of Nottingham, 2015. http://eprints.nottingham.ac.uk/33056/.
Wong, Matthew. "Tracking maneuvering target using visual sensor". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ39896.pdf.
Maggio, Emilio. "Monte Carlo methods for visual tracking". Thesis, Queen Mary, University of London, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.497791.
Nott, Viswajith Karapoondi. "Joint Visual and Wireless Tracking System". UKnowledge, 2009. http://uknowledge.uky.edu/gradschool_theses/592.
Luo, Tao, and 羅濤. "Human visual tracking in surveillance video". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2014. http://hdl.handle.net/10722/206727.
North, Ben. "Learning dynamical models for visual tracking". Thesis, University of Oxford, 1998. http://ora.ox.ac.uk/objects/uuid:6ed12552-4c30-4d80-88ef-7245be2d8fb8.
Gladh, Susanna. "Visual Tracking Using Deep Motion Features". Thesis, Linköpings universitet, Datorseende, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-134342.
Thanikasalam, Kokul. "Appearance based online visual object tracking". Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/130875/1/Kokul_Thanikasalam_Thesis.pdf.
Di Nardo, Emanuel. "Advanced methodologies for visual object tracking". Doctoral thesis, Università degli Studi di Milano, 2022. http://hdl.handle.net/2434/931766.
Olsson, Mica. "Visual composition in video games : Visual analyzation using eye-tracking". Thesis, Luleå tekniska universitet, Institutionen för konst, kommunikation och lärande, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-64489.
This report investigates how theory from classical art and visual structure can be applied to interactive, real-time game environments in order to influence a player's choices and actions. It first reviews the theory used in classical art, such as composition, working with colour, lines and shapes, and the contrasts between them. Based on this theory, a number of images are analysed to predict what the viewer will look at and find interesting, and the same method is then applied to interactive media. All data in these studies is collected with an eye-tracking system, which registers the movement and position of the viewer's gaze on a computer screen. The results indicate that there is much more to study regarding the use of eye tracking for game analysis: eye tracking on still images works very well and yields clear, readable data, but for interactive environments the data quickly becomes more abstract, and further work is needed on how eye tracking can best be used there.
Pålsson, Nicholas. "Guiding the viewer using visual components : Eye-tracking for visual analysis". Thesis, Luleå tekniska universitet, Institutionen för konst, kommunikation och lärande, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-74563.
Ali, Saad. "Taming Crowded Visual Scenes". Doctoral diss., University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3593.
Larsson, Olof. "Visual-inertial tracking using Optical Flow measurements". Thesis, Linköping University, Automatic Control, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-59970.
Visual-inertial tracking is a well-known technique for tracking a combination of a camera and an inertial measurement unit (IMU). An issue with the straightforward approach is the need for known 3D points. To bypass this, 2D information can be used, without recovering depth, to estimate the position and orientation (pose) of the camera. This Master's thesis investigates the feasibility of using Optical Flow (OF) measurements and indicates the benefits of this approach.
The 2D information is added using OF measurements. OF describes the visual flow of interest points in the image plane. Since the depth of these points need not be estimated, the computational complexity is reduced. With the increased amount of 2D information, less 3D information is required for the pose estimate.
The use of 2D points for pose estimation has been verified with experimental data gathered by a real camera/IMU system. Several data sequences containing different trajectories are used to estimate the pose. It is shown that OF measurements can be used to improve visual-inertial tracking with a reduced need for 3D-point registrations.
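As background on how such 2D flow measurements are typically gathered, the sketch below uses OpenCV's pyramidal Lucas-Kanade tracker to produce image-plane displacement vectors for a set of interest points. It only illustrates the measurement side: the file name and parameters are placeholders, and the fusion with IMU data in a pose filter is not shown.

```python
import cv2

cap = cv2.VideoCapture("sequence.mp4")          # placeholder video path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=8)

while True:
    ok, frame = cap.read()
    if not ok or pts is None:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None,
                                                  winSize=(21, 21), maxLevel=3)
    good_new = new_pts[status.ravel() == 1]
    good_old = pts[status.ravel() == 1]
    flow = good_new - good_old        # 2D optical-flow measurements (pixels)
    # ... the flow vectors would be fed to the pose filter together with
    #     the IMU readings at this point ...
    prev_gray, pts = gray, good_new.reshape(-1, 1, 2)
```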
Ergezer, Hamza. "Visual Detection And Tracking Of Moving Objects". Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/2/12609098/index.pdf.
Kalman tracker and mean-shift tracker are other approaches which have been utilized. A new approach has been proposed for the problem of tracking multiple targets. We have implemented this method for single and multiple camera configurations. Multiple cameras have been used to augment the measurements. A homography matrix has been calculated to find the correspondence between cameras. Then, measurements and tracks have been associated by the new tracking method.
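The homography step mentioned in this abstract, relating measurements in one camera view to another, can be illustrated with OpenCV as below. The point correspondences are placeholder values; in practice they would come from calibration or from matched image features.

```python
import numpy as np
import cv2

# Corresponding ground-plane points seen in camera 1 and camera 2 (placeholders).
pts_cam1 = np.array([[100, 200], [400, 210], [390, 480], [110, 470]], dtype=np.float32)
pts_cam2 = np.array([[80, 180], [420, 200], [400, 500], [90, 460]], dtype=np.float32)

H, inlier_mask = cv2.findHomography(pts_cam1, pts_cam2, cv2.RANSAC, 5.0)

# Map a measurement (e.g. a target position) from camera 1 into camera 2 so
# that measurements and tracks from both views can be associated.
measurement = np.array([[[250.0, 300.0]]], dtype=np.float64)   # shape (1, 1, 2)
print(cv2.perspectiveTransform(measurement, H).ravel())
```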
Qin, Lei. "Online machine learning methods for visual tracking". Thesis, Troyes, 2014. http://www.theses.fr/2014TROY0017/document.
We study the challenging problem of tracking an arbitrary object in video sequences with no prior knowledge other than a template annotated in the first frame. To tackle this problem, we build a robust tracking system consisting of the following components. First, for image region representation, we propose some improvements to the region covariance descriptor: characteristics of the specific object are taken into consideration before constructing the covariance descriptor. Second, for building the object appearance model, we propose to combine the merits of both generative and discriminative models by organizing them in a detection cascade. Specifically, generative models are deployed in the early layers to eliminate most easy candidates, whereas discriminative models in the later layers distinguish the object from a few similar "distracters". Partial Least Squares Discriminant Analysis (PLS-DA) is employed for building the discriminative object appearance models. Third, for updating the generative models, we propose a weakly supervised model-updating method based on cluster analysis using the mean-shift gradient density estimation procedure. Fourth, a novel online PLS-DA learning algorithm is developed for incrementally updating the discriminative models. The final tracking system that integrates all these building blocks exhibits good robustness for most challenges in visual tracking. Comparative results on challenging video sequences show that the proposed tracking system performs favourably with respect to a number of state-of-the-art methods.
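To make the cascade idea concrete, the schematic sketch below filters candidate feature vectors with a cheap generative test first and scores the survivors with a PLS-DA model (PLS regression on ±1 labels, here via scikit-learn). It is a simplified illustration under assumed descriptors and thresholds, not the region-covariance features or the online PLS-DA update developed in the thesis.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def cascade_scores(candidates, template, pls, gen_threshold):
    """Stage 1: generative gate (distance to template). Stage 2: PLS-DA score."""
    dists = np.linalg.norm(candidates - template, axis=1)
    keep = dists < gen_threshold                  # cheap test removes easy negatives
    scores = np.full(len(candidates), -np.inf)
    if keep.any():
        scores[keep] = pls.predict(candidates[keep]).ravel()
    return scores

# Toy setup: 64-dimensional descriptors, +1 for the object, -1 for distracters.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 64))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
pls = PLSRegression(n_components=4).fit(X, y)

template = X[y > 0].mean(axis=0)
candidates = rng.normal(size=(50, 64))
best = int(np.argmax(cascade_scores(candidates, template, pls, gen_threshold=12.0)))
print(best)
```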
Häger, Gustav. "Improving Discriminative Correlation Filters for Visual Tracking". Thesis, Linköpings universitet, Datorseende, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-125963.
Generic visual tracking is a classical problem in computer vision. In the standard formulation, no prior knowledge about the object to be tracked is assumed beyond an initial bounding box in the first frame of a video sequence. It is a very difficult problem to solve in general because of occlusions, rotations, illumination changes, and variations in the perceived size of the object. In recent years, tracking methods based on discriminative correlation filters have shown promising results. These methods use the Fourier transform to compute detections and model updates efficiently, achieving very good performance at many hundreds of frames per second. Current methods, however, only estimate the translation of the tracked object, while scale changes are ignored. This thesis evaluates a number of approaches for scale estimation within a correlation-filter framework, including a novel method based on constructing separate scale and translation filters. The proposed method is robust, gives significantly better tracking performance, and can still be run in real time. An evaluation of different feature representations on two large tracking benchmark datasets is also performed.
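A dedicated scale estimate of the kind described above is often obtained by evaluating a small pyramid of differently scaled patches around the current target position. The crude sketch below uses normalised cross-correlation in place of a learned one-dimensional scale filter, so it only illustrates the search structure; the names and parameter values are assumptions.

```python
import numpy as np
import cv2

def best_scale(frame_gray, center, base_size, template, scales=(0.95, 1.0, 1.05)):
    """Return the relative scale whose patch around `center` best matches `template`."""
    cx, cy = center
    h, w = base_size
    best, best_score = 1.0, -np.inf
    for s in scales:
        sh, sw = int(round(h * s)), int(round(w * s))
        y0, x0 = int(cy - sh / 2), int(cx - sw / 2)
        patch = frame_gray[max(y0, 0):y0 + sh, max(x0, 0):x0 + sw]
        if patch.size == 0:
            continue
        # Resize the candidate to the template size and score the match.
        patch = cv2.resize(patch, (template.shape[1], template.shape[0]))
        score = cv2.matchTemplate(patch.astype(np.float32),
                                  template.astype(np.float32),
                                  cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > best_score:
            best, best_score = s, score
    return best
```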
Nassif, Samer Chaker. "Cooperative windowing for real-time visual tracking". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/NQ30107.pdf.
Turker, Burcu. "Multiple hypothesis tracking for multiple visual targets". Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/3/12611837/index.pdf.
Wesierski, Daniel. "Visual tracking of articulated and flexible objects". PhD thesis, Institut National des Télécommunications, 2013. http://tel.archives-ouvertes.fr/tel-00939073.
Kaucic, Robert August. "Lip tracking for audio-visual speech recognition". Thesis, University of Oxford, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360392.
Khoo, B. E. "A visual, knowledge-based robot tracking system". Thesis, Swansea University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.637791.
Tosas, Martin. "Visual articulated hand tracking for interactive surfaces". Thesis, University of Nottingham, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.438416.
Wang, Yiming. "Active visual tracking in multi-agent scenarios". Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/42804.
Du, X. "Visual tracking in robotic minimally invasive surgery". Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10047149/.
Wesierski, Daniel. "Visual tracking of articulated and flexible objects". Thesis, Evry, Institut national des télécommunications, 2013. http://www.theses.fr/2013TELE0007/document.
Humans can track objects visually with little effort. For a computer, however, it is hard to track a fast-moving object under varying illumination and occlusions, in clutter, and with an appearance that varies in the camera's projective space due to relaxed rigidity or changes in viewpoint. Since a generic, precise, robust, and fast tracker could enable many applications, object tracking has been a fundamental problem of practical importance since the beginnings of computer vision. The first contribution of the thesis is a computationally efficient approach to tracking objects of various shapes and motions. It describes a unifying tracking system that can be configured to track the pose of a deformable object in a low- or high-dimensional state space. The object is decomposed into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatio-temporal constraints. The robustness and generality of the approach is demonstrated extensively on tracking various flexible and articulated objects. Haar-like features are widely used in tracking. The second contribution of the thesis is a parser of ensembles of Haar-like features that computes them efficiently. The features are decomposed into simpler kernels, possibly shared by subsets of features, thus forming multi-pass convolutions. Discovering and aligning these kernels within and between passes allows forming recursive trees of kernels that require fewer memory operations than the classic computation, producing the same result more efficiently. The approach is validated experimentally on popular examples of Haar-like features.
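The Haar-like features discussed in the second contribution are box-filter differences; once an integral image is available, each box sum costs four lookups. The snippet below shows only that classic computation as background, not the kernel-sharing parser contributed by the thesis; the function names are illustrative.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading zero row/column for easy slicing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] in four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: left half minus right half."""
    return box_sum(ii, y, x, h, w // 2) - box_sum(ii, y, x + w // 2, h, w // 2)

# Quick check against a direct computation on a random image.
img = np.random.rand(24, 24)
ii = integral_image(img)
assert np.isclose(box_sum(ii, 3, 5, 8, 10), img[3:11, 5:15].sum())
```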
Woodley, Thomas Edward. "Visual tracking using offline and online learning". Thesis, University of Cambridge, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.608814.
Pełny tekst źródłaLoxam, James Ronald. "Robust filtering for real-time visual tracking". Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609503.
Pełny tekst źródłaHoffmann, McElory Roberto. "Stochastic visual tracking with active appearance models". Thesis, Stellenbosch : University of Stellenbosch, 2009. http://hdl.handle.net/10019.1/1381.
ENGLISH ABSTRACT: In many applications an accurate, robust and fast tracker is needed, for example in surveillance, gesture recognition, tracking lips for lip-reading, and creating an augmented reality by embedding a tracked object in a virtual environment. In this dissertation we investigate the viability of a tracker that combines the accuracy of active appearance models with the robustness of the particle filter (a stochastic process); we call this combination the PFAAM. In order to obtain a fast system, we suggest local optimisation as well as using active appearance models fitted with non-linear approaches. Active appearance models use both contour (shape) and greyscale information to build a deformable template of an object. They are typically accurate, but not necessarily robust, when tracking contours. A particle filter is a generalisation of the Kalman filter. In a tutorial style, we show how the particle filter is derived as a numerical approximation for the general state estimation problem. The algorithms are tested for accuracy, robustness and speed on a PC, in an embedded environment, and by tracking in 3D. The algorithms run in real time on a PC and near real time in the embedded environment. In both cases good accuracy and robustness are achieved, even if the tracked object moves fast against a cluttered background, and for uncomplicated occlusions.
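For readers unfamiliar with the stochastic half of the PFAAM combination, a bootstrap particle filter iterates prediction, re-weighting, and resampling. The generic sketch below (random-walk motion, user-supplied likelihood) only illustrates that loop; it is not the AAM-specific tracker of the dissertation, and all names are assumptions.

```python
import numpy as np

def particle_filter_step(particles, weights, measurement, likelihood,
                         motion_std=2.0, rng=np.random.default_rng()):
    """One predict-update-resample cycle of a bootstrap particle filter.

    `particles` is an (N, d) array of state hypotheses (for a PFAAM these
    would be pose/appearance parameters); `likelihood(p, z)` scores how well
    hypothesis p explains the current measurement z.
    """
    n = len(particles)
    # Predict: diffuse hypotheses with a simple random-walk motion model.
    particles = particles + rng.normal(scale=motion_std, size=particles.shape)
    # Update: re-weight each hypothesis by its measurement likelihood.
    weights = weights * np.array([likelihood(p, measurement) for p in particles])
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

# Toy usage: 1-D state with a Gaussian likelihood around the measurement.
gaussian = lambda p, z: np.exp(-0.5 * (p[0] - z) ** 2)
P, W = np.zeros((100, 1)), np.full(100, 1.0 / 100)
P, W = particle_filter_step(P, W, measurement=3.0, likelihood=gaussian)
```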
Lao, Yuanwei. "Visual Tracking by Exploiting Observations and Correlations". The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1269547716.
Wettermark, Emma, and Linda Berglund. "Multi-Modal Visual Tracking Using Infrared Imagery". Thesis, Linköpings universitet, Datorseende, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176540.
Johnander, Joakim. "Visual Tracking with Deformable Continuous Convolution Operators". Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-138597.
Kilic, V. "Audio-visual tracking of multiple moving speakers". Thesis, University of Surrey, 2016. http://epubs.surrey.ac.uk/809761/.
Wu, Zheng. "Occlusion reasoning for multiple object visual tracking". Thesis, Boston University, 2013. https://hdl.handle.net/2144/12892.
Occlusion reasoning for visual object tracking in uncontrolled environments is a challenging problem. It becomes significantly more difficult when dense groups of indistinguishable objects are present in the scene, causing frequent inter-object interactions and occlusions. We present several practical solutions that tackle inter-object occlusions for video surveillance applications. In particular, this thesis proposes three methods. First, we propose "reconstruction-tracking," an online multi-camera spatial-temporal data association method for tracking large groups of objects imaged at low resolution. As a variant of the well-known Multiple-Hypothesis Tracker, our approach localizes the positions of objects in 3D space from possibly occluded observations in multiple camera views and performs temporal data association in 3D. Second, we develop "track linking," a class of offline batch-processing algorithms for long-term occlusions, where the decision has to be made based on observations from the entire tracking sequence. We construct a graph representation to characterize occlusion events and propose an efficient graph-based/combinatorial algorithm to resolve occlusions. Third, we propose a novel Bayesian framework in which detection and data association are combined into a single module and solved jointly. Almost all traditional tracking systems address the detection and data association tasks separately, in sequential order. Such a design implies that the output of the detector has to be reliable in order to make the data association work. Our framework takes advantage of the often complementary nature of the two subproblems, which not only avoids the error-propagation issue from which traditional "detection-tracking" approaches suffer but also eschews common heuristics such as "non-maximum suppression" of hypotheses, by modeling the likelihood of the entire image. The thesis describes a substantial number of experiments involving challenging and notably distinct simulated and real data, including infrared and visible-light data sets recorded by us or taken from publicly available data sets. In these videos, the number of objects ranges from a dozen to a hundred per frame, in both monocular and multiple views. The experiments demonstrate that our approaches achieve results comparable to those of state-of-the-art approaches.
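As background for the data-association vocabulary used above, the simplest building block is a gated, single-frame assignment between existing tracks and new detections. The sketch below uses the Hungarian algorithm from SciPy; it is a deliberately naive baseline, far simpler than the multi-camera, batch, and joint detection-association methods of the thesis, and the names and gate value are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_positions, detections, gate=30.0):
    """Gated Hungarian assignment between (T, 2) tracks and (D, 2) detections."""
    cost = np.linalg.norm(track_positions[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= gate]

# Toy usage: two tracks, three detections; the distant detection is left unmatched.
tracks = np.array([[10.0, 10.0], [50.0, 60.0]])
dets = np.array([[12.0, 11.0], [48.0, 63.0], [200.0, 200.0]])
print(associate(tracks, dets))   # [(0, 0), (1, 1)]
```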
Firouzi, Hadi. "Visual non-rigid object tracking in dynamic environments". Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44629.
Campos, Teófilo Emídio de. "3D visual tracking of articulated objects and hands". Thesis, University of Oxford, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.442396.
Sudderth, Erik B. (Erik Blaine) 1977. "Graphical models for visual object recognition and tracking". Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/34023.
We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories.
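The Dirichlet processes referred to in this abstract place a prior over mixtures with an unbounded number of components, which is what lets the number of parts and objects be learned rather than fixed in advance. A standard truncated stick-breaking construction of such mixture weights, given purely as generic background rather than code from the thesis, is:

```python
import numpy as np

def stick_breaking_weights(alpha, n_atoms, rng=np.random.default_rng()):
    """Truncated stick-breaking weights of a Dirichlet process with concentration alpha."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    # w_k = beta_k * prod_{j<k} (1 - beta_j): break off a fraction of what remains.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

# Small alpha concentrates mass on a few components (few parts/objects),
# large alpha spreads it over many.
print(stick_breaking_weights(1.0, 10).round(3))
```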