Dissertations / Theses on the topic 'Machine vision; Object tracking'

Consult the top 50 dissertations / theses for your research on the topic 'Machine vision; Object tracking.'


1

Case, Isaac. "Automatic object detection and tracking in video /." Online version of thesis, 2010. http://hdl.handle.net/1850/12332.

2

Clarke, John Christopher. "Applications of sequence geometry to visual motion." Thesis, University of Oxford, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.244549.

3

Tydén, Amanda, and Sara Olsson. "Edge Machine Learning for Animal Detection, Classification, and Tracking." Thesis, Linköpings universitet, Reglerteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166572.

Abstract:
Machine learning on camera-trap data is an advancing research field, yet few studies explore deep learning that runs on the camera trap in real time. A camera trap captures images of passing animals and is traditionally based only on motion detection. This work integrates machine learning on the edge device to also perform object detection. Related research is reviewed, and model tests are performed with a focus on the trade-off between inference speed and model accuracy. Transfer learning is used to leverage pre-trained models and thus reduce training time and the amount of training data needed. Four models with slightly different architectures are compared to evaluate which performs best for the use case: SSD MobileNet V2, SSD Inception V2, SSDLite MobileNet V2, and quantized SSD MobileNet V2. Because the model is used client-side, SSD MobileNet V2 was finally selected for its satisfactory trade-off between inference speed and accuracy. Even though it is less accurate in its detections, its ability to process more images per second makes it outperform the more accurate Inception network in object tracking. A contribution of this work is a lightweight tracking solution using tubelet proposals. The work further discusses the open-set recognition problem, where only a few object classes are of interest while many others are present. Open-set recognition influences data collection and evaluation tests; how to integrate support for it into object detection models is, however, left for future work. The proposed system handles detection, classification, and tracking of animals in the African savannah, and has potential for real-world use as it produces meaningful events.
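The model-selection reasoning in this abstract (pick the most accurate detector that still meets the edge device's frame-rate budget) can be sketched as follows; the fps and mAP numbers below are illustrative placeholders, not figures from the thesis:

```python
# Hedged sketch: choosing a detector by the speed/accuracy trade-off described
# above. All fps/mAP values are made-up placeholders for illustration.

def select_model(candidates, min_fps):
    """Return the most accurate model that still meets the frame-rate budget."""
    feasible = [m for m in candidates if m["fps"] >= min_fps]
    if not feasible:
        raise ValueError("no model meets the frame-rate requirement")
    return max(feasible, key=lambda m: m["map"])

candidates = [
    {"name": "ssd_mobilenet_v2",     "fps": 20.0, "map": 0.60},  # illustrative
    {"name": "ssd_inception_v2",     "fps": 8.0,  "map": 0.68},  # illustrative
    {"name": "ssdlite_mobilenet_v2", "fps": 25.0, "map": 0.55},  # illustrative
]

best = select_model(candidates, min_fps=15.0)
```

With these placeholder numbers the Inception model is excluded by the frame-rate constraint, mirroring the abstract's conclusion that the faster MobileNet wins for tracking despite lower per-frame accuracy.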
4

Stigson, Magnus. "Object Tracking Using Tracking-Learning-Detection in Thermal Infrared Video." Thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-93936.

Abstract:
Automatic tracking of an object of interest in a video sequence is a much-researched task. Difficulties include varying scale of the object, rotation, and object appearance changing over time, leading to tracking failures. Tracking methods such as short-term tracking often fail if the object leaves the camera’s field of view or changes shape rapidly. Also, small inaccuracies in the tracking method can accumulate over time, which can lead to tracking drift. Long-term tracking is also problematic, partly due to updating and degradation of the object model, leading to incorrectly classified and tracked objects. This master’s thesis implements a long-term tracking framework called Tracking-Learning-Detection which can learn and adapt, using so-called P/N-learning, to changing object appearance over time, making it more robust to tracking failures. The framework consists of three parts: a tracking module which follows the object from frame to frame, a learning module that learns new appearances of the object, and a detection module which can detect learned appearances of the object and correct the tracking module if necessary. This tracking framework is evaluated on thermal infrared videos, and the results are compared to those obtained from videos captured within the visible spectrum. Several important differences between visual and thermal infrared tracking are presented, and their effect on tracking performance is evaluated. In conclusion, the results are analyzed to determine which differences matter most and how they affect tracking, and a number of ways to improve the tracking are proposed.
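The three-module structure described in this abstract can be sketched as a minimal loop; the tracker and detector below are simplified stand-ins for TLD's actual Median-Flow tracker and cascaded detector, and the P/N-learning update of the appearance model is omitted for brevity:

```python
# Structural sketch of one Tracking-Learning-Detection iteration: a short-term
# tracker follows the object, a detector scans for learned appearances, and a
# confident detection corrects the tracker. Stand-in classes, not the thesis code.

class StandInTracker:
    """Stand-in short-term tracker: shifts the box by a fixed placeholder motion."""
    def track(self, prev_box, frame_idx):
        x, y, w, h = prev_box
        return (x + 1, y, w, h)  # placeholder frame-to-frame motion

class StandInDetector:
    """Stand-in detector: returns a stored box when it 'recognises' the frame."""
    def __init__(self):
        self.known = {}  # frame index -> box of a learned appearance
    def detect(self, frame_idx):
        return self.known.get(frame_idx)

def tld_step(tracker, detector, prev_box, frame_idx):
    """One TLD iteration: prefer a confident detection, else trust the tracker."""
    tracked = tracker.track(prev_box, frame_idx)
    detected = detector.detect(frame_idx)
    # The detector corrects the tracker when it re-finds a learned appearance.
    return detected if detected is not None else tracked
```

In the full framework the learning module would also add new appearances to the detector after each step; here that feedback path is left out.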
5

Patrick, Ryan Stewart. "Surveillance in a Smart Home Environment." Wright State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=wright1278508516.

6

Moujtahid, Salma. "Exploiting scene context for on-line object tracking in unconstrained environments." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI110/document.

Abstract:
With the increasing need for automated video analysis, visual object tracking has become an important task in computer vision. Object tracking is used in a wide range of applications such as surveillance, human-computer interaction, medical imaging and vehicle navigation. A tracking algorithm in unconstrained environments faces multiple challenges: potential changes in object shape and background, lighting, camera motion, and other adverse acquisition conditions. In this setting, classic methods of background subtraction are inadequate, and more discriminative methods of object detection are needed. Moreover, in generic tracking algorithms, the nature of the object is not known a priori, so appearance models learned off-line for specific types of objects, such as faces or pedestrians, cannot be used. Further, the recent evolution of powerful machine learning techniques has enabled the development of new tracking methods that learn the object appearance in an online manner and adapt to the varying constraints in real time, leading to very robust tracking algorithms that can, to some extent, operate in non-stationary environments. In this thesis, we start from the observation that different tracking algorithms have different strengths and weaknesses depending on the context. To overcome the varying challenges, we show that combining multiple modalities and tracking algorithms can considerably improve the overall tracking performance in unconstrained environments. More concretely, we first introduce a new tracker selection framework using a spatial and temporal coherence criterion. In this algorithm, multiple independent trackers are combined in parallel, each of them using low-level features based on different complementary visual aspects such as colour, texture and shape.
By recurrently selecting the most suitable tracker, the overall system can switch rapidly between different tracking algorithms with specific appearance models depending on the changes in the video. In the second contribution, scene context is introduced into the tracker selection. We designed effective visual features, extracted from the scene context, to characterise the different image conditions and variations. At each point in time, a classifier is trained on these features to predict the tracker that will perform best under the given scene conditions. We further improved this context-based framework and proposed an extended version in which the individual trackers are changed and the classifier training is optimised. Finally, we began exploring an interesting perspective: using a Convolutional Neural Network to automatically learn to extract these scene features directly from the input image and predict the most suitable tracker. The proposed methods were evaluated on several public benchmarks, demonstrating that the use of scene context improves the overall tracking performance.
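The per-frame tracker-selection mechanism can be illustrated with a toy version in which a hand-written rule on a single scene feature stands in for the trained neural-network classifier; the brightness feature, the threshold, and the colour/texture rule are all assumptions for illustration only:

```python
# Toy sketch of context-driven tracker selection: extract a scene feature from
# the current frame, then pick the tracker predicted to cope best with it.
# A fixed rule replaces the trained classifier of the thesis.

def scene_features(frame):
    """Toy scene descriptor: mean pixel intensity as a proxy for lighting."""
    return sum(frame) / len(frame)

def select_tracker(frame, trackers):
    """Pick the tracker assumed to perform best for the current scene."""
    brightness = scene_features(frame)
    # Illustrative rule: colour cues degrade in dark scenes, so fall back to
    # a texture-based tracker when the frame is dark.
    return trackers["colour"] if brightness > 0.5 else trackers["texture"]

trackers = {"colour": "colour_histogram_tracker", "texture": "texture_tracker"}
bright_choice = select_tracker([0.9, 0.8, 0.7], trackers)
dark_choice = select_tracker([0.1, 0.2, 0.0], trackers)
```

In the thesis the rule is replaced by a classifier trained on richer scene features, and the candidate trackers are full appearance-model trackers rather than strings.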
7

Skjong, Espen, and Stian Aas Nundal. "Tracking objects with fixed-wing UAV using model predictive control and machine vision." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for teknisk kybernetikk, 2014. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-25990.

Abstract:
This thesis describes the development of an object tracking system for unmanned aerial vehicles (UAVs), intended for search and rescue (SAR) missions. The UAV is equipped with a two-axis gimbal system, which houses an infrared (IR) camera used to detect and track objects of interest, and a lower-level autopilot. An external computer vision (CV) module is assumed implemented and connected to the object tracking system, providing object positions and velocities to the control system. The realization of the object tracking system includes the design and assembly of the UAV’s payload, the design and implementation of a model predictive controller (MPC) embedded in a larger control environment, and the design and implementation of a human machine interface (HMI). The HMI allows remote control of the object tracking system from a ground control station. ACADO, a toolkit for realizing optimal control problems (OCP), MPC and moving horizon estimators (MHE), is used. To achieve real-time communication between all system modules, an asynchronous multi-threaded running environment was implemented, with interfaces to external HMIs, the CV module, the autopilot and external control systems. In addition to the IR camera, a color still camera is mounted in the payload, intended for capturing high-definition images of objects of interest and relaying them to the operator on the ground. Using the center of the IR camera image projected onto the ground, together with the UAV’s and the objects’ positions, the MPC calculates way-points, path planning for the UAV, and gimbal attitude, which are used as control actions for the autopilot and the gimbal. Communication between the control system and the autopilot is handled by DUNE. If multiple objects are located and are to be tracked, the control system utilizes an object selection algorithm that determines which object to track depending on the distance between the UAV and each object.
If multiple objects are clustered together, the object selection algorithm can choose to track all the clustered objects simultaneously. The algorithm features dynamic object clustering and is capable of tracking multiple moving objects. The system was tested in simulations, where suitable ACADO parameters were found through experimentation. Important requirements for the ACADO parameters are smooth gimbal control, an efficient UAV path and acceptable time consumption. The implemented HMI gives the operator access to live camera streams, the ability to alter system parameters, and manual control of the gimbal. The object tracking system was tested using hardware-in-loop (HIL) testing, and the results were encouraging. During the first flight of the UAV, without the payload on board, the autopilot exhibited erroneous behavior and the UAV was grounded. A solution to the problem was not found in time to conduct any further flight tests during this thesis. A prototype for a three-axis stabilized brushless gimbal was designed and 3D-printed. This was a result of the two-axis gimbal system’s limited stabilization capabilities, small range of movement and seemingly fragile construction. Out of a suspected need for damping to improve image quality from the still camera, the process of designing and prototyping a wire vibration isolator camera mount was started. Further work and testing are required to realize both the gimbal and the dampened camera mount.
The lack of flight tests prohibited the completion of the object tracking system.

Keywords: object tracking system, unmanned aerial vehicle (UAV), search and rescue, two-axis gimbal system, infrared (IR) camera, computer vision (CV), model predictive control (MPC), control environment, human machine interface (HMI), remote control, ground control, ACADO, real-time, asynchronous multi-threaded running environment, way-point, path planning, DUNE, dynamic object clustering, multiple moving objects, hardware-in-loop (HIL), three-axis stabilized brushless gimbal, wire vibration isolator
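The distance-based object selection with dynamic clustering described in this abstract might look roughly like the sketch below; the greedy single-linkage grouping and the clustering radius are illustrative simplifications, not the thesis algorithm:

```python
# Hedged sketch: group nearby detected objects into clusters, then select the
# cluster closest to the UAV for tracking. Positions are 2-D ground coordinates.

import math

def cluster(objects, radius):
    """Greedy single-linkage grouping of 2-D object positions."""
    clusters = []
    for p in objects:
        for c in clusters:
            # Join the first cluster containing a point within `radius`.
            if any(math.dist(p, q) <= radius for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

def select_cluster(clusters, uav_pos):
    """Track the cluster whose centroid is closest to the UAV."""
    def centroid(c):
        return (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
    return min(clusters, key=lambda c: math.dist(centroid(c), uav_pos))
```

Re-running the clustering every control cycle on the latest object positions gives the "dynamic" behaviour: clusters merge and split as the tracked objects move.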
8

Adeboye, Taiyelolu. "Robot Goalkeeper : A robotic goalkeeper based on machine vision and motor control." Thesis, Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-27561.

Abstract:
This report presents a robust and efficient implementation of a speed-optimized algorithm for object recognition, 3D real-world localization, and tracking in real time. It details a design focused on detecting and following objects in flight, as applied to a football in motion. An overall goal of the design was to develop a system capable of recognizing an object and its present and near-future location while also actuating a robotic arm in response to the motion of the ball in flight. The implementation used image processing functions in C++, an NVIDIA Jetson TX1, and Stereolabs’ ZED stereoscopic camera, connected to an embedded system controller for the robot arm. The image processing was done against a textured background, and the 3D location coordinates were used to correct a Kalman filter model that estimated and predicted the ball location. A capture and processing speed of 59.4 frames per second was obtained, with good accuracy in depth detection, and the ball was well tracked in the tests carried out.
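The Kalman-filter correction step mentioned in this abstract can be sketched per axis with a constant-velocity model; the process and measurement noise values q and r are assumptions, and the default time step merely echoes the reported 59.4 fps:

```python
# Minimal 1-D constant-velocity Kalman filter of the kind used to estimate and
# predict the ball position; run one instance per spatial axis. Noise levels
# are illustrative assumptions, not values from the report.

import numpy as np

class KalmanCV1D:
    def __init__(self, dt=1 / 59.4, q=1e-2, r=1e-1):
        self.x = np.zeros(2)                        # state: [position, velocity]
        self.P = np.eye(2)                          # state covariance
        self.F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity transition
        self.H = np.array([[1.0, 0.0]])             # only position is measured
        self.Q = q * np.eye(2)                      # process noise (assumed)
        self.R = np.array([[r]])                    # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]

    def update(self, z):
        y = z - self.H @ self.x                     # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]
```

Calling `predict()` between camera frames yields the "near-future location" used to pre-position the robotic arm, and `update(z)` corrects the model with each new stereo depth measurement.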
9

Barkman, Richard Dan William. "Object Tracking Achieved by Implementing Predictive Methods with Static Object Detectors Trained on the Single Shot Detector Inception V2 Network." Thesis, Karlstads universitet, Fakulteten för hälsa, natur- och teknikvetenskap (from 2013), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-73313.

Abstract:
In this work, the possibility of realising object tracking by implementing predictive methods with static object detectors is explored. The static object detectors are obtained as models trained with a machine learning algorithm, in other words, a deep neural network. Specifically, the single shot detector inception v2 network is used to train such models. Predictive methods are incorporated with the aim of improving the obtained models’ precision, i.e. their performance with respect to accuracy. Namely, Lagrangian mechanics is employed to derive equations of motion for three different scenarios in which the object is to be tracked. These equations of motion are implemented as predictive methods by discretising them and combining them with four different iterative formulae. In ch. 1, the fundamentals of supervised machine learning, neural networks and convolutional neural networks, the workings of the single shot detector algorithm, approaches to hyperparameter optimisation, and other relevant theory are established. This includes derivations of the relevant equations of motion and the iterative formulae with which they were implemented. In ch. 2, the experimental set-up used during data collection is described, along with the manner in which the acquired data was used to produce training, validation and test datasets. This is followed by a description of how random search was used to train 64 models on 300×300 datasets and 32 models on 512×512 datasets. Subsequently, these models are evaluated based on their performance with respect to camera-to-object distance and object velocity. In ch. 3, the trained models were verified to possess multi-scale detection capabilities, as is characteristic of models trained on the single shot detector network.
While the former holds irrespective of the resolution of the dataset the model was trained on, performance with respect to varying object velocity is significantly more consistent for the lower-resolution models, as they operate at a higher detection rate. Ch. 3 continues with an evaluation of the implemented predictive methods. This is done by comparing the resulting deviations when they are used to predict the missing data points from a collected detection pattern, with varying sampling percentages. It is found that the best predictive methods are those that use the fewest previous data points. This followed from the fact that the data on which the evaluations were made contained a considerable amount of noise, while the implemented iterative formulae do not take noise into account. Moreover, the lower-resolution models were found to benefit more than those trained on the higher-resolution datasets because of the higher detection frequency they can employ. In ch. 4, it is argued that the concept of combining predictive methods with static object detectors to obtain an object tracker is promising. Moreover, the models obtained with the single shot detector network are concluded to be good candidates for such applications. However, the predictive methods studied in this thesis should be replaced with, or extended into, methods that can account for noise. A notable finding is that the single shot detector inception v2 models trained on a low-resolution dataset outperform those trained on a high-resolution dataset in certain regards, due to the higher detection rate possible on lower-resolution frames: namely, in performance with respect to object velocity, and in that the predictive methods performed better with the low-resolution models.
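The simplest kind of iterative predictive formula discussed above, a two-point linear extrapolation, can be sketched as follows; the gap-filling helper mirrors the evaluation idea of predicting missing samples from a detection pattern, and the finding that short stencils cope best with noisy detections motivates keeping it to two points:

```python
# Hedged sketch: predict the next object position from the two most recent
# detections (locally constant velocity), and use it to fill single missing
# samples in a detection track. Illustrative, not the thesis implementation.

def extrapolate_linear(p_prev, p_curr):
    """Predict the next sample assuming locally constant velocity."""
    return tuple(2 * c - p for p, c in zip(p_prev, p_curr))

def fill_gap(track):
    """Replace a single missing (None) sample using its two predecessors."""
    filled = list(track)
    for i, p in enumerate(filled):
        if p is None and i >= 2:
            filled[i] = extrapolate_linear(filled[i - 2], filled[i - 1])
    return filled
```

Formulae built on more previous points fit higher-order motion but, as the abstract notes, amplify detection noise; a noise-aware alternative would be a filter rather than a raw extrapolation.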
10

Ozertem, Kemal Arda. "Vision-assisted Object Tracking." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614073/index.pdf.

Abstract:
In this thesis, a video tracking method is proposed that is based on both computer vision and estimation theory. For this purpose, the overall study is partitioned into four related subproblems. The first part is moving object detection; for this, two different background modeling methods are developed. The second part is feature extraction and estimation of optical flow between video frames. As the feature extraction method, a well-known corner detector algorithm is employed, applied only in the moving regions of the scene. For the feature points, optical flow vectors are calculated using an improved version of the Kanade-Lucas tracker. The resulting optical flow field between consecutive frames is used directly in the proposed tracking method. In the third part, a particle filter structure is built to perform tracking; the particle filter is improved by adding optical flow data to the state equation as a correction term. In the last part of the study, the performance of the proposed approach is compared against standard implementations of particle-filter-based trackers. Based on the simulation results, it can be argued that inserting vision-based optical flow estimation into the tracking formulation improves overall performance.
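The thesis's key idea, adding the optical flow vector as a correction term in the particle filter's state equation, can be sketched like this; the motion model, noise level, and inverse-square weighting are illustrative assumptions rather than the thesis's formulation:

```python
# Hedged sketch of a particle filter whose propagation step is corrected by the
# measured optical flow: each particle is shifted by the flow vector plus
# diffusion noise, then reweighted against the observed object position.

import random

def propagate(particles, flow, noise=1.0):
    """Move each particle by the optical flow correction plus Gaussian diffusion."""
    return [(x + flow[0] + random.gauss(0, noise),
             y + flow[1] + random.gauss(0, noise)) for x, y in particles]

def reweight(particles, observation):
    """Weight particles by proximity to the observed position (assumed likelihood)."""
    weights = [1.0 / (1e-6 + (x - observation[0]) ** 2 + (y - observation[1]) ** 2)
               for x, y in particles]
    total = sum(weights)
    return [w / total for w in weights]

def estimate(particles, weights):
    """State estimate as the weighted mean of the particle cloud."""
    return (sum(w * x for w, (x, _) in zip(weights, particles)),
            sum(w * y for w, (_, y) in zip(weights, particles)))
```

Without the flow term, the propagation step would rely on diffusion alone to follow fast motion; the correction concentrates particles where the flow field says the object has moved.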
11

Benfold, Ben. "The acquisition of coarse gaze estimates in visual surveillance." Thesis, University of Oxford, 2011. http://ora.ox.ac.uk/objects/uuid:59186519-9fee-4005-9570-0e3cf0384447.

Abstract:
This thesis describes the development of methods for automatically obtaining coarse gaze direction estimates for pedestrians in surveillance video. Gaze direction estimates are beneficial in the context of surveillance as an indicator of an individual's intentions and their interest in their surroundings and other people. The overall task is broken down into two problems. The first is that of tracking large numbers of pedestrians in low resolution video, which is required to identify the head regions within video frames. The second problem is to process the extracted head regions and estimate the direction in which the person is facing as a coarse estimate of their gaze direction. The first approach for head tracking combines image measurements from HOG head detections and KLT corner tracking using a Kalman filter, and can track the heads of many pedestrians simultaneously to output head regions with pixel-level accuracy. The second approach uses Markov-Chain Monte-Carlo Data Association (MCMCDA) within a temporal sliding window to provide similarly accurate head regions, but with improved speed and robustness. The improved system accurately tracks the heads of twenty pedestrians in 1920x1080 video in real-time and can track through total occlusions for short time periods. The approaches for gaze direction estimation all make use of randomised decision tree classifiers. The first develops classifiers for low resolution head images that are invariant to hair and skin colours using branch decisions based on abstract labels rather than direct image measurements. The second approach addresses higher resolution images using HOG descriptors and novel Colour Triplet Comparison (CTC) based branches. The final approach infers custom appearance models for individual scenes using weakly supervised learning over large datasets of approximately 500,000 images. 
A Conditional Random Field (CRF) models interactions between appearance information and walking directions to estimate gaze directions for head image sequences.
12

Mozaffari, Maaref Mohammad Hamed. "A Real-Time and Automatic Ultrasound-Enhanced Multimodal Second Language Training System: A Deep Learning Approach." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/40477.

Abstract:
The critical role of pronunciation in communicative competence is significant, especially for second language learners. Despite renewed awareness of the importance of articulation, it remains a challenge for instructors to handle the pronunciation needs of language learners. Pedagogical tools for pronunciation teaching and learning are relatively scarce beyond inefficient, traditional instructions such as listening and repeating. Recently, electronic visual feedback (EVF) systems (e.g., medical ultrasound imaging) have been exploited in new approaches that can be effectively incorporated into a range of teaching and learning contexts. Evaluations of ultrasound-enhanced methods for pronunciation training, such as multimodal methods, have asserted that visualizing the articulatory system as biofeedback to language learners might improve the efficiency of articulation learning. Despite the recent successful use of multimodal techniques for pronunciation training, manual work and human intervention remain unavoidable in many stages of those systems. Furthermore, recognizing tongue shape in noisy and low-contrast ultrasound images is a challenging task, especially for non-expert users in real-time applications. On the other hand, our user study revealed that users could not comfortably perceive the placement of their tongue inside the mouth just by watching pre-recorded videos. Machine learning is a subset of Artificial Intelligence (AI) in which machines learn by experience and acquire skills without human involvement. Inspired by the functionality of the human brain, deep artificial neural networks learn from large amounts of data to perform a task repeatedly. Deep learning-based methods have emerged as the dominant paradigm in many computer vision tasks in recent years.
Deep learning methods are powerful at automatically learning new tasks and, unlike traditional image processing methods, can deal with many challenges such as object occlusion, transformation variance, and background artifacts. In this dissertation, we implemented a guided language pronunciation training system that benefits from the strengths of deep learning techniques. Our modular system provides a fully automatic, real-time language pronunciation training tool using ultrasound-enhanced augmented reality. Qualitative and quantitative assessments indicate exceptional performance for our system in terms of flexibility, generalization, robustness, and autonomy, outperforming previous techniques. Using our ultrasound-enhanced system, a language learner can observe her/his tongue movements during real-time speech, automatically superimposed on her/his face.
APA, Harvard, Vancouver, ISO, and other styles
13

Law, Albert. "Experiments in object tracking in image sequences." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=100229.

Full text
Abstract:
This thesis explores three object tracking algorithms for image sequences. These algorithms include the ensemble tracker, the EM-like mean-shift colour-histogram tracker, and the wandering-stable-lost scale-invariant feature transform (WSL-SIFT) tracker. The algorithms are radically different from one another. Despite their differences, they are evaluated on the same publicly available, moderately sized research data sets, which include 129 test cases in 13 different scenes. The results aid in fostering an understanding of their respective behaviours and in highlighting their flaws and failures. Lastly, an implementation setup is described that is suited to large-scale, grid computing, batch testing of these algorithms. Results clearly indicate that none of the evaluated trackers are suited to general purpose use. However, one may intelligently choose a tracker for a well-defined application by analysing the known scene characteristics.
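At its core, the EM-like mean-shift colour-histogram tracker evaluated here reduces to repeatedly shifting a search window toward the weighted centroid of a likelihood map. A minimal stdlib Python sketch, where the `weights` grid stands in for a colour-histogram back-projection; the function and parameters are illustrative, not taken from the thesis:

```python
def mean_shift(weights, window, max_iter=20, eps=0.5):
    # weights[y][x]: per-pixel likelihood that the pixel belongs to the target
    # (e.g. a colour-histogram back-projection); window is (cx, cy, w, h).
    cx, cy, w, h = window
    for _ in range(max_iter):
        x0, x1 = int(cx - w / 2), int(cx + w / 2)
        y0, y1 = int(cy - h / 2), int(cy + h / 2)
        mass = mx = my = 0.0
        for y in range(max(y0, 0), min(y1, len(weights))):
            for x in range(max(x0, 0), min(x1, len(weights[0]))):
                p = weights[y][x]
                mass += p
                mx += p * x
                my += p * y
        if mass == 0.0:
            break  # no target evidence inside the window
        nx, ny = mx / mass, my / mass
        converged = abs(nx - cx) < eps and abs(ny - cy) < eps
        cx, cy = nx, ny
        if converged:
            break
    return cx, cy, w, h
```

In a real tracker the window position found in one frame seeds the search in the next; the EM-like variant additionally adapts the window scale.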
APA, Harvard, Vancouver, ISO, and other styles
14

Brown, Gary. "An object oriented model of machine vision." Thesis, Kingston University, 1997. http://eprints.kingston.ac.uk/20614/.

Full text
Abstract:
In this thesis an object oriented model is proposed that satisfies the requirements for a generic, customisable, reusable and flexible machine vision framework. These requirements are identified as being: ease of customisation for a particular application domain; independence from image definition; independence from shape representation scheme; ability to add new domain specific shape descriptors; independence from implemented machine vision algorithms; and the ability to maximise reuse of the generic framework. The thesis begins with a review of key machine vision functions and traditional architectures. In particular, machine vision architectures predicated on a process oriented framework are examined in detail and evaluated against the criteria stated above. An object oriented model is developed within the thesis, identifying the key classes underlying the machine vision domain. The responsibilities of these classes, and the relationships between them, are analysed in the context of high level machine vision tasks, for example object recognition. This object oriented approach is then contrasted with the more traditional process oriented approach. The object oriented model and framework is subsequently evaluated through a customisation, to illustrate an example machine vision application, namely Surface Mounted Electronic Assembly inspection. The object oriented model is also evaluated in the context of two functional machine vision applications described in literature. The model developed in this thesis incorporates the fundamental object oriented concepts of abstraction, encapsulation, inheritance and polymorphism. The results show that an object oriented approach does achieve the requirements for a generic, customisable, reusable and flexible machine vision framework.
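The requirements listed above (independence from image definition and shape representation, and the ability to add new domain-specific descriptors) map naturally onto abstract base classes with polymorphic dispatch. A hypothetical Python sketch of such a structure, not the thesis's actual class model:

```python
from abc import ABC, abstractmethod

class Image(ABC):
    """Abstract image: clients never depend on a concrete pixel definition."""
    @abstractmethod
    def size(self):
        """Return (width, height)."""
    @abstractmethod
    def pixel(self, x, y):
        """Return the intensity at (x, y)."""

class GreyImage(Image):
    """One concrete image definition; colour or range images plug in alike."""
    def __init__(self, rows):
        self.rows = rows
    def size(self):
        return len(self.rows[0]), len(self.rows)
    def pixel(self, x, y):
        return self.rows[y][x]

class ShapeDescriptor(ABC):
    """New domain-specific shape descriptors are added by subclassing."""
    @abstractmethod
    def describe(self, image):
        """Compute a scalar or vector description of the shape in `image`."""

class ForegroundArea(ShapeDescriptor):
    """A toy descriptor: count of pixels above a threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
    def describe(self, image):
        w, h = image.size()
        return sum(1 for y in range(h) for x in range(w)
                   if image.pixel(x, y) > self.threshold)
```

Because descriptors see images only through the abstract interface, the framework can be customised to a new application domain without touching existing code.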
APA, Harvard, Vancouver, ISO, and other styles
15

D'Souza, Collin. "Machine vision for shape and object recognition." Thesis, Nottingham Trent University, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.314332.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Calminder, Simon, and Chittum Matthew Källström. "Object Tracking and Interception System : Mobile Object Catching Robot using Static Stereo Vision." Thesis, KTH, Maskinkonstruktion (Inst.), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230249.

Full text
Abstract:
The aim of this project is to examine the feasibility and reliability of the use of a low cost computer vision system to track and intercept a thrown object. A stereo vision system tracks the object using color recognition and then guides a mobile wheeled robot towards an interception point in order to capture it. Two different trajectory prediction models are compared. One model fits a second degree polynomial to the collected positional measurements of the object and the other uses the Forward Euler Method to construct the object's flight path. To accurately guide the robot, the angular position of the robot must also be measured. Two different methods of measuring the angular position are presented and their respective reliability is measured. A calibrated magnetometer is used as one method while pure computer vision is implemented as the alternative method. A functional object tracking and interception system that was able to intercept the thrown object was constructed using both the polynomial fitting trajectory prediction model as well as the one based on the Forward Euler Method. The magnetometer and pure computer vision are both viable methods of determining the angular position of the robot with an error of less than 1.5°.
This project addresses the construction and reliability of a ball-catching robot and its underlying low-budget camera system. To operate in three dimensions, a stereo camera module tracks the ball using colour recognition, computes the ball's trajectory and predicts the landing point, giving the robot enough time to intercept the ball. Two trajectory models are tested: one accounts for air resistance and computes the landing point numerically, and the other fits a second-degree polynomial to the observed data points. To steer the robot to the intended catching point, both the robot's position, determined with the camera module, and the robot's heading are needed. The heading is determined both with a magnetometer and with the camera module, in order to investigate which method is best suited. The proposed design for the robot and the camera system can track and catch objects with both of the tested trajectory models, although the numerical method is considerably more sensitive to poor measurements. It is also possible to use either the magnetometer or only the camera module to determine the robot's heading, as both give an error below 1.5°.
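The polynomial trajectory model compared in this thesis fits a second-degree polynomial to collected position measurements. That fit can be sketched in stdlib Python by solving the 3×3 normal equations directly (a real system would use a numerical library; the function name and sample data are illustrative):

```python
def fit_parabola(ts, ys):
    # Least-squares fit of y = a*t^2 + b*t + c via the 3x3 normal equations.
    s = [sum(t ** k for t in ts) for k in range(5)]        # power sums of t
    sy = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    # Augmented matrix for the unknowns [a, b, c]
    A = [[s[4], s[3], s[2], sy[2]],
         [s[3], s[2], s[1], sy[1]],
         [s[2], s[1], s[0], sy[0]]]
    for i in range(3):                                     # elimination, partial pivoting
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 4):
                A[r][c] -= f * A[i][c]
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                                    # back substitution
        coeffs[i] = (A[i][3] - sum(A[i][c] * coeffs[c]
                                   for c in range(i + 1, 3))) / A[i][i]
    return coeffs  # [a, b, c]
```

Given the fitted coefficients, the landing time follows from the positive root of a·t² + b·t + c = 0, from which the interception point is predicted.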
APA, Harvard, Vancouver, ISO, and other styles
17

Tsitiridis, Aristeidis. "Biologically-inspired machine vision." Thesis, Cranfield University, 2013. http://dspace.lib.cranfield.ac.uk/handle/1826/8029.

Full text
Abstract:
This thesis summarises research on the improved design, integration and expansion of past cortex-like computer vision models, following biologically-inspired methodologies. By adopting early theories and algorithms as a building block, particular interest has been shown for algorithmic parameterisation, feature extraction, invariance properties and classification. Overall, the major original contributions of this thesis have been: 1. The incorporation of a salient feature-based method for semantic feature extraction and refinement in object recognition. 2. The design and integration of colour features coupled with the existing morphological-based features for efficient and improved biologically-inspired object recognition. 3. The introduction of the illumination invariance property with colour constancy methods under a biologically-inspired framework. 4. The development and investigation of rotation invariance methods to improve robustness and compensate for the lack of such a mechanism in the original models. 5. Adaptive Gabor filter design that captures texture information, enhancing the morphological description of objects in a visual scene and improving the overall classification performance. 6. Instigation of pioneering research on Spiking Neural Network classification for biologically-inspired vision. Most of the above contributions have also been presented in two journal publications and five conference papers. The system has been fully developed and tested in computers using MATLAB under a variety of image datasets either created for the purposes of this work or obtained from the public domain.
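One contribution above is an adaptive Gabor filter design for capturing texture. The real part of a Gabor kernel, a Gaussian envelope multiplying an oriented cosine carrier, can be generated in a few lines; the parameter values here are illustrative, not those tuned in the thesis:

```python
import math

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter: a Gaussian envelope times a cosine
    carrier oriented at angle theta, responding to texture of wavelength lambd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates into the filter's orientation
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(env * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel

# A small bank at several orientations, as commonly used for texture description
bank = [gabor_kernel(9, 2.0, k * math.pi / 4, 4.0) for k in range(4)]
```

Convolving an image with such a bank yields orientation-selective texture responses that can augment morphological features before classification.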
APA, Harvard, Vancouver, ISO, and other styles
18

Thomas, Brigneti Andrés Attilio. "Multi-object tracking with camera." Tesis, Universidad de Chile, 2019. http://repositorio.uchile.cl/handle/2250/170746.

Full text
Abstract:
Thesis submitted for the degree of Electrical Engineer (Ingeniero Civil Eléctrico)
In this work, different tracking algorithms are evaluated on the pedestrian-tracking problem: given video obtained from a security camera, we want to recognise each individual correctly over time, minimising the number of wrongly assigned labels and unidentified objects (pedestrians). For this, algorithms based on the concept of Random Finite Sets (RFS) are used, which use past measurements of the objects to predict the future positions of all of them simultaneously, while also accounting for object births and deaths. These algorithms were conceived for tracking objects with simple, predictable motion under heavy measurement noise, whereas the conditions under which they are evaluated here are drastically the opposite: a very high level of certainty in the measurements, but highly nonlinear and very unpredictable motion. An open library created by the researcher Ba Tuong Vo is used, in which several of the most classic algorithms in this area are implemented. The work therefore focuses on analysing the results under these new conditions and on observing how they compare to current Computer Vision (CV) / Machine Learning (ML) algorithms, using both RFS metrics and metrics from the CV field.
APA, Harvard, Vancouver, ISO, and other styles
19

Leavers, Violet. "Shape parametrisation and object recognition in machine vision." Thesis, King's College London (University of London), 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.243898.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Sun, Yaoru. "Hierarchical object-based visual attention for machine vision." Thesis, University of Edinburgh, 2003. http://hdl.handle.net/1842/316.

Full text
Abstract:
Human vision uses mechanisms of covert attention to selectively process interesting information and overt eye movements to extend this selectivity ability. Thus, visual tasks can be effectively dealt with by limited processing resources. Modelling visual attention for machine vision systems is not only critical but also challenging. In the machine vision literature there have been many conventional attention models developed but they are all space-based only and cannot perform object-based selection. In consequence, they fail to work in real-world visual environments due to the intrinsic limitations of the space-based attention theory upon which these models are built. The aim of the work presented in this thesis is to provide a novel human-like visual selection framework based on the object-based attention theory recently being developed in psychophysics. The proposed solution – a Hierarchical Object-based Attention Framework (HOAF) based on grouping competition, consists of two closely-coupled visual selection models of (1) hierarchical object-based visual (covert) attention and (2) object-based attention-driven (overt) saccadic eye movements. The Hierarchical Object-based Attention Model (HOAM) is the primary selection mechanism and the Object-based Attention-Driven Saccading model (OADS) has a supporting role, both of which are combined in the integrated visual selection framework HOAF. This thesis first describes the proposed object-based attention model HOAM which is the primary component of the selection framework HOAF. The model is based on recent psychophysical results on object-based visual attention and adopted grouping-based competition to integrate object-based and space-based attention together so as to achieve object-based hierarchical selectivity. The behaviour of the model is demonstrated on a number of synthetic images simulating psychophysical experiments and real-world natural scenes. 
The experimental results showed that the performance of our object-based attention model HOAM concurs with the main findings in the psychophysical literature on object-based and space-based visual attention. Moreover, HOAM has outstanding hierarchical selectivity from far to near and from coarse to fine by features, objects, spatial regions, and their groupings in complex natural scenes. This successful performance arises from three original mechanisms in the model: grouping-based saliency evaluation, integrated competition between groupings, and hierarchical selectivity. The model is the first implemented machine vision model of integrated object-based and space-based visual attention. The thesis then addresses another proposed model of Object-based Attention-Driven Saccadic eye movements (OADS) built upon the object-based attention model HOAM, as an overt saccading component within the object-based selection framework HOAF. This model, like our object-based attention model HOAM, is also the first implemented machine vision saccading model which makes a clear distinction between (covert) visual attention and overt saccading movements in a two-level selection system – an important feature of human vision but not yet explored in conventional machine vision saccading systems. In the saccading model OADS, a log-polar retina-like sensor is employed to simulate the human-like foveation imaging for space variant sensing. Through a novel mechanism for attention-driven orienting, the sensor fixates on new destinations determined by object-based attention. Hence it helps attention to selectively process interesting objects located at the periphery of the whole field of view to accomplish the large-scale visual selection tasks. By another proposed novel mechanism for temporary inhibition of return, OADS can simulate the human saccading/attention behaviour to refixate/reattend interesting objects for further detailed inspection. 
This thesis concludes that the proposed human-like visual selection solution – HOAF, which is inspired by psychophysical object-based attention theory and grouping-based competition, is particularly useful for machine vision. HOAF is a general and effective visual selection framework integrating object-based attention and attention-driven saccadic eye movements with biological plausibility and object-based hierarchical selectivity from coarse to fine in a space-time context.
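The log-polar retina-like sensor described above samples densely near the fixation point and coarsely in the periphery. A simplified foveated-sampling sketch in stdlib Python (the ring/wedge counts and growth factor are illustrative, not the sensor geometry used in OADS):

```python
import math

def log_polar_sample(image, cx, cy, rings=8, wedges=16, r0=1.0, growth=1.4):
    """Foveated sampling: ring radii grow geometrically, so resolution is
    high near the fixation point (cx, cy) and coarse in the periphery."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(rings):
        r = r0 * growth ** i          # geometric radius growth
        row = []
        for j in range(wedges):
            a = 2 * math.pi * j / wedges
            x = min(max(int(round(cx + r * math.cos(a))), 0), w - 1)
            y = min(max(int(round(cy + r * math.sin(a))), 0), h - 1)
            row.append(image[y][x])
        out.append(row)
    return out
```

Re-centering `(cx, cy)` on each attended target is the orienting step: the same small rings × wedges array then covers the new object at the highest resolution.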
APA, Harvard, Vancouver, ISO, and other styles
21

Sun, Shijun. "Video object segmentation and tracking using VSnakes /." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/6038.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Berry, David T. "A knowledge-based framework for machine vision." Thesis, Heriot-Watt University, 1987. http://hdl.handle.net/10399/1022.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

CALMINDER, SIMON, and CHITTUM MATTHEW KÄLLSTRÖM. "Object Tracking and Interception System : Mobile Object Catching Robot using Static Stereo Vision." Thesis, KTH, Mekatronik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233135.

Full text
Abstract:
The aim of this project is to examine the feasibility and reliability of the use of a low cost computer vision system to track and intercept a thrown object. A stereo vision system tracks the object using color recognition and then guides a mobile wheeled robot towards an interception point in order to capture it. Two different trajectory prediction models are compared. One model fits a second degree polynomial to the collected positional measurements of the object and the other uses the Forward Euler Method to construct the object's flight path. To accurately guide the robot, the angular position of the robot must also be measured. Two different methods of measuring the angular position are presented and their respective reliability is measured. A calibrated magnetometer is used as one method while pure computer vision is implemented as the alternative method. A functional object tracking and interception system that was able to intercept the thrown object was constructed using both the polynomial fitting trajectory prediction model as well as the one based on the Forward Euler Method. The magnetometer and pure computer vision are both viable methods of determining the angular position of the robot with an error of less than 1.5°.
This project addresses the construction and reliability of a ball-catching robot and its underlying low-budget camera system. To operate in three dimensions, a stereo camera module tracks the ball using colour recognition, computes the ball's trajectory and predicts the landing point, giving the robot enough time to intercept the ball. Two trajectory models are tested: one accounts for air resistance and computes the landing point numerically, and the other fits a second-degree polynomial to the observed data points. To steer the robot to the intended catching point, both the robot's position, determined with the camera module, and the robot's heading are needed. The heading is determined both with a magnetometer and with the camera module, in order to investigate which method is best suited. The proposed design for the robot and the camera system can track and catch objects with both of the tested trajectory models, although the numerical method is considerably more sensitive to poor measurements. It is also possible to use either the magnetometer or only the camera module to determine the robot's heading, as both give an error below 1.5°.
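The Forward Euler trajectory model compared above, which accounts for air resistance, can be sketched as explicit integration of ballistic motion with quadratic drag. The drag coefficient, step size, and function name below are illustrative assumptions, not the project's actual values:

```python
import math

def euler_flight(pos, vel, dt=0.01, k_drag=0.05, g=9.81):
    """Integrate a ball's flight with quadratic air drag until it lands
    (y < 0), using Forward Euler: x' = v, v' = -g*e_y - k*|v|*v."""
    x, y = pos
    vx, vy = vel
    path = [(x, y)]
    while y >= 0.0:
        speed = math.hypot(vx, vy)
        ax = -k_drag * speed * vx          # drag opposes velocity
        ay = -g - k_drag * speed * vy      # gravity plus drag
        x += vx * dt
        y += vy * dt
        vx += ax * dt
        vy += ay * dt
        path.append((x, y))
    return path  # the last point approximates the landing/interception spot
```

The predicted landing point is then handed to the robot's guidance loop; as the abstract notes, this numerical model is more sensitive to poor measurements than the polynomial fit, since errors in the initial velocity estimate propagate through every step.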
APA, Harvard, Vancouver, ISO, and other styles
24

Khaligh-Razavi, Seyed-Mahdi. "Representational geometries of object vision in man and machine." Thesis, University of Cambridge, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.708729.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Turesson, Eric. "Multi-camera Computer Vision for Object Tracking: A comparative study." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21810.

Full text
Abstract:
Background: Video surveillance is a growing area that can help with deterring crime, supporting investigations, or gathering statistics. These are just some areas where video surveillance can aid society. However, one improvement that could increase the efficiency of video surveillance is the introduction of tracking, more specifically, tracking between cameras in a network. Automating this process could reduce the need for humans to monitor and review footage, since the system can track and inform the relevant people on its own. This has a wide array of uses, such as forensic investigation, crime alerting, or tracking down people who have disappeared. Objectives: First, we want to investigate the common setup of real-time multi-target multi-camera tracking (MTMCT) systems. Next, we want to investigate how the components in an MTMCT system affect each other and the complete system. Lastly, we want to see how image enhancement can affect the MTMCT. Methods: To achieve our objectives, we conducted a systematic literature review to gather information. Using the information, we implemented an MTMCT system in which we evaluated the components to see how they interact in the complete system. Lastly, we implemented two image enhancement techniques to see how they affect the MTMCT. Results: As we discovered, MTMCT is most often constructed using detection to discover objects, tracking to follow the objects within a single camera, and a re-identification method to ensure that objects across cameras have the same ID. The different components have a considerable effect on each other and can either degrade or improve one another. For example, the quality of the bounding boxes affects the data that re-identification can extract. We discovered that the image enhancement we used did not introduce any significant improvement. Conclusions: The most common structure for MTMCT is detection, tracking and re-identification. 
From our findings, we can see that all the components affect each other, but re-identification is the one most affected by the other components and by the image enhancement. The two tested image enhancement techniques could not introduce enough improvement, but other image enhancement methods could be used to make the MTMCT perform better. The MTMCT system we constructed did not manage to reach real-time performance.
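The re-identification stage described above typically matches appearance embeddings of new detections against a gallery of known identities, so the same person keeps one global ID across cameras. A toy greedy-matching sketch in stdlib Python; the embeddings, threshold, and greedy policy are deliberate simplifications of real re-ID systems:

```python
import math

def cosine(u, v):
    """Cosine similarity between two appearance embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def assign_global_ids(gallery, detections, threshold=0.7):
    """Greedy re-ID: match each detection's embedding to the most similar
    gallery identity; below the threshold, open a new global ID."""
    next_id = max(gallery, default=-1) + 1
    out = []
    for emb in detections:
        best_id, best_sim = None, threshold
        for gid, gemb in gallery.items():
            sim = cosine(emb, gemb)
            if sim > best_sim:
                best_id, best_sim = gid, sim
        if best_id is None:           # unseen identity: register it
            best_id = next_id
            next_id += 1
            gallery[best_id] = emb
        out.append(best_id)
    return out
```

This also illustrates the coupling the thesis observed: if the detector produces poor bounding boxes, the embeddings degrade and the similarity scores become unreliable.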
APA, Harvard, Vancouver, ISO, and other styles
26

Silva, João Miguel Ferreira da. "People and object tracking for video annotation." Master's thesis, Faculdade de Ciências e Tecnologia, 2012. http://hdl.handle.net/10362/8953.

Full text
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática)
Object tracking is a thoroughly researched problem, with a body of associated literature dating at least as far back as the late 1970s. However, and despite the development of some satisfactory real-time trackers, it has not yet seen widespread use. This is not due to a lack of applications for the technology, since several interesting ones exist. In this document, it is postulated that this status quo is due, at least in part, to a lack of easy to use software libraries supporting object tracking. An overview of the problems associated with object tracking is presented and the process of developing one such library is documented. This discussion includes how to overcome problems like heterogeneities in object representations and requirements for training or initial object position hints. Video annotation is the process of associating data with a video’s content. Associating data with a video has numerous applications, ranging from making large video archives or long videos searchable, to enabling discussion about and augmentation of the video’s content. Object tracking is presented as a valid approach to both automatic and manual video annotation, and the integration of the developed object tracking library into an existing video annotator, running on a tablet computer, is described. The challenges involved in designing an interface to support the association of video annotations with tracked objects in real-time are also discussed. In particular, we discuss our interaction approaches to handle moving object selection on live video, which we have called “Hold and Overlay” and “Hold and Speed Up”. In addition, the results of a set of preliminary tests are reported.
project “TKB – A Transmedia Knowledge Base for contemporary dance” (PTDC/EA /AVP/098220/2008 funded by FCT/MCTES), the UTAustin – Portugal, Digital Media Program (SFRH/BD/42662/2007 FCT/MCTES) and by CITI/DI/FCT/UNL (Pest-OE/EEI/UI0527/2011)
APA, Harvard, Vancouver, ISO, and other styles
27

Kim, Sunyoung. "The mathematics of object recognition in machine and human vision." CSUSB ScholarWorks, 2003. https://scholarworks.lib.csusb.edu/etd-project/2425.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Verdie, Yannick. "Surface Gesture & Object Tracking on Tabletop Devices." Thesis, Virginia Tech, 2010. http://hdl.handle.net/10919/32769.

Full text
Abstract:
In this thesis, we are interested in the use of tabletop surfaces for interactive manipulations. We focus on the implementation of Image Processing algorithms and techniques in two projects exploiting a horizontal surface: the “Tangram Project” and “MirrorTrack”. The “Tangram Project” studies children’s mathematical skills when manipulating geometrical shapes. This project is supported by NSF (NSF 0736151) based on the proposal “Social Organization, Learning Technologies & Discourse: System Features for Facilitating Mathematical Reasoning in PreK-3 Students” by M. Evans, F. Quek, R. Ehrich and J. Wilkins. Our contribution is the design and realization of vision-based tracking software that could be used in a classroom. Our implementation offers three modes of interaction making it easier to study the children’s behaviors in specific situations and constraints. The “MirrorTrack Project” is an idea described in previous research [P.-K. Chung et al, 2008a] [P.-K. Chung et al, 2008b] using a horizontal surface with two side-mounted cameras to track fingertips. Our contribution to the “MirrorTrack Project” is the design and realization of video-based interaction software. “MirrorTrack” provides an improvement to one of the Tangram modes (the Virtual mode) by providing real 3D fingertip location above the surface. Among other functionalities, it provides hovering and touch detection [Y. Verdie et al, 2009]. We conclude by describing the possibility of merging these two systems and by highlighting the benefits of such a fusion. Integrating “MirrorTrack” with the “Tangram Project” provides even more interaction opportunities for the children.
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
29

Huang, Kuang Man. "Tracking and analysis of C. elegans behavior using machine vision." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2008. http://wwwlib.umi.com/cr/ucsd/fullcit?p3297739.

Full text
Abstract:
Thesis (Ph. D.)--University of California, San Diego, 2008.
Title from first page of PDF file (viewed August 8, 2008). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 108-112).
APA, Harvard, Vancouver, ISO, and other styles
30

Kim, Kyungnam. "Algorithms and evaluation for object detection and tracking in computer vision." College Park, Md. : University of Maryland, 2005. http://hdl.handle.net/1903/2925.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2005.
Thesis research directed by: Computer Science. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
31

Wallenberg, Marcus. "Embodied Visual Object Recognition." Doctoral thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-132762.

Full text
Abstract:
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. This is especially challenging due to the high dimensionality of image data. In cases where end-to-end learning from pixels to output is needed, mechanisms designed to make inputs tractable are often necessary for less computationally capable embodied systems. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. Therefore, the way in which attention mechanisms should be introduced into feature extraction and estimation algorithms must be carefully considered when constructing a recognition system. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, problem-specific feature selection, efficient estimator training and attentional modulation in convolutional neural networks. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. 
Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. In order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. Feature selection and efficient discriminant sampling for decision tree-based estimators have also been implemented. Finally, attentional multi-layer modulation of convolutional neural networks for recognition in cluttered scenes has been evaluated. Several of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
Embodied Visual Object Recognition
FaceTrack
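The channel-coding cue combination evaluated in this thesis rests on encoding each scalar cue (e.g. hue or depth) into a vector of soft, locally supported channel activations before fusing. A sketch using the common cos² channel basis; the basis choice and the concatenation fusion rule here are assumptions for illustration, not necessarily the thesis's exact formulation:

```python
import math

def channel_encode(value, centers, width):
    """cos^2 channel basis: a scalar becomes a vector of soft channel
    activations, each with local support around its center."""
    vec = []
    for c in centers:
        d = abs(value - c) / width
        # each channel responds only within 1.5 channel widths of its center
        vec.append(math.cos(math.pi * d / 3.0) ** 2 if d < 1.5 else 0.0)
    return vec

def fuse(colour_vec, depth_vec):
    """One simple fusion rule: concatenate the per-cue channel vectors."""
    return colour_vec + depth_vec
```

The smooth, overlapping channels make the fused representation robust to small cue noise while still allowing the original value to be decoded from the activation pattern.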
APA, Harvard, Vancouver, ISO, and other styles
32

Brohan, Kevin Patrick. "Search and attention for machine vision." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/search-and-attention-for-machine-vision(a4747c9b-ac13-46d1-8895-5f2d88523d80).html.

Full text
Abstract:
This thesis addresses the generation of behaviourally useful, robust representations of the sensory world in the context of machine vision and behaviour. The goals of the work presented in this thesis are to investigate strategies for representing the visual world in a way which is behaviourally useful, to investigate the use of a neurally inspired early perceptual organisation system upon high-level processing in an object recognition system and to investigate the use of a perceptual organisation system on driving an object-based selection process. To address these problems, a biologically inspired framework for machine attention has been developed at a high level of neural abstraction, which has been heavily inspired by the psychological and physiological literature. The framework is described in this thesis, and three system implementations, which investigate the above issues, are described and analysed in detail. The primate brain has access to a coherent representation of the external world, which appears as objects at different spatial locations. It is through these representations that appropriate behavioural responses may be generated. For example, we do not become confused by cluttered scenes or by occluded objects. The representation of the visual scene is generated in a hierarchical computing structure in the primate brain: while shape and position information are able to drive attentional selection rapidly, high-level processes such as object recognition must be performed serially, passing through an attentional bottleneck. Through the process of attentional selection, the primate visual system identifies behaviourally relevant regions of the visual scene, which allows it to prioritise serial attentional shifts towards certain locations. In primates, the process of attentional selection is complex, operating upon surface representations which are robust to occlusion. 
Attention itself suppresses neural activity related to distractor objects, while sustaining activity relating to the target, allowing the target object to have a clear neural representation upon which the recognition process can operate. This thesis concludes that dynamic representations that are both early and robust against occlusion have the potential to be highly useful in machine vision and behaviour applications.
APA, Harvard, Vancouver, ISO, and other styles
33

Levy, Alfred K. "Object tracking in low frame-rate video sequences." Honors in the Major Thesis, University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/339.

Full text
Abstract:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf. You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Engineering
Computer Science
APA, Harvard, Vancouver, ISO, and other styles
34

Krieger, Evan. "Adaptive Fusion Approach for Multiple Feature Object Tracking." University of Dayton / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=dayton15435905735447.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Eslami, Seyed Mohammadali. "Generative probabilistic models for object segmentation." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/8898.

Full text
Abstract:
One of the long-standing open problems in machine vision has been the task of ‘object segmentation’, in which an image is partitioned into two sets of pixels: those that belong to the object of interest, and those that do not. A closely related task is that of ‘parts-based object segmentation’, where additionally each of the object’s pixels is labelled as belonging to one of several predetermined parts. There is broad agreement that segmentation is coupled to the task of object recognition. Knowledge of the object’s class can lead to more accurate segmentations, and in turn accurate segmentations can be used to obtain higher recognition rates. In this thesis we focus on one side of this relationship: given the object’s class and its bounding box, how accurately can we segment it? Segmentation is challenging primarily due to the huge amount of variability one sees in images of natural scenes. A large number of factors combine in complex ways to generate the pixel intensities that make up any given image. In this work we approach the problem by developing generative probabilistic models of the objects in question. Not only does this allow us to express notions of variability and uncertainty in a principled way, but also to separate the problems of model design and inference. The thesis makes the following contributions: First, we demonstrate an explicit probabilistic model of images of objects based on a latent Gaussian model of shape. This can be learned from images in an unsupervised fashion. Through experiments on a variety of datasets we demonstrate the advantages of explicitly modelling shape variability. We then focus on the task of constructing more accurate models of shape. We present a type of layered probabilistic model that we call a Shape Boltzmann Machine (SBM) for the task of modelling foreground/background (binary) and parts-based (categorical) shapes.
We demonstrate that it constitutes the state-of-the-art and characterises a ‘strong’ model of shape, in that samples from the model look realistic and that it generalises to generate samples that differ from training examples. Finally, we demonstrate how the SBM can be used in conjunction with an appearance model to form a fully generative model of images of objects. We show how parts-based object segmentations can be obtained simply by performing probabilistic inference in this joint model. We apply the model to several challenging datasets and find that its performance is comparable to the state-of-the-art.
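The layered Boltzmann-machine idea summarized in this abstract can be illustrated with a toy sketch. The code below runs block Gibbs sampling in a tiny restricted Boltzmann machine over 6x6 binary "shape" images. The random weights and layer sizes are illustrative assumptions, not the thesis's trained SBM, which stacks layers and shares weights across image patches.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_h, b_v, rng):
    """One block-Gibbs sweep: sample hidden units given visibles, then visibles given hiddens."""
    h = (rng.random(b_h.shape) < sigmoid(v @ W + b_h)).astype(float)
    v = (rng.random(b_v.shape) < sigmoid(h @ W.T + b_v)).astype(float)
    return v, h

n_vis, n_hid = 36, 8                        # a 6x6 binary shape, 8 hidden units
W = rng.normal(0.0, 0.1, (n_vis, n_hid))    # illustrative random weights (untrained)
b_h, b_v = np.zeros(n_hid), np.zeros(n_vis)

v = (rng.random(n_vis) < 0.5).astype(float)  # random initial shape
for _ in range(50):
    v, h = gibbs_step(v, W, b_h, b_v, rng)

sample = v.reshape(6, 6)                    # a binary shape sampled from the model
```

With trained weights, repeated sweeps of this kind are what produce the "realistic samples" the abstract refers to; here the chain merely mixes over random weights.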
APA, Harvard, Vancouver, ISO, and other styles
36

Lan, Xiangyuan. "Multi-cue visual tracking: feature learning and fusion." HKBU Institutional Repository, 2016. https://repository.hkbu.edu.hk/etd_oa/319.

Full text
Abstract:
As an important and active research topic in the computer vision community, visual tracking is a key component in many applications ranging from video surveillance and robotics to human-computer interaction. In this thesis, we propose new appearance models based on multiple visual cues and address several research issues in feature learning and fusion for visual tracking. Feature extraction and feature fusion are two key modules for constructing the appearance model of the tracked target with multiple visual cues. Feature extraction aims to extract informative features for the visual representation of the tracked target, and many kinds of hand-crafted feature descriptors which capture different types of visual information have been developed. However, since large appearance variations, e.g. occlusion and illumination changes, may occur during tracking, the target samples may be contaminated/corrupted. As such, the extracted raw features may not be able to capture the intrinsic properties of the target appearance. Besides, without explicitly imposing discriminability, the extracted features may suffer from the background distraction problem. To extract uncontaminated discriminative features from multiple visual cues, this thesis proposes a novel robust joint discriminative feature learning framework which is capable of 1) simultaneously and optimally removing corrupted features and learning reliable classifiers, and 2) exploiting the consistent and feature-specific discriminative information of multiple features. In this way, the features and classifiers learned from potentially corrupted tracking samples can be better utilized for target representation and foreground/background discrimination. As shown by the Data Processing Inequality, information fusion at the feature level retains more information than fusion at the classifier level. In addition, not all visual cues/features are reliable, and thereby combining all the features may not achieve better tracking performance.
As such, it is more reasonable to dynamically select and fuse multiple visual cues for visual tracking. Based on the aforementioned considerations, this thesis proposes a novel joint sparse representation model in which feature selection, fusion, and representation are performed optimally in a unified framework. By taking advantage of sparse representation, unreliable features are detected and removed while reliable features are fused at the feature level for target representation. In order to capture the non-linear similarity of features, the model is further extended to perform feature fusion in kernel space. Experimental results demonstrate the effectiveness of the proposed model. Since different visual cues extracted from the same object should share some commonalities in their representations, and each feature should also have some diversity to reflect its complementarity in appearance modeling, another important problem in feature fusion is how to learn the commonality and diversity in the fused representations of multiple visual cues to enhance the tracking accuracy. Different from existing multi-cue sparse trackers which only consider the commonalities among the sparsity patterns of multiple visual cues, this thesis proposes a novel multiple sparse representation model for multi-cue visual tracking which jointly exploits the underlying commonalities and diversities of different visual cues by decomposing multiple sparsity patterns. Moreover, this thesis introduces a novel online multiple metric learning scheme to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple visual cues are more representative. Experimental results on tracking benchmark videos and other challenging videos show that the proposed tracker achieves better performance than existing sparsity-based trackers and other state-of-the-art trackers.
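As a rough illustration of the sparse-representation machinery behind this kind of tracker (not the thesis's joint model, in which selection, fusion and representation are solved together), the sketch below codes a feature vector over a template dictionary with iterative soft-thresholding and scores a cue's reliability by its reconstruction residual. The dictionary, regularization weight, and the two toy cues are invented for the example.

```python
import numpy as np

def ista(D, y, lam=0.01, n_iter=300):
    """Sparse coding of y over dictionary D by iterative soft-thresholding (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = x - D.T @ (D @ x - y) / L      # gradient step on the least-squares term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(3)
D = rng.normal(size=(20, 10))
D /= np.linalg.norm(D, axis=0)             # unit-norm template columns

# A "reliable" cue lies in the span of a few templates; a "corrupted" cue does not.
y_good = 0.7 * D[:, 2] - 0.4 * D[:, 5]
y_bad = rng.normal(size=20)

def residual(D, y):
    x = ista(D, y)
    return np.linalg.norm(D @ x - y) / max(np.linalg.norm(y), 1e-12)

reliable = residual(D, y_good) < residual(D, y_bad)  # smaller residual => more reliable cue
```

In the thesis the analogous decision is made jointly across all cues rather than one residual test at a time, but the underlying signal is the same: cues that cannot be sparsely reconstructed over the target templates are treated as unreliable.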
APA, Harvard, Vancouver, ISO, and other styles
37

Yamato, Junji 1964. "Tracking moving object by stereo vision head with vergence for humanoid robot." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/9950.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Waddington, Gary. "Biederman's Recognition by Components (RBC) theory of human object recognition - an investigation." Thesis, University of Reading, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.301971.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Mhalla, Ala. "Multi-object detection and tracking in video sequences." Thesis, Université Clermont Auvergne‎ (2017-2020), 2018. http://www.theses.fr/2018CLFAC084/document.

Full text
Abstract:
Le travail développé dans cette thèse porte sur l'analyse de séquences vidéo. Cette dernière est basée sur 3 taches principales : la détection, la catégorisation et le suivi des objets. Le développement de solutions fiables pour l'analyse de séquences vidéo ouvre de nouveaux horizons pour plusieurs applications telles que les systèmes de transport intelligents, la vidéosurveillance et la robotique. Dans cette thèse, nous avons mis en avant plusieurs contributions pour traiter les problèmes de détection et de suivi d'objets multiples sur des séquences vidéo. Les techniques proposées sont basées sur l’apprentissage profonds et des approches de transfert d'apprentissage. Dans une première contribution, nous abordons le problème de la détection multi-objets en proposant une nouvelle technique de transfert d’apprentissage basé sur le formalisme et la théorie du filtre SMC (Sequential Monte Carlo) afin de spécialiser automatiquement un détecteur de réseau de neurones convolutionnel profond (DCNN) vers une scène cible. Dans une deuxième contribution, nous proposons une nouvelle approche de suivi multi-objets original basé sur des stratégies spatio-temporelles (entrelacement / entrelacement inverse) et un détecteur profond entrelacé, qui améliore les performances des algorithmes de suivi par détection et permet de suivre des objets dans des environnements complexes (occlusion, intersection, fort mouvement). Dans une troisième contribution, nous fournissons un système de surveillance du trafic, qui intègre une extension du technique SMC afin d’améliorer la précision de la détection de jour et de nuit et de spécialiser tout détecteur DCNN pour les caméras fixes et mobiles. Tout au long de ce rapport, nous fournissons des résultats quantitatifs et qualitatifs. Sur plusieurs aspects liés à l’analyse de séquences vidéo, ces travaux surpassent les cadres de détection et de suivi de pointe. 
En outre, nous avons implémenté avec succès nos infrastructures dans une plate-forme matérielle intégrée pour la surveillance et la sécurité du trafic routier
The work developed in this PhD thesis is focused on video sequence analysis. The latter consists of object detection, categorization and tracking. The development of reliable solutions for the analysis of video sequences opens new horizons for several applications such as intelligent transport systems, video surveillance and robotics. In this thesis, we put forward several contributions to deal with the problems of detecting and tracking multi-objects on video sequences. The proposed frameworks are based on deep learning networks and transfer learning approaches. In a first contribution, we tackle the problem of multi-object detection by putting forward a new transfer learning framework based on the formalism and the theory of a Sequential Monte Carlo (SMC) filter to automatically specialize a Deep Convolutional Neural Network (DCNN) detector towards a target scene. The suggested specialization framework is used in order to transfer the knowledge from the source and the target domain to the target scene and to estimate the unknown target distribution as a specialized dataset composed of samples from the target domain. These samples are selected according to the importance of their weights, which reflects the likelihood that they belong to the target distribution.
The obtained specialized dataset allows training a specialized DCNN detector to a target scene without human intervention. In a second contribution, we propose an original multi-object tracking framework based on spatio-temporal strategies (interlacing/inverse interlacing) and an interlaced deep detector, which improves the performance of tracking-by-detection algorithms and helps to track objects in complex videos (occlusion, intersection, strong motion). In a third contribution, we provide an embedded system for traffic surveillance, which integrates an extension of the SMC framework so as to improve the detection accuracy in both day and night conditions and to specialize any DCNN detector for both mobile and stationary cameras. Throughout this report, we provide both quantitative and qualitative results. On several aspects related to video sequence analysis, this work outperforms state-of-the-art detection and tracking frameworks. In addition, we have successfully implemented our frameworks in an embedded hardware platform for road traffic safety and monitoring.
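The core of the SMC-style specialization described above, treating candidate samples as particles and keeping them in proportion to importance weights, can be sketched as follows. The candidate labels and weights are invented stand-ins for the detector-derived likelihoods a real system would supply.

```python
import numpy as np

def resample(samples, weights, rng):
    """Multinomial resampling: keep samples in proportion to their importance weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalize to a probability vector
    idx = rng.choice(len(samples), size=len(samples), p=w)
    return [samples[i] for i in idx]

rng = np.random.default_rng(1)

# Toy candidate samples for the specialized dataset, weighted by how likely
# each is to come from the target scene's distribution (hypothetical values).
candidates = ["car_a", "car_b", "pedestrian", "clutter"]
weights = [0.45, 0.35, 0.18, 0.02]
specialized = resample(candidates, weights, rng)
```

Iterating this selection while re-scoring weights with the current detector is what lets the specialized dataset, and hence the specialized DCNN, converge toward the target scene without human labeling.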
APA, Harvard, Vancouver, ISO, and other styles
40

Lin, Cong. "Non-rigid visual object tracking with statistical learning of appearance model." Thesis, University of Macau, 2017. http://umaclib3.umac.mo/record=b3691900.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Cuan, Bonan. "Deep similarity metric learning for multiple object tracking." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI065.

Full text
Abstract:
Le suivi d’objets multiples dans une scène est une tâche importante dans le domaine de la vision par ordinateur, et présente toujours de très nombreux verrous. Les objets doivent être détectés et distingués les uns des autres de manière continue et simultanée. Les approches «suivi par détection» sont largement utilisées, où la détection des objets est d’abord réalisée sur toutes les frames, puis le suivi est ramené à un problème d’association entre les détections d’un même objet et les trajectoires identifiées. La plupart des algorithmes de suivi associent des modèles de mouvement et des modèles d’apparence. Dans cette thèse, nous proposons un modèle de ré-identification basé sur l’apparence et utilisant l’apprentissage de métrique de similarité. Nous faisons tout d’abord appel à un réseau siamois profond pour apprendre un maping de bout en bout, des images d’entrée vers un espace de caractéristiques où les objets sont mieux discriminés. De nombreuses configurations sont évaluées, afin d’en déduire celle offrant les meilleurs scores. Le modèle ainsi obtenu atteint des résultats de ré-identification satisfaisants comparables à l’état de l’art. Ensuite, notre modèle est intégré dans un système de suivi d’objets multiples pour servir de guide d’apparence pour l’association des objets. Un modèle d’apparence est établi pour chaque objet détecté s’appuyant sur le modèle de ré-identification. Les similarités entre les objets détectés sont alors exploitées pour la classification. Par ailleurs, nous avons étudié la coopération et les interférences entre les modèles d’apparence et de mouvement dans le processus de suivi. Un couplage actif entre ces 2 modèles est proposé pour améliorer davantage les performances du suivi, et la contribution de chacun d’eux est estimée en continue. Les expérimentations menées dans le cadre du benchmark «Multiple Object Tracking Challenge» ont prouvé l’efficacité de nos propositions et donné de meilleurs résultats de suivi que l’état de l’art
Multiple object tracking, i.e. simultaneously tracking multiple objects in the scene, is an important but challenging visual task. Objects should be accurately detected and distinguished from each other to avoid erroneous trajectories. Since remarkable progress has been made in the object detection field, “tracking-by-detection” approaches are widely adopted in multiple object tracking research. Objects are detected in advance and tracking reduces to an association problem: linking detections of the same object through frames into trajectories. Most tracking algorithms employ both motion and appearance models for data association. For multiple object tracking problems in which many objects of the same category exist, a fine-grained discriminant appearance model is paramount and indispensable. Therefore, we propose an appearance-based re-identification model using deep similarity metric learning to deal with multiple object tracking in mono-camera videos. Two main contributions are reported in this dissertation: First, a deep Siamese network is employed to learn an end-to-end mapping from input images to a discriminant embedding space. Different metric learning configurations using various metrics, loss functions, deep network structures, etc., are investigated, in order to determine the best re-identification model for tracking. In addition, with an intuitive and simple classification design, the proposed model achieves satisfactory re-identification results, which are comparable to state-of-the-art approaches using triplet losses. Our approach is easy and fast to train and the learned embedding can be readily transferred onto the domain of tracking tasks. Second, we integrate our proposed re-identification model in multiple object tracking as appearance guidance for detection association. For each object to be tracked in a video, we establish an identity-related appearance model based on the learned embedding for re-identification.
Similarities among detected object instances are exploited for identity classification. The collaboration and interference between appearance and motion models are also investigated. An online appearance-motion model coupling is proposed to further improve the tracking performance. Experiments on the Multiple Object Tracking Challenge benchmark prove the effectiveness of our modifications, with state-of-the-art tracking accuracy.
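The metric-learning objective behind such Siamese re-identification models can be shown numerically. In the sketch below a single linear map stands in for the deep branch, and the margin value is an assumption; the thesis instead learns a full convolutional embedding and compares several losses.

```python
import numpy as np

def embed(x, W):
    """Toy stand-in for one Siamese branch: a linear map into the embedding space."""
    return W @ x

def contrastive_loss(e1, e2, same, margin=1.0):
    """Pull embeddings of the same identity together; push different ones past the margin."""
    d = np.linalg.norm(e1 - e2)
    return 0.5 * d ** 2 if same else 0.5 * max(0.0, margin - d) ** 2

W = np.eye(2)                                 # identity "network" for illustration
a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])

loss_same = contrastive_loss(embed(a, W), embed(a, W), same=True)    # identical pair: no loss
loss_diff = contrastive_loss(embed(a, W), embed(b, W), same=False)   # distance 5 > margin: no loss
loss_hard = contrastive_loss(embed(a, W), embed(b, W), same=True)    # same identity far apart: penalized
```

At tracking time, the learned distance `d` between a detection's embedding and a track's appearance model is what serves as the "appearance guidance" for association.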
APA, Harvard, Vancouver, ISO, and other styles
42

Ingersoll, Kyle. "Vision Based Multiple Target Tracking Using Recursive RANSAC." BYU ScholarsArchive, 2015. https://scholarsarchive.byu.edu/etd/4398.

Full text
Abstract:
In this thesis, the Recursive-Random Sample Consensus (R-RANSAC) multiple target tracking (MTT) algorithm is further developed and applied to video taken from static platforms. Development of R-RANSAC is primarily focused in three areas: data association, the ability to track maneuvering objects, and track management. The probabilistic data association (PDA) filter performs very well in the R-RANSAC framework and adds minimal computational cost over less sophisticated methods. The interacting multiple model (IMM) filter as well as higher-order linear models are incorporated into R-RANSAC to improve tracking of highly maneuverable targets. An effective track labeling system, a more intuitive track merging criterion, and other improvements were made to the track management system of R-RANSAC. R-RANSAC is shown to be a modular algorithm capable of incorporating the best features of competing MTT algorithms. A comprehensive comparison with the Gaussian mixture probability hypothesis density (GM-PHD) filter was conducted using pseudo-aerial videos of vehicles and pedestrians. R-RANSAC maintains superior track continuity, especially in cases of interacting and occluded targets, and has fewer missed detections when compared with the GM-PHD filter. The two algorithms perform similarly in terms of the number of false positives and tracking precision. The concept of a feedback loop between the tracker and sensor processing modules is extensively explored; the output tracks from R-RANSAC are used to inform how video processing is performed. We are able to indefinitely detect stationary objects by zeroing out the background update rate of target-associated pixels in a Gaussian mixture model (GMM) foreground detector. False positive foreground detections are eliminated with a minimum blob area threshold, a ghost suppression algorithm, and judicious tuning of the R-RANSAC parameters.
The ability to detect stationary targets also allows R-RANSAC to be applied to a class of problems known as stationary object detection. Additionally, moving camera foreground detection techniques are applied to the static camera case in order to produce measurements with a velocity component; this is accomplished by using sequential-RANSAC to cluster optical flow vectors of FAST feature pairs. This further improves R-RANSAC's track continuity, especially with interacting targets. Finally, a hybrid algorithm composed of R-RANSAC and the Sequence Model (SM), a machine learner, is presented. The SM learns sequences of target locations and is able to assist in data association once properly trained. In simulation, we demonstrate the SM's ability to significantly improve tracking performance in situations with infrequent measurement updates and a high proportion of clutter measurements.
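The "zero the background update rate for target-associated pixels" feedback trick can be sketched with a single-Gaussian-per-pixel background model, a simplification of the GMM foreground detector the thesis uses; the learning rate, threshold, and initial variance below are illustrative.

```python
import numpy as np

class RunningGaussianBackground:
    """Per-pixel running Gaussian background model. Pixels flagged by the tracker
    get a zero learning rate, so stationary targets are never absorbed into the
    background (a simplification of the GMM-based scheme described above)."""

    def __init__(self, first_frame, alpha=0.1, k=2.5):
        self.mu = first_frame.astype(float)           # per-pixel background mean
        self.var = np.full(first_frame.shape, 25.0)   # per-pixel background variance
        self.alpha, self.k = alpha, k

    def apply(self, frame, freeze_mask=None):
        frame = frame.astype(float)
        d2 = (frame - self.mu) ** 2
        foreground = d2 > (self.k ** 2) * self.var    # Mahalanobis-style threshold
        a = np.full(frame.shape, self.alpha)
        if freeze_mask is not None:
            a[freeze_mask] = 0.0                      # tracked-target pixels: model frozen
        self.mu = (1 - a) * self.mu + a * frame
        self.var = (1 - a) * self.var + a * d2
        return foreground

bg = RunningGaussianBackground(np.zeros((4, 4)))
frame = np.zeros((4, 4)); frame[1, 1] = 100.0         # a bright target appears
mask = bg.apply(frame, freeze_mask=(frame > 50))      # target pixel flagged, model frozen there
```

Because the update is frozen under the target, the pixel keeps being reported as foreground on every subsequent identical frame, which is what makes indefinite stationary-object detection possible.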
APA, Harvard, Vancouver, ISO, and other styles
43

Clark, Daniel S. "Object detection and tracking using a parts-based approach /." Link to online version, 2005. https://ritdml.rit.edu/dspace/handle/1850/1167.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

khan, saad. "MULTI-VIEW APPROACHES TO TRACKING, 3D RECONSTRUCTION AND OBJECT CLASS DETECTION." Doctoral diss., University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4066.

Full text
Abstract:
Multi-camera systems are becoming ubiquitous and have found application in a variety of domains including surveillance, immersive visualization, sports entertainment and movie special effects, amongst others. From a computer vision perspective, the challenging task is how to most efficiently fuse information from multiple views in the absence of detailed calibration information and with a minimum of human intervention. This thesis presents a new approach to fuse foreground likelihood information from multiple views onto a reference view without explicit processing in 3D space, thereby circumventing the need for complete calibration. Our approach uses a homographic occupancy constraint (HOC), which states that if a foreground pixel has a piercing point that is occupied by a foreground object, then the pixel warps to foreground regions in every view under homographies induced by the reference plane, in effect using cameras as occupancy detectors. Using the HOC we are able to resolve occlusions and robustly determine ground plane localizations of the people in the scene. To find tracks we obtain ground localizations over a window of frames and stack them, creating a space-time volume. Regions belonging to the same person form contiguous spatio-temporal tracks that are clustered using a graph cuts segmentation approach. Second, we demonstrate that the HOC is equivalent to performing visual hull intersection in the image plane, resulting in a cross-sectional slice of the object. The process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Slices from multiple planes are accumulated and the 3D structure of the object is segmented out. Unlike other visual hull based approaches that use 3D constructs like visual cones, voxels or polygonal meshes requiring calibrated views, ours is purely image-based and uses only 2D constructs, i.e. planar homographies between views.
This feature also renders it conducive to graphics hardware acceleration. The current GPU implementation of our approach is capable of fusing 60 views (480x720 pixels) at the rate of 50 slices/second. We then present an extension of this approach to reconstructing non-rigid articulated objects from monocular video sequences. The basic premise is that due to motion of the object, scene occupancies are blurred out with non-occupancies in a manner analogous to motion blurred imagery. Using our HOC and a novel construct: the temporal occupancy point (TOP), we are able to fuse multiple views of non-rigid objects obtained from a monocular video sequence. The result is a set of blurred scene occupancy images in the corresponding views, where the values at each pixel correspond to the fraction of total time duration that the pixel observed an occupied scene location. We then use a motion de-blurring approach to de-blur the occupancy images and obtain the 3D structure of the non-rigid object. In the final part of this thesis, we present an object class detection method employing 3D models of rigid objects constructed using the above 3D reconstruction approach. Instead of using a complicated mechanism for relating multiple 2D training views, our approach establishes spatial connections between these views by mapping them directly to the surface of a 3D model. To generalize the model for object class detection, features from supplemental views (obtained from Google Image search) are also considered. Given a 2D test image, correspondences between the 3D feature model and the testing view are identified by matching the detected features. Based on the 3D locations of the corresponding features, several hypotheses of viewing planes can be made. The one with the highest confidence is then used to detect the object using feature location matching. Performance of the proposed method has been evaluated by using the PASCAL VOC challenge dataset and promising results are demonstrated.
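The image-plane fusion behind the homographic occupancy constraint can be illustrated as follows: warp each view's foreground mask into the reference view through the plane-induced homography and intersect. The sketch uses nearest-neighbour inverse warping and a hand-picked translation homography; real homographies would be estimated between views as in the thesis.

```python
import numpy as np

def warp_mask(mask, H, out_shape):
    """Inverse-warp a binary mask into the reference view via homography H
    (H maps source-view pixels to reference-view pixels; nearest neighbour)."""
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:out_shape[0], 0:out_shape[1]]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # homogeneous pixel coords
    src = Hinv @ pts
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    ok = (sx >= 0) & (sx < mask.shape[1]) & (sy >= 0) & (sy < mask.shape[0])
    out = np.zeros(out_shape[0] * out_shape[1], dtype=bool)
    out[ok] = mask[sy[ok], sx[ok]]
    return out.reshape(out_shape)

# Two views of the same ground-plane blob; in view 2 it appears one pixel to the right.
m1 = np.zeros((8, 8), bool); m1[3:6, 3:6] = True
m2 = np.zeros((8, 8), bool); m2[3:6, 4:7] = True
H2 = np.array([[1.0, 0.0, -1.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])   # view-2 -> reference: shift left by one pixel

# Pixels foreground in EVERY warped view are occupied on the reference plane.
occupancy = warp_mask(m1, np.eye(3), (8, 8)) & warp_mask(m2, H2, (8, 8))
```

Repeating the same intersection with homographies induced by planes parallel to the reference plane yields the stack of cross-sectional slices the abstract describes.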
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science PhD
APA, Harvard, Vancouver, ISO, and other styles
45

Sigal, Leonid. "Continuous-state graphical models for object localization, pose estimation and tracking." View abstract/electronic edition; access limited to Brown University users, 2008. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3318361.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Doran, Matthew M. "The role of visual attention in multiple object tracking evidence from ERPS." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 110 p, 2009. http://proquest.umi.com/pqdweb?did=1885675151&sid=5&Fmt=2&clientId=8331&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Atkins, Philip J. "Spatiotemporal filtering with neural circuits for motion detection and tracking." Thesis, University of Brighton, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.318727.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Vinther, Sven. "Active 3D object recognition using geometric invariants." Thesis, University of Cambridge, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.362974.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Nelson, Eric D. "Zoom techniques for achieving scale invariant object tracking in real-time active vision systems /." Online version of the thesis, 2006. https://ritdml.rit.edu/dspace/handle/1850/2620.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Rodríguez, Florez Sergio Alberto. "Contributions by vision systems to multi-sensor object localization and tracking for intelligent vehicles." Compiègne, 2010. http://www.theses.fr/2010COMP1910.

Full text
Abstract:
Les systèmes d’aide à la conduite peuvent améliorer la sécurité routière en aidant les utilisateurs via des avertissements de situations dangereuses ou en déclenchant des actions appropriées en cas de collision imminente (airbags, freinage d’urgence, etc). Dans ce cas, la connaissance de la position et de la vitesse des objets mobiles alentours constitue une information clé. C’est pourquoi, dans ce travail, nous nous focalisons sur la détection et le suivi d’objets dans une scène dynamique. En remarquant que les systèmes multi-caméras sont de plus en plus présents dans les véhicules et en sachant que le lidar est performant pour la détection d’obstacles, nous nous intéressons à l’apport de la vision stéréoscopique dans la perception géométrique multimodale de l’environnement. Afin de fusionner les informations géométriques entre le lidar et le système de vision, nous avons développé un procédé de calibrage qui détermine les paramètres extrinsèques et évalue les incertitudes sur ces estimations. Nous proposons ensuite une méthode d’odométrie visuelle temps-réel permettant d’estimer le mouvement propre du véhicule afin de simplifier l’analyse du mouvement des objets dynamiques. Dans un second temps, nous montrons comment l’intégrité de la détection et du suivi des objets par lidar peut être améliorée en utilisant une méthode de confirmation visuelle qui procède par reconstruction dense de l’environnement 3D. Pour finir, le système de perception multimodal a été intégré sur une plateforme automobile, ce qui a permis de tester expérimentalement les différentes approches proposées dans des situations routières en environnement non contrôlé
Advanced Driver Assistance Systems (ADAS) can improve road safety by supporting the driver through warnings in hazardous circumstances or triggering appropriate actions when facing imminent collision situations (e.g. airbags, emergency brake systems, etc.). In this context, the knowledge of the location and the speed of the surrounding mobile objects constitutes a key piece of information. Consequently, in this work, we focus on object detection, localization and tracking in dynamic scenes. Noticing the increasing presence of embedded multi-camera systems on vehicles and recognizing the effectiveness of lidar automotive systems at detecting obstacles, we investigate stereo vision systems' contributions to multi-modal perception of the environment geometry. In order to fuse geometrical information between the lidar and the vision system, we propose a calibration process which determines the extrinsic parameters between the exteroceptive sensors and quantifies the uncertainties of this estimation. We present a real-time visual odometry method which estimates the vehicle ego-motion and simplifies dynamic object motion analysis. Then, the integrity of the lidar-based object detection and tracking is increased by means of a visual confirmation method that exploits stereo-vision 3D dense reconstruction in focused areas. Finally, a complete full-scale automotive system integrating the considered perception modalities was implemented and tested experimentally in open road situations with an experimental car.
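The extrinsic fusion step described above reduces to mapping lidar points into the camera frame with the calibrated (R, t) and projecting them with the camera intrinsics K. A minimal sketch, where the matrix values are illustrative rather than calibrated values from the thesis:

```python
import numpy as np

def project_lidar_to_image(pts_lidar, R, t, K):
    """Transform 3D lidar points by extrinsics (R, t) and project with intrinsics K.
    Returns pixel coordinates and camera-frame depths."""
    pts_cam = pts_lidar @ R.T + t            # lidar frame -> camera frame
    uvw = pts_cam @ K.T                      # perspective projection (homogeneous)
    return uvw[:, :2] / uvw[:, 2:3], pts_cam[:, 2]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])              # illustrative pinhole intrinsics
R, t = np.eye(3), np.zeros(3)                # illustrative extrinsics (sensors aligned)

pts = np.array([[0.0, 0.0, 2.0],             # on the optical axis, 2 m ahead
                [1.0, 0.0, 2.0]])            # 1 m to the right
uv, depth = project_lidar_to_image(pts, R, t, K)
```

Once lidar returns land in the image like this, stereo reconstruction around each projection is what provides the visual confirmation of lidar detections.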
APA, Harvard, Vancouver, ISO, and other styles