Dissertations on the topic "Robot vision Mathematical models"

To see other types of publications on this topic, follow the link: Robot vision Mathematical models.

Format your source in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Browse the top 50 dissertations for research on the topic "Robot vision Mathematical models".

Next to every entry in the list of references you will find an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf file and read its abstract online, when these details are available in the record's metadata.

Browse dissertations on a wide variety of disciplines and compile your bibliography correctly.

1

Entschev, Peter Andreas. "Efficient construction of multi-scale image pyramids for real-time embedded robot vision." Universidade Tecnológica Federal do Paraná, 2013. http://repositorio.utfpr.edu.br/jspui/handle/1/720.

Full text source
Abstract:
Interest point detectors, or keypoint detectors, have long been of great interest for embedded robot vision, especially those which provide robustness against geometric variations such as rotation, affine transformations and changes in scale. The detection of scale-invariant features is normally done by constructing multi-scale image pyramids and performing an exhaustive search for extrema in the scale space, an approach present in object recognition methods such as SIFT and SURF. These methods are able to find very robust interest points with suitable properties for object recognition, but at the same time are computationally expensive. In this work we present an efficient method for the construction of SIFT-like image pyramids on embedded systems such as the BeagleBoard-xM. The method presented here aims to use computationally cheaper techniques and to reuse already processed information efficiently in order to reduce the overall computational complexity. To simplify the pyramid building process we use binomial filters instead of the conventional Gaussian filters used in the original SIFT method to compute multiple scales of an image. Binomial filters have the advantage that they can be implemented in fixed-point notation, which is a big advantage for the many embedded systems that do not provide native floating-point support. We also reduce the number of convolution operations needed by resampling already processed scales of the pyramid. After presenting our efficient pyramid construction method, we show how to implement it efficiently on a SIMD (Single Instruction, Multiple Data) platform: the SIMD platform we use is the ARM Neon extension available in the BeagleBoard-xM's ARM Cortex-A8 processor. SIMD platforms in general are very useful for multimedia applications, where it is normally necessary to perform the same operation over several elements, such as pixels in images, enabling multiple data to be processed with a single processor instruction. However, the Neon extension in the Cortex-A8 processor does not support floating-point operations, so the whole method was carefully implemented to overcome this limitation. Finally, we provide comparison results for the method proposed here and the original SIFT approach, including execution-time performance and repeatability of detected keypoints. With a straightforward implementation (without the SIMD platform), we show that our method takes approximately 1/4 of the time taken to build the entire original SIFT pyramid, while repeating up to 86% of the interest points found with the original method. With a complete fixed-point approach (including vectorization on the SIMD platform), repeatability reaches up to 92% of the original SIFT keypoints while the processing time drops to less than 3%.
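To illustrate the binomial-filter idea for readers, here is a minimal sketch (our illustration, not the author's code; the function names and parameters are hypothetical). The 5-tap kernel [1, 4, 6, 4, 1]/16 approximates a Gaussian using only integer arithmetic, which is what makes a fixed-point implementation possible, and each octave reuses an already-blurred scale instead of re-convolving:

```python
import numpy as np

def binomial_blur(img):
    """Separable 5-tap binomial blur; the kernel [1, 4, 6, 4, 1]/16
    approximates a Gaussian with integer-only weights."""
    k = np.array([1, 4, 6, 4, 1], dtype=np.int32)
    h, w = img.shape
    padded = np.pad(img.astype(np.int32), 2, mode='edge')
    # Horizontal then vertical pass; integer division mimics fixed point.
    tmp = sum(k[i] * padded[:, i:i + w] for i in range(5)) // 16
    out = sum(k[i] * tmp[i:i + h, :] for i in range(5)) // 16
    return out.astype(img.dtype)

def build_pyramid(img, octaves=4, scales_per_octave=3):
    """Blur repeatedly within an octave, then build the next octave by
    downsampling an already-processed scale (reuse saves convolutions)."""
    pyramid = []
    current = img
    for _ in range(octaves):
        octave = [current]
        for _ in range(scales_per_octave - 1):
            octave.append(binomial_blur(octave[-1]))
        pyramid.append(octave)
        current = octave[-1][::2, ::2]   # resample instead of re-blurring
    return pyramid

levels = build_pyramid(np.random.randint(0, 256, (128, 128), dtype=np.uint8))
```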
APA, Harvard, Vancouver, ISO, and other styles
2

Nikolaidis, Stefanos. "Mathematical Models of Adaptation in Human-Robot Collaboration." Research Showcase @ CMU, 2017. http://repository.cmu.edu/dissertations/1121.

Full text source
Abstract:
While much work in human-robot interaction has focused on leader-follower teamwork models, the recent advancement of robotic systems that have access to vast amounts of information suggests the need for robots that take into account the quality of the human decision making and actively guide people towards better ways of doing their task. This thesis proposes an equal-partners model, where human and robot engage in a dance of inference and action, and focuses on one particular instance of this dance: the robot adapts its own actions by estimating the probability of the human adapting to the robot. We start with a bounded-memory model of human adaptation parameterized by the human adaptability, the probability of the human switching towards a strategy newly demonstrated by the robot. We then examine more subtle forms of adaptation, where the human teammate adapts to the robot without replicating the robot's policy. We model the interaction as a repeated game, and present an optimal policy computation algorithm whose complexity is linear in the number of robot actions. Integrating these models into robot action selection allows for human-robot mutual adaptation. Human subject experiments in a variety of collaboration and shared-autonomy settings show that mutual adaptation significantly improves human-robot team performance, compared to one-way robot adaptation to the human.
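The central quantity here, the human's adaptability, lends itself to a short illustration. The sketch below (our own, under a discretized-belief assumption, not the thesis code) maintains a Bayesian belief over the probability that the human switches to the strategy the robot demonstrates:

```python
import numpy as np

# Discretized belief over adaptability alpha: the probability that the
# human switches to the strategy the robot has just demonstrated.
alphas = np.linspace(0.0, 1.0, 11)
belief = np.full(len(alphas), 1.0 / len(alphas))   # uniform prior

def update_belief(belief, switched):
    """Bayesian update after observing whether the human switched."""
    likelihood = alphas if switched else 1.0 - alphas
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Example: the human keeps their own strategy twice, then adapts once.
for switched in (False, False, True):
    belief = update_belief(belief, switched)
print("expected adaptability:", np.dot(alphas, belief))
```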
APA, Harvard, Vancouver, ISO, and other styles
3

朱國基 and Kwok-kei Chu. "Design and control of a six-legged mobile robot." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B31225895.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
4

Landecker, Will. "Interpretable Machine Learning and Sparse Coding for Computer Vision." PDXScholar, 2014. https://pdxscholar.library.pdx.edu/open_access_etds/1937.

Full text source
Abstract:
Machine learning offers many powerful tools for prediction. One of these tools, the binary classifier, is often considered a black box. Although its predictions may be accurate, we might never know why the classifier made a particular prediction. In the first half of this dissertation, I review the state of the art of interpretable methods (methods for explaining why); after noting where the existing methods fall short, I propose a new method for a particular type of black box called additive networks. I offer a proof of trustworthiness for this new method (meaning a proof that my method does not "make up" the logic of the black box when generating an explanation), and verify that its explanations are sound empirically. Sparse coding is part of a family of methods that are believed, by many researchers, to not be black boxes. In the second half of this dissertation, I review sparse coding and its application to the binary classifier. Despite the fact that the goal of sparse coding is to reconstruct data (an entirely different goal than classification), many researchers note that it improves classification accuracy. I investigate this phenomenon, challenging a common assumption in the literature. I show empirically that sparse reconstruction is not necessarily the right intermediate goal, when our ultimate goal is classification. Along the way, I introduce a new sparse coding algorithm that outperforms competing, state-of-the-art algorithms for a variety of important tasks.
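For context, the sparse codes discussed here are typically obtained by minimizing a reconstruction error plus an L1 penalty; below is a generic iterative shrinkage-thresholding (ISTA) sketch of that standard formulation, not the dissertation's own algorithm, with an illustrative random dictionary:

```python
import numpy as np

def ista(D, x, lam=0.1, steps=200):
    """Iterative shrinkage-thresholding: minimize
    0.5 * ||x - D a||^2 + lam * ||a||_1 over the code a."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))           # overcomplete dictionary
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 100]          # signal built from two atoms
code = ista(D, x)
print("nonzeros in code:", np.count_nonzero(np.abs(code) > 1e-3))
```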
APA, Harvard, Vancouver, ISO, and other styles
5

Choy, Siu Kai. "Statistical histogram characterization and modeling : theory and applications." HKBU Institutional Repository, 2008. http://repository.hkbu.edu.hk/etd_ra/913.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
6

Ehtiati, Tina. "Strongly coupled Bayesian models for interacting object and scene classification processes." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=102975.

Full text source
Abstract:
In this thesis, we present a strongly coupled data fusion architecture within a Bayesian framework for modeling the bi-directional influences between the scene and object classification mechanisms. A number of psychophysical studies provide experimental evidence that the object and the scene perception mechanisms are not functionally separate in the human visual system. Object recognition facilitates the recognition of the scene background, and knowledge of the scene context likewise facilitates the recognition of individual objects in the scene. The evidence indicating a bi-directional exchange between the two processes has motivated us to build a computational model where object and scene classification proceed in an interdependent manner, while no hierarchical relationship is imposed between the two processes. We propose a strongly coupled data fusion model for implementing the feedback relationship between the scene and object classification processes. We present novel schemes for modifying the Bayesian solutions for the scene and object classification tasks which allow data fusion between the two modules based on constraining either the priors or the likelihoods. We have implemented and tested the two proposed models using a database of natural images created for this purpose. The Receiver Operating Characteristic (ROC) curves depicting the scene classification performance of the likelihood coupling and the prior coupling models show that scene classification performance improves significantly in both models as a result of the strong coupling of the scene and object modules.
ROC curves depicting the scene classification performance of the two models also show that the likelihood coupling model achieves a higher detection rate compared to the prior coupling model. We have also computed the average rise times of the models' outputs as a measure of comparing the speed of the two models. The results show that the likelihood coupling model outputs have a shorter rise time. Based on these experimental findings one can conclude that imposing constraints on the likelihood models provides better solutions to the scene classification problems compared to imposing constraints on the prior models.
We have also proposed an attentional feature modulation scheme, which consists of tuning the input image responses to the bank of Gabor filters based on the scene class probabilities estimated by the model and the energy profiles of the Gabor filters for different scene categories. Experimental results based on combining the attentional feature tuning scheme with the likelihood coupling and the prior coupling methods show a significant improvement in the scene classification performances of both models.
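The prior-coupling idea can be pictured with a toy calculation (all numbers and context associations below are hypothetical, not the thesis model): the object module's posterior reshapes the scene prior before the scene likelihoods are applied.

```python
import numpy as np

scenes = ["kitchen", "street", "office"]

# Scene likelihoods from scene-level features: P(features | scene).
scene_likelihood = np.array([0.30, 0.25, 0.45])

# Object posterior from the object module, P(object | image);
# detected objects carry contextual evidence about the scene.
p_object = np.array([0.7, 0.1, 0.2])              # e.g. cup, car, monitor
# P(scene | object): contextual associations (assumed for illustration).
scene_given_object = np.array([[0.8, 0.0, 0.2],   # cup
                               [0.0, 0.9, 0.1],   # car
                               [0.1, 0.0, 0.9]])  # monitor

# Prior coupling: object evidence constrains the scene prior.
scene_prior = p_object @ scene_given_object
posterior = scene_likelihood * scene_prior
posterior /= posterior.sum()
for s, p in zip(scenes, posterior):
    print(f"P({s} | image) = {p:.3f}")
```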
APA, Harvard, Vancouver, ISO, and other styles
7

Ngan, Yuk-tung Henry, and 顏旭東. "Motif-based method for patterned texture defect detection." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2008. http://hub.hku.hk/bib/B40203608.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
8

Nifong, Nathaniel H. "Learning General Features From Images and Audio With Stacked Denoising Autoencoders." PDXScholar, 2014. https://pdxscholar.library.pdx.edu/open_access_etds/1550.

Full text source
Abstract:
One of the most impressive qualities of the brain is its neuro-plasticity. The neocortex has roughly the same structure throughout its whole surface, yet it is involved in a variety of different tasks from vision to motor control, and regions which once performed one task can learn to perform another. Machine learning algorithms which aim to be plausible models of the neocortex should also display this plasticity. One such candidate is the stacked denoising autoencoder (SDA). SDAs have shown promising results in the field of machine perception, where they have been used to learn abstract features from unlabeled data. In this thesis I develop a flexible distributed implementation of an SDA and train it on images and audio spectrograms to experimentally determine properties comparable to neuro-plasticity. Specifically, I compare the visual-auditory generalization of a multi-level denoising autoencoder trained with greedy layer-wise pre-training (GLWPT) to that of one trained without it. I test the hypothesis that multi-modal networks will perform better than uni-modal networks due to the greater generality of the features that may be learned. Furthermore, I also test the hypothesis that the magnitude of improvement gained from this multi-modal training is greater when GLWPT is applied than when it is not. My findings indicate that these hypotheses were not confirmed, but that GLWPT still helps multi-modal networks adapt to their second sensory modality.
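A single denoising layer and the greedy layer-wise pre-training loop can be sketched as follows (a generic PyTorch illustration under standard SDA assumptions, not the thesis's distributed implementation; random data stands in for images and spectrograms):

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """One SDA layer: corrupt the input, then reconstruct the clean input."""
    def __init__(self, n_in, n_hidden, noise=0.3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())
        self.noise = noise

    def forward(self, x):
        mask = (torch.rand_like(x) > self.noise).float()  # masking noise
        return self.decoder(self.encoder(x * mask))

def pretrain_layer(layer, data, epochs=10, lr=1e-3):
    """Greedy layer-wise pre-training: fit one layer in isolation."""
    opt = torch.optim.Adam(layer.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(layer(data), data)   # reconstruct the clean input
        loss.backward()
        opt.step()
    with torch.no_grad():
        return layer.encoder(data)          # features feed the next layer

# Stack two layers greedily (random data stands in for real inputs).
x = torch.rand(256, 784)
layer1 = DenoisingAutoencoder(784, 256)
h1 = pretrain_layer(layer1, x)
layer2 = DenoisingAutoencoder(256, 64)
h2 = pretrain_layer(layer2, h1)
```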
APA, Harvard, Vancouver, ISO, and other styles
9

North, Ben. "Learning dynamical models for visual tracking." Thesis, University of Oxford, 1998. http://ora.ox.ac.uk/objects/uuid:6ed12552-4c30-4d80-88ef-7245be2d8fb8.

Full text source
Abstract:
Using some form of dynamical model in a visual tracking system is a well-known method for increasing robustness and indeed performance in general. Often, quite simple models are used and can be effective, but prior knowledge of the likely motion of the tracking target can often be exploited by using a specially-tailored model. Specifying such a model by hand, while possible, is a time-consuming and error-prone process. Much more desirable is for an automated system to learn a model from training data. A dynamical model learnt in this manner can also be a source of useful information in its own right, and a set of dynamical models can provide discriminatory power for use in classification problems. Methods exist to perform such learning, but are limited in that they assume the availability of 'ground truth' data. In a visual tracking system, this is rarely the case. A learning system must work from visual data alone, and this thesis develops methods for learning dynamical models while explicitly taking account of the nature of the training data: they are noisy measurements. The algorithms are developed within two tracking frameworks. The Kalman filter is a simple and fast approach, applicable where the visual clutter is limited. The recently-developed Condensation algorithm is capable of tracking in more demanding situations, and can also employ a wider range of dynamical models than the Kalman filter, for instance multi-mode models. The success of the learning algorithms is demonstrated experimentally. When using a Kalman filter, the dynamical models learnt using the algorithms presented here produce better tracking when compared with those learnt using current methods. Learning directly from training data gathered using Condensation is an entirely new technique, and experiments show that many aspects of a multi-mode system can be successfully identified using very little prior information. Significant computational effort is required by the implementation of the methods, and there is scope for improvement in this regard. Other possibilities for future work include investigation of the strong links this work has with learning problems in other areas. Most notable is the study of the 'graphical models' commonly used in expert systems, where the ideas presented here promise to give insight and perhaps lead to new techniques.
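For readers, a constant-velocity Kalman filter, the kind of simple dynamical model the abstract starts from, looks like the sketch below (a generic illustration, not the learned models of the thesis); the matrix A is exactly what a learning algorithm would fit from training data:

```python
import numpy as np

# Constant-velocity model: state [position, velocity], noisy position measurements.
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])   # dynamics (what learning would fit)
H = np.array([[1.0, 0.0]])              # we only measure position
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.5]])                   # measurement noise covariance

x = np.zeros(2)                         # state estimate
P = np.eye(2)                           # estimate covariance

def kalman_step(x, P, z):
    # Predict with the dynamical model.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update with the (noisy) visual measurement z.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + (K @ (z - H @ x_pred)).ravel()
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

for z in np.array([[0.1], [1.2], [1.9], [3.1]]):   # measurement sequence
    x, P = kalman_step(x, P, z)
print("estimated position/velocity:", x)
```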
APA, Harvard, Vancouver, ISO, and other styles
10

Bernier, Thomas. "Development of an algorithmic method for the recognition of biological objects." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ29656.pdf.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
11

Chen, Pei. "An investigation of statistical aspects of linear subspace analysis for computer vision applications." Monash University, Dept. of Electrical and Computer Systems Engineering, 2004. http://arrow.monash.edu.au/hdl/1959.1/9705.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
12

Voils, Danny. "Scale Invariant Object Recognition Using Cortical Computational Models and a Robotic Platform." PDXScholar, 2012. https://pdxscholar.library.pdx.edu/open_access_etds/632.

Full text source
Abstract:
This paper proposes an end-to-end, scale invariant, visual object recognition system, composed of computational components that mimic the cortex in the brain. The system uses a two-stage process. The first stage is a filter that extracts scale invariant features from the visual field. The second stage uses inference-based spatio-temporal analysis of these features to identify objects in the visual field. The proposed model combines Numenta's Hierarchical Temporal Memory (HTM) with HMAX, developed by MIT's Brain and Cognitive Science Department. While these two biologically inspired paradigms are based on what is known about the visual cortex, HTM and HMAX tackle the overall object recognition problem from different directions. Image pyramid based methods like HMAX make explicit use of scale, but have no sense of time. HTM, on the other hand, only indirectly tackles scale, but makes explicit use of time. By combining HTM and HMAX, both scale and time are addressed. In this paper, I show that HTM and HMAX can be combined to make a complete cortex-inspired object recognition model that explicitly uses both scale and time to recognize objects in temporal sequences of images. Additionally, through experimentation, I examine several variations of HMAX and its
APA, Harvard, Vancouver, ISO, and other styles
13

Endres, Dominik M. "Bayesian and information-theoretic tools for neuroscience." Thesis, St Andrews, 2006. http://hdl.handle.net/10023/162.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
14

Hill, Evelyn June. "Applying statistical and syntactic pattern recognition techniques to the detection of fish in digital images." University of Western Australia. School of Mathematics and Statistics, 2004. http://theses.library.uwa.edu.au/adt-WU2004.0070.

Full text source
Abstract:
This study is an attempt to simulate aspects of human visual perception by automating the detection of specific types of objects in digital images. The success of the methods attempted here was measured by how well the results of experiments corresponded to what a typical human's assessment of the data might be. The subject of the study was images of live fish taken underwater by digital video or digital still cameras. It is desirable to be able to automate the processing of such data for efficient stock assessment for fisheries management. In this study some well known statistical pattern classification techniques were tested and new syntactical/structural pattern recognition techniques were developed. For testing of statistical pattern classification, the pixels belonging to fish were separated from the background pixels and the EM algorithm for Gaussian mixture models was used to locate clusters of pixels. The means and the covariance matrices for the components of the model were used to indicate the location, size and shape of the clusters. Because the number of components in the mixture is unknown, the EM algorithm has to be run a number of times with different numbers of components and the best model chosen using a model selection criterion. The AIC (Akaike Information Criterion) and the MDL (Minimum Description Length) were tested. The MDL was found to estimate the number of clusters of pixels more accurately than the AIC, which tended to overestimate cluster numbers. In order to reduce problems caused by initialisation of the EM algorithm (i.e. the starting positions and number of mixtures), the Dynamic Cluster Finding algorithm (DCF) was developed, based on the Dog-Rabbit strategy. This algorithm can produce an estimate of the locations and numbers of clusters of pixels. The Dog-Rabbit strategy is based on early studies of learning behaviour in neurons. The main difference between Dog-Rabbit and DCF is that DCF is based on a toroidal topology, which removes the tendency of cluster locators to migrate to the centre of mass of the data set and miss clusters near the edges of the image. In the second approach to the problem, data was extracted from the image using an edge detector. The edges from a reference object were compared with the edges from a new image to determine if the object occurred in the new image. In order to compare edges, the edge pixels were first assembled into curves using an UpWrite procedure; then the curves were smoothed by fitting parametric cubic polynomials. Finally the curves were converted to arrays of numbers which represented the signed curvature of the curves at regular intervals. Sets of curves from different images can be compared by comparing the arrays of signed curvature values, as well as the relative orientations and locations of the curves. Discrepancy values were calculated to indicate how well curves and sets of curves matched the reference object. The total length of all matched curves was used to indicate what fraction of the reference object was found in the new image. The curve matching procedure gave results which corresponded well with what a human being might observe.
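The model-selection loop described, rerunning EM at several component counts and scoring each fit, can be sketched with scikit-learn as below (our illustration on synthetic data, not the study's code; BIC stands in for MDL, to which it is closely related):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic "fish pixel" coordinates: three clusters of 2-D points.
pixels = np.vstack([rng.normal((0, 0), 0.5, (200, 2)),
                    rng.normal((5, 5), 0.8, (150, 2)),
                    rng.normal((0, 6), 0.6, (120, 2))])

best_k, best_score, best_model = None, np.inf, None
for k in range(1, 8):            # EM is rerun for each candidate count
    gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(pixels)
    score = gmm.bic(pixels)      # BIC penalizes complexity, much like MDL
    if score < best_score:
        best_k, best_score, best_model = k, score, gmm

# Means and covariances indicate cluster location, size and shape.
print("selected components:", best_k)
print("cluster means:\n", best_model.means_)
```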
APA, Harvard, Vancouver, ISO, and other styles
15

Mian, Ajmal Saeed. "Representations and matching techniques for 3D free-form object and face recognition." University of Western Australia. School of Computer Science and Software Engineering, 2007. http://theses.library.uwa.edu.au/adt-WU2007.0046.

Full text source
Abstract:
[Truncated abstract] The aim of visual recognition is to identify objects in a scene and estimate their pose. Object recognition from 2D images is sensitive to illumination, pose, clutter and occlusions. Object recognition from range data, on the other hand, does not suffer from these limitations. An important paradigm is model-based recognition, whereby 3D models of objects are constructed offline and saved in a database using a suitable representation. During online recognition, a similar representation of a scene is matched with the database for recognizing objects present in the scene . . . The tensor representation is extended to automatic and pose invariant 3D face recognition. As the face is a non-rigid object, expressions can significantly change its 3D shape. Therefore, the last part of this thesis investigates representations and matching techniques for automatic 3D face recognition which are robust to facial expressions. A number of novelties are proposed in this area along with their extensive experimental validation using the largest available 3D face database. These novelties include a region-based matching algorithm for 3D face recognition, a 2D and 3D multimodal hybrid face recognition algorithm, fully automatic 3D nose ridge detection, fully automatic normalization of 3D and 2D faces, a low cost rejection classifier based on a novel Spherical Face Representation, and finally, automatic segmentation of the expression insensitive regions of a face.
APA, Harvard, Vancouver, ISO, and other styles
16

Riechel, Andrew T. "Force-Feasible Workspace Analysis and Motor Mount Disturbance Compensation for Point-Mass Cable Robots." Thesis, Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/5243.

Full text source
Abstract:
Cable-actuated manipulators (or 'cable robots') constitute a relatively new classification of robots which use motors, located at fixed remote locations, to manipulate an end-effector by extending or retracting cables. These manipulators possess a number of unique properties which make them proficient with tasks involving high payloads, large workspaces, and dangerous or contaminated environments. However, a number of challenges exist which have limited the mainstream emergence of cable robots. This thesis addresses two of the most important of these issues: workspace analysis and disturbance compensation. Workspace issues are particularly important, as many large-scale applications require the end-effector to operate in regions of a particular shape, and to exert certain minimum forces throughout those regions. The 'Force-Feasible Workspace' represents the set of end-effector positions, for a given robot design, for which the robot can exert a set of required forces on its environment. This can be considered the robot's 'usable' workspace, and an analysis of this workspace shape for point-mass cable robots is therefore presented to facilitate optimal cable robot design. Numerical simulation results are also presented to validate the analytical results and to aid visualization of certain complex workspace shapes. Some cable robot applications may require mounting motors to moving bases (e.g. mobile robots) or other surfaces which are subject to disturbances (e.g. helicopters or crane arms). Such disturbances can propagate to the end-effector and cause undesired motion, so the rejection of motor mount disturbances is also of interest. This thesis presents a strategy for measuring these disturbances and compensating for them. General approaches and implementation issues are explored qualitatively with a simple one-degree-of-freedom prototype (including a strategy for mitigating accelerometer drift), and quantitative simulation results are presented as a proof of concept.
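For a point-mass cable robot, force feasibility reduces to asking whether a required force can be produced with bounded, non-negative cable tensions (cables can only pull); the sketch below (illustrative planar geometry, not the thesis's analysis) phrases this as a linear-programming feasibility check:

```python
import numpy as np
from scipy.optimize import linprog

def force_feasible(anchors, p, f_req, t_max=100.0):
    """Can cables from fixed anchors to the point mass at p exert f_req
    with tensions 0 <= t <= t_max? Columns of U are unit vectors from
    p toward each anchor."""
    U = np.array([(a - p) / np.linalg.norm(a - p) for a in anchors]).T
    res = linprog(c=np.zeros(len(anchors)),        # pure feasibility check
                  A_eq=U, b_eq=f_req,
                  bounds=[(0.0, t_max)] * len(anchors))
    return res.success

anchors = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
p = np.array([2.0, 1.0])                            # point above both anchors
print(force_feasible(anchors, p, np.array([0.0, -1.0])))  # True: cables can pull
print(force_feasible(anchors, p, np.array([0.0, 1.0])))   # False: cables cannot push
```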
APA, Harvard, Vancouver, ISO, and other styles
17

Hasasneh, Ahmad. "Robot semantic place recognition based on deep belief networks and a direct use of tiny images." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00960289.

Full text source
Abstract:
Usually, human beings are able to quickly distinguish between different places solely from their visual appearance. This is due to the fact that they can organize their space as composed of discrete units. These units, called "semantic places", are characterized by their spatial extent and their functional unity. Such a semantic category can thus be used as contextual information which fosters object detection and recognition. Recent works in semantic place recognition seek to endow the robot with similar capabilities. Contrary to classical localization and mapping works, this problem is usually addressed as a supervised learning problem. The question of semantic place recognition in robotics, the ability to recognize the semantic category of the place to which a scene belongs, is therefore a major requirement for the future of autonomous robotics. It is indeed required for an autonomous service robot to be able to recognize the environment in which it lives and to easily learn the organization of this environment in order to operate and interact successfully. To achieve that goal, different methods have already been proposed, some based on the identification of objects as a prerequisite to the recognition of the scenes, and some based on a direct description of the scene characteristics. If we make the hypothesis that objects are more easily recognized when the scene in which they appear is identified, the second approach seems more suitable. It is however strongly dependent on the nature of the image descriptors used, usually derived empirically from general considerations on image coding. Compared with these many proposals, another approach to image coding, based on a more theoretical point of view, has emerged in the last few years. Energy-based models of feature extraction, built on the principle of minimizing the energy of some function according to the quality of the reconstruction of the image, have led to Restricted Boltzmann Machines (RBMs), which are able to code an image as the superposition of a limited number of features taken from a larger alphabet. It has also been shown that this process can be repeated in a deep architecture, leading to a sparse and efficient representation of the initial data in the feature space. A complex classification problem in the input space is thus transformed into an easier one in the feature space. This approach has been successfully applied to the identification of tiny images from MIT's 80 million image database. In the present work, we demonstrate that semantic place recognition can be achieved on the basis of tiny images instead of conventional Bag-of-Words (BoW) methods, using Deep Belief Networks (DBNs) for image coding. We show that after appropriate coding a softmax regression in the projection space is sufficient to achieve promising classification results. To our knowledge, this approach has not yet been investigated for scene recognition in autonomous robotics. We compare our methods with state-of-the-art algorithms using a standard robot localization database. We study the influence of system parameters and compare different conditions on the same dataset. These experiments show that our proposed model, while being very simple, leads to state-of-the-art results on a semantic place recognition task.
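A rough analogue of this pipeline, one RBM layer coding tiny-image-like inputs followed by softmax regression in the projection space, can be assembled from scikit-learn parts. The sketch below uses synthetic data and a single RBM rather than a full DBN; it is an illustration, not the thesis implementation:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
# Stand-ins for "tiny images": 8x8 grayscale patches in [0, 1],
# with two synthetic "place" classes differing in mean intensity.
X = np.clip(np.vstack([rng.normal(0.3, 0.15, (200, 64)),
                       rng.normal(0.7, 0.15, (200, 64))]), 0, 1)
y = np.repeat([0, 1], 200)

model = Pipeline([
    # The RBM learns a feature code of the raw pixels (one DBN layer).
    ("rbm", BernoulliRBM(n_components=32, learning_rate=0.05,
                         n_iter=20, random_state=0)),
    # Softmax regression in the projection space does the classification.
    ("softmax", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```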
APA, Harvard, Vancouver, ISO, and other styles
18

Hurdal, Monica Kimberly. "Mathematical and computer modelling of the human brain with reference to cortical magnification and dipole source localisation in the visual cortx." Thesis, Queensland University of Technology, 1998.

Find full text source
APA, Harvard, Vancouver, ISO, and other styles
19

Malmgren, Henrik. "Revision of an artificial neural network enabling industrial sorting." Thesis, Uppsala universitet, Institutionen för teknikvetenskaper, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-392690.

Full text source
Abstract:
Convolutional artificial neural networks can be applied for image-based object classification to inform automated actions, such as handling of objects on a production line. The present thesis describes the theoretical background for creating a classifier and explores the effects of introducing a set of relatively recent techniques to an existing ensemble of classifiers in use for an industrial sorting system. The findings indicate that it is important to use spatial variety dropout regularization for high resolution image inputs, and to use an optimizer configuration with good convergence properties. The findings also demonstrate examples of ensemble classifiers being effectively consolidated into unified models using the distillation technique. An analogous arrangement with optimization against multiple output targets, incorporating additional information, showed accuracy gains comparable to ensembling. For use of the classifier on test data with statistics different from those of the training dataset, the results indicate that augmentation of the input data during classifier creation helps performance, but would, in the current case, likely need to be guided by information about the distribution shift to have a sufficiently positive impact to enable a practical application. I suggest, for future development, updated architectures, automated hyperparameter search, and leveraging the bountiful unlabeled data potentially available from production lines.
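The distillation technique mentioned here is conventionally implemented as a KL divergence between temperature-softened teacher and student outputs; below is a generic sketch of that standard formulation (not the thesis code), with an ensemble average acting as the teacher:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL to the softened
    teacher distribution (Hinton-style distillation)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

# Example: an ensemble's averaged logits act as the teacher.
student_logits = torch.randn(8, 5, requires_grad=True)
ensemble_logits = [torch.randn(8, 5) for _ in range(3)]
teacher_logits = torch.stack(ensemble_logits).mean(dim=0)
labels = torch.randint(0, 5, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```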
APA, Harvard, Vancouver, ISO, and other styles
20

"Calibration of an active vision system and feature tracking based on 8-point projective invariants." 1997. http://library.cuhk.edu.hk/record=b5889144.

Full text source
Abstract:
by Chen Zhi-Yi.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1997.
Includes bibliographical references.
List of Symbols S --- p.1
Chapter Chapter 1 --- Introduction
Chapter 1.1 --- Active Vision Paradigm and Calibration of Active Vision System --- p.1.1
Chapter 1.1.1 --- Active Vision Paradigm --- p.1.1
Chapter 1.1.2 --- A Review of the Existing Active Vision Systems --- p.1.1
Chapter 1.1.3 --- A Brief Introduction to Our Active Vision System --- p.1.2
Chapter 1.1.4 --- The Stages of Calibrating an Active Vision System --- p.1.3
Chapter 1.2 --- Projective Invariants and Their Applications to Feature Tracking --- p.1.4
Chapter 1.3 --- Thesis Overview --- p.1.4
References --- p.1.5
Chapter Chapter 2 --- Calibration for an Active Vision System: Camera Calibration
Chapter 2.1 --- An Overview of Camera Calibration --- p.2.1
Chapter 2.2 --- Tsai's RAC Based Camera Calibration Method --- p.2.5
Chapter 2.2.1 --- The Pinhole Camera Model with Radial Distortion --- p.2.7
Chapter 2.2.2 --- Calibrating a Camera Using Monoview Noncoplanar Points --- p.2.10
Chapter 2.3 --- Reg Willson's Implementation of R. Y. Tsai's RAC Based Camera Calibration Algorithm --- p.2.15
Chapter 2.4 --- Experimental Setup and Procedures --- p.2.20
Chapter 2.5 --- Experimental Results --- p.2.23
Chapter 2.6 --- Conclusion --- p.2.28
References --- p.2.29
Chapter Chapter 3 --- Calibration for an Active Vision System: Head-Eye Calibration
Chapter 3.1 --- Why Head-Eye Calibration --- p.3.1
Chapter 3.2 --- Review of the Existing Head-Eye Calibration Algorithms --- p.3.1
Chapter 3.2.1 --- Category I Classic Approaches --- p.3.1
Chapter 3.2.2 --- Category II Self-Calibration Techniques --- p.3.2
Chapter 3.3 --- R.Tsai's Approach for Hand-Eye (Head-Eye) Calibration --- p.3.3
Chapter 3.3.1 --- Introduction --- p.3.3
Chapter 3.3.2 --- Definitions of Coordinate Frames and Homogeneous Transformation Matrices --- p.3.3
Chapter 3.3.3 --- Formulation of the Head-Eye Calibration Problem --- p.3.6
Chapter 3.3.4 --- Using Principal Vector to Represent Rotation Transformation Matrix --- p.3.7
Chapter 3.3.5 --- Calculating Rcg and Tcg --- p.3.9
Chapter 3.4 --- Our Local Implementation of Tsai's Head Eye Calibration Algorithm --- p.3.14
Chapter 3.4.1 --- Using Denavit-Hartenberg's Approach to Establish a Body-Attached Coordinate Frame for Each Link of the Manipulator --- p.3.16
Chapter 3.5 --- Function of Procedures and Formats of Data Files --- p.3.23
Chapter 3.6 --- Experimental Results --- p.3.26
Chapter 3.7 --- Discussion --- p.3.45
Chapter 3.8 --- Conclusion --- p.3.46
References --- p.3.47
Appendix I Procedures --- p.3.48
Chapter Chapter 4 --- A New Tracking Method for Shape from Motion Using an Active Vision System
Chapter 4.1 --- Introduction --- p.4.1
Chapter 4.2 --- A New Tracking Method --- p.4.1
Chapter 4.2.1 --- Our approach --- p.4.1
Chapter 4.2.2 --- Using an Active Vision System to Track the Projective Basis Across Image Sequence --- p.4.2
Chapter 4.2.3 --- Using Projective Invariants to Track the Remaining Feature Points --- p.4.2
Chapter 4.3 --- Using Factorisation Method to Recover Shape from Motion --- p.4.11
Chapter 4.4 --- Discussion and Future Research --- p.4.31
References --- p.4.32
Chapter Chapter 5 --- Experiments on Feature Tracking with 3D Projective Invariants
Chapter 5.1 --- 8-point Projective Invariant --- p.5.1
Chapter 5.2 --- Projective Invariant Based Transfer between Distinct Views of a 3-D Scene --- p.5.4
Chapter 5.3 --- Transfer Experiments on the Image Sequence of a Calibration Block --- p.5.6
Chapter 5.3.1 --- Experiment 1. Real Image Sequence 1 of a Camera Calibration Block --- p.5.6
Chapter 5.3.2 --- Experiment 2. Real Image Sequence 2 of a Camera Calibration Block --- p.5.15
Chapter 5.3.3 --- Experiment 3. Real Image Sequence 3 of a Camera Calibration Block --- p.5.22
Chapter 5.3.4 --- Experiment 4. Synthetic Image Sequence of a Camera Calibration Block --- p.5.27
Chapter 5.3.5 --- Discussions on the Experimental Results --- p.5.32
Chapter 5.4 --- Transfer Experiments on the Image Sequence of a Human Face Model --- p.5.33
References --- p.5.44
Chapter Chapter 6 --- Conclusions and Future Researches
Chapter 6.1 --- Contributions and Conclusions --- p.6.1
Chapter 6.2 --- Future Researches --- p.6.1
Bibliography --- p.B.1
APA, Harvard, Vancouver, ISO, and other styles
21

"Segmentation based variational model for accurate optical flow estimation." 2009. http://library.cuhk.edu.hk/record=b5894018.

Full text source
Abstract:
Chen, Jianing.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2009.
Includes bibliographical references (leaves 47-54).
Abstract also in Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Background --- p.1
Chapter 1.2 --- Related Work --- p.3
Chapter 1.3 --- Thesis Organization --- p.5
Chapter 2 --- Review on Optical Flow Estimation --- p.6
Chapter 2.1 --- Variational Model --- p.6
Chapter 2.1.1 --- Basic Assumptions and Constraints --- p.6
Chapter 2.1.2 --- More General Energy Functional --- p.9
Chapter 2.2 --- Discontinuity Preserving Techniques --- p.9
Chapter 2.2.1 --- Data Term Robustification --- p.10
Chapter 2.2.2 --- Diffusion Based Regularization --- p.11
Chapter 2.2.3 --- Segmentation --- p.15
Chapter 2.3 --- Chapter Summary --- p.15
Chapter 3 --- Segmentation Based Optical Flow Estimation --- p.17
Chapter 3.1 --- Initial Flow --- p.17
Chapter 3.2 --- Color-Motion Segmentation --- p.19
Chapter 3.3 --- Parametric Flow Estimation Incorporating Segmentation --- p.21
Chapter 3.4 --- Confidence Map Construction --- p.24
Chapter 3.4.1 --- Occlusion detection --- p.24
Chapter 3.4.2 --- Pixel-wise motion coherence --- p.24
Chapter 3.4.3 --- Segment-wise model confidence --- p.26
Chapter 3.5 --- Final Combined Variational Model --- p.28
Chapter 3.6 --- Chapter Summary --- p.28
Chapter 4 --- Experiment Results --- p.30
Chapter 4.1 --- Quantitative Evaluation --- p.30
Chapter 4.2 --- Warping Results --- p.34
Chapter 4.3 --- Chapter Summary --- p.35
Chapter 5 --- Application - Single Image Animation --- p.37
Chapter 5.1 --- Introduction --- p.37
Chapter 5.2 --- Approach --- p.38
Chapter 5.2.1 --- Pre-Process Stage --- p.39
Chapter 5.2.2 --- Coordinate Transform --- p.39
Chapter 5.2.3 --- Motion Field Transfer --- p.41
Chapter 5.2.4 --- Motion Editing and Apply --- p.41
Chapter 5.2.5 --- Gradient-domain composition --- p.42
Chapter 5.3 --- Experiments --- p.43
Chapter 5.3.1 --- Active Motion Transfer --- p.43
Chapter 5.3.2 --- Animate Stationary Temporal Dynamics --- p.44
Chapter 5.4 --- Chapter Summary --- p.45
Chapter 6 --- Conclusion --- p.46
Bibliography --- p.47
APA, Harvard, Vancouver, ISO, and other styles
22

"Unsupervised self-adaptive abnormal behavior detection for real-time surveillance." 2009. http://library.cuhk.edu.hk/record=b5894021.

Full text source
Abstract:
Yu, Tsz Ho.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2009.
Includes bibliographical references (leaves 95-100).
Abstract also in Chinese.
Chapter 1 --- Introduction --- p.2
Chapter 1.1 --- Surveillance and Computer Vision --- p.3
Chapter 1.2 --- The Need for Abnormal Behavior Detection --- p.3
Chapter 1.2.1 --- The Motivation --- p.3
Chapter 1.2.2 --- Choosing the Right Surveillance Target --- p.5
Chapter 1.3 --- Abnormal Behavior Detection: An Overview --- p.6
Chapter 1.3.1 --- Challenges in Detecting Abnormal Behaviors --- p.6
Chapter 1.3.2 --- Limitations of Existing Approaches --- p.8
Chapter 1.3.3 --- New Design Concepts --- p.9
Chapter 1.3.4 --- Requirements for Abnormal Behavior Detection --- p.10
Chapter 1.4 --- Contributions --- p.11
Chapter 1.4.1 --- An Unsupervised Experience-based Approach for Abnormal Behavior Detection --- p.11
Chapter 1.4.2 --- Motion Histogram Transform: A Novel Feature Descriptor --- p.12
Chapter 1.4.3 --- Real-time Algorithm for Abnormal Behavior Detection --- p.12
Chapter 1.5 --- Thesis Organization --- p.13
Chapter 2 --- Literature Review --- p.14
Chapter 2.1 --- From Segmentation to Visual Tracking --- p.14
Chapter 2.1.1 --- Environment Modeling and Segmentation --- p.15
Chapter 2.1.2 --- Spatial-temporal Feature Extraction --- p.18
Chapter 2.2 --- Detecting Irregularities in Videos --- p.21
Chapter 2.2.1 --- Model-based Method --- p.22
Chapter 2.2.2 --- Non Model-based Method --- p.26
Chapter 3 --- Design Framework --- p.29
Chapter 3.1 --- Dynamic Scene and Behavior Model --- p.30
Chapter 3.1.1 --- Image Sequences and Video --- p.30
Chapter 3.1.2 --- Motions and Behaviors in Video --- p.31
Chapter 3.1.3 --- Discovering Abnormal Behavior --- p.32
Chapter 3.1.4 --- Problem Definition --- p.33
Chapter 3.1.5 --- System Assumption --- p.34
Chapter 3.2 --- Methodology --- p.35
Chapter 3.2.1 --- Potential Improvements --- p.35
Chapter 3.2.2 --- The Design Framework --- p.36
Chapter 4 --- Implementation --- p.40
Chapter 4.1 --- Preprocessing --- p.40
Chapter 4.1.1 --- Data Input --- p.41
Chapter 4.1.2 --- Motion Detection --- p.41
Chapter 4.1.3 --- The Gaussian Mixture Background Model --- p.43
Chapter 4.2 --- Feature Extraction --- p.46
Chapter 4.2.1 --- Optical Flow Estimation --- p.47
Chapter 4.2.2 --- Motion Histogram Transforms --- p.53
Chapter 4.3 --- Feedback Learning --- p.56
Chapter 4.3.1 --- The Observation Matrix --- p.58
Chapter 4.3.2 --- Eigenspace Transformation --- p.58
Chapter 4.3.3 --- Self-adaptive Update Scheme --- p.61
Chapter 4.3.4 --- Summary --- p.62
Chapter 4.4 --- Classification --- p.63
Chapter 4.4.1 --- Detecting Abnormal Behavior via Statistical Saliencies --- p.64
Chapter 4.4.2 --- Determining Feedback --- p.65
Chapter 4.5 --- Localization and Output --- p.66
Chapter 4.6 --- Conclusion --- p.69
Chapter 5 --- Experiments --- p.71
Chapter 5.1 --- Experiment Setup --- p.72
Chapter 5.2 --- A Summary of Experiments --- p.74
Chapter 5.3 --- Experiment Results: Part 1 --- p.78
Chapter 5.4 --- Experiment Results: Part 2 --- p.81
Chapter 5.5 --- Experiment Results: Part 3 --- p.83
Chapter 5.6 --- Experiment Results: Part 4 --- p.86
Chapter 5.7 --- Analysis and Conclusion --- p.86
Chapter 6 --- Conclusions --- p.88
Chapter 6.1 --- Application Extensions --- p.88
Chapter 6.2 --- Limitations --- p.89
Chapter 6.2.1 --- Surveillance Range --- p.89
Chapter 6.2.2 --- Preparation Time for the System --- p.89
Chapter 6.2.3 --- Calibration of Background Model --- p.90
Chapter 6.2.4 --- Instability of Optical Flow Feature Extraction --- p.91
Chapter 6.2.5 --- Lack of 3D information --- p.91
Chapter 6.2.6 --- Dealing with Complex Behavior Patterns --- p.92
Chapter 6.2.7 --- Potential Improvements --- p.92
Chapter 6.2.8 --- New Method for Classification --- p.93
Chapter 6.2.9 --- Introduction of Dynamic Texture as a Feature --- p.93
Chapter 6.2.10 --- Using Multiple-camera System --- p.93
Chapter 6.3 --- Summary --- p.94
Bibliography --- p.95
APA, Harvard, Vancouver, ISO, and other styles
23

Lee, Pei Yean. "Geometric optimization for computer vision." Phd thesis, 2005. http://hdl.handle.net/1885/149716.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
24

Bhavnagri, Burzin. "Computer vision using shape spaces / Burzin Bhavnagri." 1998. http://hdl.handle.net/2440/19160.

Full text source
Abstract:
Includes bibliography: p. 214-225 and index.
232 p. ; 30 cm.
Title page, contents and abstract only. The complete thesis in print form is available from the University Library.
This thesis investigates a computational model of vision based on assumptions pertaining to the physical structure of a camera and the scattering of light from visible surfaces. Sufficient conditions to detect occlusions, intensity discontinuities, discontinuities in derivatives of intensity, surface discontinuities and discontinuities in derivatives of surfaces are given. This leads to an algorithm with linear time and space complexity to generate a collection of feature points with attributes in cyclically ordered groups. Two approaches to rejecting false hypotheses of correspondence were developed: an error minimising approach and an approach based on formal language. A non-iterative algorithm that can use the rotation between two cameras to produce an exact reconstruction of a scene is presented. Two methods of comparing global shapes with occlusions are pointed out: one based on a grammar, the other on Le's inequality on Euclidean shapes.
Thesis (Ph.D.)--University of Adelaide, Dept. of Computer Science, 1998
APA, Harvard, Vancouver, ISO, and other styles
25

Bhavnagri, Burzin. "Computer vision using shape spaces / Burzin Bhavnagri." Thesis, 1998. http://hdl.handle.net/2440/19160.

Full text source
Abstract:
Includes bibliography: p. 214-225 and index.
232 p. ; 30 cm.
This thesis investigates a computational model of vision based on assumptions pertaining to the physical structure of a camera and the scattering of light from visible surfaces. Sufficient conditions to detect occlusions, intensity discontinuities, discontinuities in derivatives of intensity, surface discontinuities and discontinuities in derivatives of surfaces are given. This leads to an algorithm with linear time and space complexity to generate a collection of feature points with attributes in cyclically ordered groups. Two approaches to rejecting false hypotheses of correspondence were developed: an error minimising approach and an approach based on formal language. A non-iterative algorithm that can use the rotation between two cameras to produce an exact reconstruction of a scene is presented. Two methods of comparing global shapes with occlusions are pointed out: one based on a grammar, the other on Le's inequality on Euclidean shapes.
Thesis (Ph.D.)--University of Adelaide, Dept. of Computer Science, 1998
APA, Harvard, Vancouver, ISO, and other styles
26

"Example-based interpolation for correspondence-based computer vision problems." Thesis, 2006. http://library.cuhk.edu.hk/record=b6074147.

Full text source
Abstract:
The EBI and iEBI mechanisms have all the desirable properties of a good interpolation: all given input-output examples are satisfied exactly, and the interpolation is smooth, with minimal oscillation between the examples.
Example-Based Interpolation (EBI) is a powerful method for interpolating a function from a set of input-output examples. The first part of the dissertation examines EBI in detail and proposes a new, enhanced EBI: indexed function Example-Based Interpolation (iEBI). The second part demonstrates the application of both EBI and iEBI to three well-defined computer vision problems.
First, the dissertation analyzes the EBI solution in detail. It argues and demonstrates that there are three desirable properties for any EBI solution. To satisfy all three desirable properties, the EBI solution must have adequate degrees of freedom. The dissertation shows in detail that, for the EBI solution to have enough degrees of freedom, it need only take a simple form: the sum of a basis function and a linear function. It also shows that a particular EBI solution, in a certain least-squares-error sense, satisfies all three desirable properties exactly.
Moreover, the dissertation points out a restriction of EBI and describes a new interpolation mechanism that overcomes it by constructing a general indexed function from examples. The new mechanism, referred to as the general indexed function Example-Based Interpolation (iEBI) mechanism, first applies EBI to establish the initial correspondences over all input examples, and then interpolates the general indexed function from those initial correspondences.
Novel View Synthesis (NVS) is an important problem in image rendering. It tries to synthesize an image of a scene at any specified (novel) viewpoint using only a few images of that scene at some sample viewpoints. To avoid explicit 3-D reconstruction of the scene, the dissertation formulates NVS as an indexed function interpolation problem by treating viewpoint and image as the input and output of a function. The interpolation formulation has at least two advantages. First, it allows certain imaging details, like camera intrinsic parameters, to be unknown. Second, the viewpoint specification need not be physical. For example, the specification could consist of any set of values that adequately describe the viewpoint space and need not be measured in metric units. The dissertation solves the NVS problem using the iEBI formulation and shows how the iEBI mechanism can synthesize images at novel viewpoints and acquire quality novel views even from only a few example views.
Stereo matching, or the determination of corresponding image points projected by the same 3-D feature, is one of the fundamental and long-studied problems in computer vision. Yet few have tried to solve it using interpolation. The dissertation presents an interpolation approach, Interpolation-based Iterative Stereo Matching (IISM), that can construct dense correspondences in a stereo image pair from sparse initial correspondences. IISM improves the existing EBI to ensure that the established correspondences satisfy exactly the epipolar constraint of the image pair and, to a certain extent, preserve discontinuities in the stereo disparity space of the imaged scene. IISM applies the improved EBI algorithm iteratively in a coarse-to-fine refinement, eventually producing a dense disparity map for the stereo image pair.
The second part of the dissertation focuses on applying the EBI and iEBI methods to solve three correspondence-based problems in computer vision: (1) stereo matching, (2) novel view synthesis, and (3) viewpoint determination.
For all three problems, the dissertation presents experimental results on a number of real and benchmark image datasets, and shows that interpolation-based methods can be effective in arriving at good solutions even with sparse input examples.
Viewpoint determination is the problem of, given an image, determining the viewpoint from which the image was taken. The dissertation demonstrates how to solve this problem without referring to or estimating any explicit 3-D structure of the imaged scene. Used for reference are a small number of sample snapshots of the scene, each with its associated viewpoint. By treating image and viewpoint as the input and output of a function, and the given snapshot-viewpoint pairs as examples of that function, the problem has a natural interpolation formulation. As in NVS, the interpolation formulation allows the given images to be uncalibrated and the viewpoint specification to be non-metric. The dissertation presents an interpolation-based solution using the iEBI mechanism that guarantees all given sample data are satisfied exactly with the least complexity in the interpolated function.
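The solution form stated above, a basis function plus a linear function fitted so that every example is reproduced exactly, matches classical RBF interpolation with a linear tail; the sketch below follows that reading (our illustration, not the dissertation's exact formulation):

```python
import numpy as np

def fit_ebi(X, Y, basis=lambda r: r**3):
    """Interpolant f(x) = sum_i w_i * phi(||x - x_i||) + A x + b,
    solved so that every input-output example is reproduced exactly."""
    n, d = X.shape
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    P = np.hstack([X, np.ones((n, 1))])            # linear part [x, 1]
    K = np.block([[basis(r), P],
                  [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.vstack([Y, np.zeros((d + 1, Y.shape[1]))])
    coef = np.linalg.solve(K, rhs)
    W, A = coef[:n], coef[n:]
    def f(x):
        rr = np.linalg.norm(X - x, axis=1)
        return basis(rr) @ W + np.append(x, 1.0) @ A
    return f

# Toy use: interpolate a 2-D -> 1-D function from five examples.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
Y = np.sin(X[:, :1]) + X[:, 1:]
f = fit_ebi(X, Y)
print(f(np.array([0.25, 0.75])))      # smooth value between the examples
print(f(X[3]), "vs example", Y[3])    # examples reproduced exactly
```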
Liang Bodong.
"February 2006."
Adviser: Ronald Chi-kit Chung.
Source: Dissertation Abstracts International, Volume: 67-11, Section: B, page: 6516.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2006.
Includes bibliographical references (p. 127-145).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts in English and Chinese.
School code: 1307.
APA, Harvard, Vancouver, ISO, and other styles
27

"Learning mid-level representations for scene understanding." 2013. http://library.cuhk.edu.hk/record=b5549296.

Full text source
Abstract:
This thesis reviews state-of-the-art scene classification frameworks and studies the learning of mid-level representations for scene understanding.
The current scene classification pipeline consists of feature extraction, feature encoding, spatial aggregation, and classifier learning. Among these steps, feature extraction is the most fundamental for scene understanding. Beyond low-level features, obtaining effective mid-level representations has attracted attention in the scene understanding field in recent years. We interpret mid-level representations from two perspectives: one is aggregation from low-level cues, and the other is the embedding of semantic information. In this thesis, our work covers both properties, "aggregation" and "semantic".
Given the observation from natural image statistics that correlations among patch-level responses contain strong structure information, we build a two-layer model. The first layer is the patch-level response with edge-let appearance, and the second layer contains sparse covariance patterns, which we consider the mid-level representation. From the view of "aggregation from low-level cues", our work moves one step further in this direction. We use the learned covariance patterns in scene classification, where they show promising performance even compared with human-designed features. The efficiency of our covariance patterns gives a new clue for feature learning: correlations among lower-layer responses can help build more powerful feature representations.
With the motivation of coupling semantic information into the mid-level representation, we define a new term, "informative components", in this thesis. Informative components refer to regions that are descriptive within one class and also distinctive among different classes. Based on the generative assumption that descriptive regions fit a fixed-rank model, we provide an integrated optimization framework that combines generative modeling and discriminative learning. Experiments on scene classification bear out the efficiency of our informative components. We also find that simply concatenating informative components with low-level responses further improves classification performance. This sheds light on a future direction: improving representation power via the combination of multiple-layer representations.
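The covariance idea in the first model can be illustrated by computing a covariance descriptor over patch-level filter responses (a generic sketch with synthetic responses, not the thesis model); the off-diagonal terms record exactly the response correlations the abstract points to:

```python
import numpy as np

def covariance_descriptor(responses):
    """responses: (n_patches, n_filters) array of patch-level filter
    responses; the covariance captures how responses co-occur, which
    encodes local structure beyond the individual filters."""
    centered = responses - responses.mean(axis=0)
    return centered.T @ centered / (len(responses) - 1)

rng = np.random.default_rng(0)
# Stand-in for edge-let-like responses of 500 patches to 8 filters.
responses = rng.standard_normal((500, 8))
responses[:, 1] += 0.8 * responses[:, 0]   # correlated pair: a structure cue
C = covariance_descriptor(responses)
print(np.round(C[:2, :2], 2))              # strong off-diagonal term
```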
Wang, Liwei.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2013.
Includes bibliographical references (leaves 62-72).
Abstracts also in Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Scene Classification Pipeline --- p.1
Chapter 1.2 --- Learning Mid-Level Representations --- p.6
Chapter 1.3 --- Contributions and Organization --- p.7
Chapter 2 --- Background --- p.9
Chapter 2.1 --- Mid-level Representations --- p.9
Chapter 2.1.1 --- Aggregation From Low Level Cues --- p.10
Chapter 2.1.2 --- Embedding Semantic Information --- p.13
Chapter 2.2 --- Scene Data Sets Description --- p.16
Chapter 3 --- Learning Sparse Covariance Patterns --- p.20
Chapter 3.1 --- Introduction --- p.20
Chapter 3.2 --- Model --- p.26
Chapter 3.3 --- Learning and Inference --- p.28
Chapter 3.3.1 --- Inference --- p.28
Chapter 3.3.2 --- Learning --- p.30
Chapter 3.4 --- Experiments --- p.31
Chapter 3.4.1 --- Structure Mapping --- p.33
Chapter 3.4.2 --- 15-Scene Classification --- p.34
Chapter 3.4.3 --- Indoor Scene Recognition --- p.36
Chapter 3.5 --- Summary --- p.38
Chapter 4 --- Learning Informative Components --- p.39
Chapter 4.1 --- Introduction --- p.39
Chapter 4.2 --- Related Work --- p.43
Chapter 4.3 --- Our Model --- p.45
Chapter 4.3.1 --- Component Level Representation --- p.45
Chapter 4.3.2 --- Fixed Rank Modeling --- p.46
Chapter 4.3.3 --- Informative Component Learning --- p.47
Chapter 4.4 --- Experiments --- p.52
Chapter 4.4.1 --- Informative Components Learning --- p.54
Chapter 4.4.2 --- Scene Classification --- p.55
Chapter 4.5 --- Summary --- p.58
Chapter 5 --- Conclusion --- p.60
Bibliography --- p.62
28

"Exploring intrinsic structures from samples: supervised, unsupervised, an semisupervised frameworks." 2007. http://library.cuhk.edu.hk/record=b5893342.

Abstract:
Wang, Huan.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
Includes bibliographical references (leaves 113-119).
Abstracts in English and Chinese.
Contents
Abstract --- p.i
Acknowledgement --- p.iii
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Learning Frameworks --- p.1
Chapter 1.2 --- Sample Representation --- p.3
Chapter 2 --- Background Study --- p.5
Chapter 2.1 --- Tensor Algebra --- p.5
Chapter 2.1.1 --- Tensor Unfolding (Flattening) --- p.6
Chapter 2.1.2 --- Tensor Product --- p.6
Chapter 2.2 --- Manifold Embedding and Dimensionality Reduction --- p.8
Chapter 2.2.1 --- Principal Component Analysis (PCA) --- p.9
Chapter 2.2.2 --- Metric Multidimensional Scaling (MDS) --- p.10
Chapter 2.2.3 --- Isomap --- p.10
Chapter 2.2.4 --- Locally Linear Embedding (LLE) --- p.11
Chapter 2.2.5 --- Discriminant Analysis --- p.11
Chapter 2.2.6 --- Laplacian Eigenmap --- p.14
Chapter 2.2.7 --- Graph Embedding: A General Framework --- p.15
Chapter 2.2.8 --- Maximum Variance Unfolding --- p.16
Chapter 3 --- The Trace Ratio Optimization --- p.17
Chapter 3.1 --- Introduction --- p.17
Chapter 3.2 --- Dimensionality Reduction Formulations: Trace Ratio vs. Ratio Trace --- p.19
Chapter 3.3 --- Efficient Solution of Trace Ratio Problem --- p.22
Chapter 3.4 --- Proof of Convergency to Global Optimum --- p.23
Chapter 3.4.1 --- Proof of the monotonic increase of λn --- p.23
Chapter 3.4.2 --- Proof of Vn convergence and global optimum for λ --- p.24
Chapter 3.5 --- Extension and Discussion --- p.27
Chapter 3.5.1 --- Extension to General Constraints --- p.27
Chapter 3.5.2 --- Discussion --- p.28
Chapter 3.6 --- Experiments --- p.29
Chapter 3.6.1 --- Dataset Preparation --- p.30
Chapter 3.6.2 --- Convergence Speed --- p.31
Chapter 3.6.3 --- Visualization of Projection Matrix --- p.31
Chapter 3.6.4 --- Classification by Linear Trace Ratio Algorithms with Orthogonal Constraints --- p.33
Chapter 3.6.5 --- Classification by Kernel Trace Ratio algorithms with General Constraints --- p.36
Chapter 3.7 --- Conclusion --- p.36
Chapter 4 --- A Convergent Solution to Tensor Subspace Learning --- p.40
Chapter 4.1 --- Introduction --- p.40
Chapter 4.2 --- Subspace Learning with Tensor Data --- p.43
Chapter 4.2.1 --- Graph Embedding with Tensor Representation --- p.43
Chapter 4.2.2 --- Computational Issues --- p.46
Chapter 4.3 --- Solution Procedure and Convergency Proof --- p.46
Chapter 4.3.1 --- Analysis of Monotonous Increase Property --- p.47
Chapter 4.3.2 --- Proof of Convergency --- p.48
Chapter 4.4 --- Experiments --- p.50
Chapter 4.4.1 --- Data Sets --- p.50
Chapter 4.4.2 --- Monotonicity of Objective Function Value --- p.51
Chapter 4.4.3 --- Convergency of the Projection Matrices --- p.52
Chapter 4.4.4 --- Face Recognition --- p.52
Chapter 4.5 --- Conclusions --- p.54
Chapter 5 --- Maximum Unfolded Embedding --- p.57
Chapter 5.1 --- Introduction --- p.57
Chapter 5.2 --- Maximum Unfolded Embedding --- p.59
Chapter 5.3 --- Optimize Trace Ratio --- p.60
Chapter 5.4 --- Another Justification: Maximum Variance Embedding --- p.60
Chapter 5.5 --- Linear Extension: Maximum Unfolded Projection --- p.61
Chapter 5.6 --- Experiments --- p.62
Chapter 5.6.1 --- Data set --- p.62
Chapter 5.6.2 --- Evaluation Metric --- p.63
Chapter 5.6.3 --- Performance Comparison --- p.64
Chapter 5.6.4 --- Generalization Capability --- p.65
Chapter 5.7 --- Conclusion --- p.67
Chapter 6 --- Regression on MultiClass Data --- p.68
Chapter 6.1 --- Introduction --- p.68
Chapter 6.2 --- Background --- p.70
Chapter 6.2.1 --- Intuitive Motivations --- p.70
Chapter 6.2.2 --- Related Work --- p.72
Chapter 6.3 --- Problem Formulation --- p.73
Chapter 6.3.1 --- Notations --- p.73
Chapter 6.3.2 --- Regularization along Data Manifold --- p.74
Chapter 6.3.3 --- Cross Manifold Label Propagation --- p.75
Chapter 6.3.4 --- Inter-Manifold Regularization --- p.78
Chapter 6.4 --- Regression on Reproducing Kernel Hilbert Space (RKHS) --- p.79
Chapter 6.5 --- Experiments --- p.82
Chapter 6.5.1 --- Synthetic Data: Nonlinear Two Moons --- p.82
Chapter 6.5.2 --- Synthetic Data: Three-class Cyclones --- p.83
Chapter 6.5.3 --- Human Age Estimation --- p.84
Chapter 6.6 --- Conclusions --- p.86
Chapter 7 --- Correspondence Propagation --- p.88
Chapter 7.1 --- Introduction --- p.88
Chapter 7.2 --- Problem Formulation and Solution --- p.92
Chapter 7.2.1 --- Graph Construction --- p.92
Chapter 7.2.2 --- Regularization on categorical Product Graph --- p.93
Chapter 7.2.3 --- Consistency in Feature Domain and Soft Constraints --- p.96
Chapter 7.2.4 --- Inhomogeneous Pair Labeling --- p.97
Chapter 7.2.5 --- Reliable Correspondence Propagation --- p.98
Chapter 7.2.6 --- Rearrangement and Discretizing --- p.100
Chapter 7.3 --- Algorithmic Analysis --- p.100
Chapter 7.3.1 --- Selection of Reliable Correspondences --- p.100
Chapter 7.3.2 --- Computational Complexity --- p.102
Chapter 7.4 --- Applications and Experiments --- p.102
Chapter 7.4.1 --- Matching Demonstration on Object Recognition Databases --- p.103
Chapter 7.4.2 --- Automatic Feature Matching on Oxford Image Transformation Database --- p.104
Chapter 7.4.3 --- Influence of Reliable Correspondence Number --- p.106
Chapter 7.5 --- Conclusion and Future Works --- p.106
Chapter 8 --- Conclusion and Future Work --- p.110
Bibliography --- p.113
29

Li, Mingxu. "Numerical analysis of robot dynamics algorithms." Master's thesis, 2012. http://hdl.handle.net/1885/155788.

Abstract:
This thesis presents two issues related to robot dynamics algorithms. We first discuss planar robot dynamics algorithms, because it is useful to study robot motion in the plane before generalizing to 3D. The planar versions of the three most commonly used dynamics algorithms, the recursive Newton-Euler algorithm (RNEA), the articulated-body algorithm (ABA) and the composite rigid-body algorithm (CRBA), are obtained by using planar vectors, tensors and coordinate transforms. It is shown that the planar algorithms are asymptotically between 4 and 4.8 times faster than their comparable spatial counterparts. The numerical accuracy of robot dynamics algorithms also needs to be considered. Investigations into the numerical accuracy of the RNEA, the ABA, the CRBA, the constraint force algorithm (CFA), the divide-and-conquer algorithm (DCA) and the pivoted divide-and-conquer algorithm (DCAp) are presented. An empirical study shows that the three parallel algorithms, the CFA, the DCA and the DCAp, are significantly less accurate than the two serial algorithms, the ABA and the CRBA. However, the planar versions of the dynamics algorithms behave differently, and there the accuracy of the parallel algorithms is comparable with that of the serial ones. In addition, we use CESTAC (Contrôle et Estimation Stochastique des Arrondis de Calculs) and affine arithmetic (AA) to estimate the propagation of round-off errors in robot dynamics algorithms. The results presented in this thesis provide a better understanding of the performance of the existing robot dynamics algorithms.
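For readers unfamiliar with the algorithms compared here, the following minimal numpy sketch runs a planar recursive Newton-Euler pass for an n-link revolute arm, using world-frame vectors rather than the planar-vector formalism of the thesis; all names and parameters are illustrative.

import numpy as np

def cross2(a, b):                    # scalar 2D cross product
    return a[0] * b[1] - a[1] * b[0]

def perp(a):                         # rotate a 2D vector by +90 degrees
    return np.array([-a[1], a[0]])

def planar_rnea(q, dq, ddq, L, lc, m, Ic, g=np.array([0.0, -9.81])):
    # Joint torques of an n-link planar revolute arm, world-frame recursion.
    n = len(q)
    th, w, al = np.cumsum(q), np.cumsum(dq), np.cumsum(ddq)  # absolute kinematics
    p, a = [np.zeros(2)], [np.zeros(2)]   # joint positions and accelerations
    pc, ac = [], []                       # centre-of-mass positions/accelerations
    for i in range(n):                    # outward pass
        u = np.array([np.cos(th[i]), np.sin(th[i])])
        rc, rl = lc[i] * u, L[i] * u
        pc.append(p[i] + rc)
        ac.append(a[i] + al[i] * perp(rc) - w[i] ** 2 * rc)
        p.append(p[i] + rl)
        a.append(a[i] + al[i] * perp(rl) - w[i] ** 2 * rl)
    tau = np.zeros(n)
    f_next, n_next = np.zeros(2), 0.0
    for i in reversed(range(n)):          # inward pass: Newton, then Euler
        f = f_next + m[i] * (ac[i] - g)
        ni = Ic[i] * al[i] + n_next \
             + cross2(pc[i] - p[i], f) + cross2(p[i + 1] - pc[i], f_next)
        tau[i], f_next, n_next = ni, f, ni
    return tau

# usage: gravity torques of a 2-link arm at rest
print(planar_rnea([0.3, 0.5], [0, 0], [0, 0], L=[1, 1], lc=[0.5, 0.5],
                  m=[1, 1], Ic=[0.1, 0.1]))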
30

"Dynamic modeling and simulation of a multi-fingered robot hand." 1998. http://library.cuhk.edu.hk/record=b5889616.

Abstract:
by Joseph Chun-kong Chan.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
Includes bibliographical references (leaves 117-124).
Abstract also in Chinese.
Abstract --- p.i
Acknowledgments --- p.iv
List of Figures --- p.xi
List of Tables --- p.xii
List of Algorithms --- p.xiii
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Motivation --- p.1
Chapter 1.2 --- Related Work --- p.5
Chapter 1.3 --- Contributions --- p.7
Chapter 1.4 --- Organization of the Thesis --- p.9
Chapter 2 --- Contact Modeling: Kinematics --- p.11
Chapter 2.1 --- Introduction --- p.11
Chapter 2.2 --- Contact Kinematics between Two Rigid Bodies --- p.14
Chapter 2.2.1 --- Contact Modes --- p.14
Chapter 2.2.2 --- Montana's Contact Equations --- p.15
Chapter 2.3 --- Finger Kinematics --- p.18
Chapter 2.3.1 --- Finger Forward Kinematics --- p.19
Chapter 2.3.2 --- Finger Jacobian --- p.21
Chapter 2.4 --- Grasp Kinematics between a Finger and an Object --- p.21
Chapter 2.4.1 --- Velocity Transformation between Different Coordinate Frames --- p.22
Chapter 2.4.2 --- Grasp Kinematics for the ith Contact --- p.23
Chapter 2.4.3 --- Different Fingertip Models and Different Contact Modes --- p.25
Chapter 2.5 --- Velocity Constraints of the Entire System --- p.28
Chapter 2.6 --- Summary --- p.29
Chapter 3 --- Contact Modeling: Dynamics --- p.31
Chapter 3.1 --- Introduction --- p.31
Chapter 3.2 --- Multi-fingered Robot Hand Dynamics --- p.33
Chapter 3.3 --- Object Dynamics --- p.35
Chapter 3.4 --- Constrained System Dynamics --- p.37
Chapter 3.5 --- Summary --- p.39
Chapter 4 --- Collision Modeling --- p.40
Chapter 4.1 --- Introduction --- p.40
Chapter 4.2 --- Assumptions of Collision --- p.42
Chapter 4.3 --- Collision Point Velocities --- p.43
Chapter 4.3.1 --- Collision Point Velocity of the ith Finger --- p.43
Chapter 4.3.2 --- Collision Point Velocity of the Object --- p.46
Chapter 4.3.3 --- Relative Collision Point Velocity --- p.47
Chapter 4.4 --- Equations of Collision --- p.47
Chapter 4.4.1 --- Sliding Mode Collision --- p.48
Chapter 4.4.2 --- Sticking Mode Collision --- p.49
Chapter 4.5 --- Summary --- p.51
Chapter 5 --- Dynamic Simulation --- p.53
Chapter 5.1 --- Introduction --- p.53
Chapter 5.2 --- Architecture of the Dynamic Simulation System --- p.54
Chapter 5.2.1 --- Input Devices --- p.54
Chapter 5.2.2 --- Dynamic Simulator --- p.58
Chapter 5.2.3 --- Virtual Environment --- p.60
Chapter 5.3 --- Methodologies and Program Flow of the Dynamic Simulator --- p.60
Chapter 5.3.1 --- Interference Detection --- p.61
Chapter 5.3.2 --- Constraint-based Simulation --- p.63
Chapter 5.3.3 --- Impulse-based Simulation --- p.66
Chapter 5.4 --- Summary --- p.69
Chapter 6 --- Simulation Results --- p.71
Chapter 6.1 --- Introduction --- p.71
Chapter 6.2 --- Change of Grasping Configurations --- p.71
Chapter 6.3 --- Rolling Contact --- p.76
Chapter 6.4 --- Sliding Contact --- p.76
Chapter 6.5 --- Collisions --- p.85
Chapter 6.6 --- Dextrous Manipulation Motions --- p.93
Chapter 6.7 --- Summary --- p.94
Chapter 7 --- Conclusions --- p.99
Chapter 7.1 --- Summary of Contributions --- p.99
Chapter 7.2 --- Future Work --- p.100
Chapter 7.2.1 --- Improvement of Current System --- p.100
Chapter 7.2.2 --- Applications --- p.101
Chapter A --- Montana's Contact Equations for Finger-object Contact --- p.103
Chapter A.1 --- Local Coordinates Charts --- p.103
Chapter A.2 --- Curvature, Torsion and Metric Tensors --- p.104
Chapter A.3 --- Montana's Contact Equations --- p.106
Chapter B --- Finger Dynamics --- p.108
Chapter B.1 --- Forward Kinematics of a Robot Finger --- p.108
Chapter B.1.1 --- Link-coordinate Transformation --- p.109
Chapter B.1.2 --- Forward Kinematics --- p.109
Chapter B.2 --- Dynamic Equation of a Robot Finger --- p.110
Chapter B.2.1 --- Kinetic and Potential Energy --- p.110
Chapter B.2.2 --- Lagrange's Equation --- p.111
Chapter C --- Simulation Configurations --- p.113
Chapter C.1 --- Geometric models --- p.113
Chapter C.2 --- Physical Parameters --- p.113
Chapter C.3 --- Simulation Parameters --- p.116
Bibliography --- p.124
31

Lim, John Jin Keat. "Egomotion estimation with large field-of-view vision." Phd thesis, 2010. http://hdl.handle.net/1885/150721.

Abstract:
This thesis investigates the problem of egomotion estimation in a monocular, large Field-of-View (FOV) camera from two views of the scene. Our focus is on developing new constraints and algorithms that exploit the larger information content of wide FOV images and the geometry of image spheres in order to aid and simplify the egomotion recovery task. We will consider both the scenario of small or differential camera motions, as well as the more general case of discrete camera motions. Beginning with the equations relating differential camera egomotion and optical flow, we show that the directions of flow measured at antipodal points on the image sphere will constrain the directions of egomotion to some subset region on the sphere. By considering the flow at many such antipodal point pairs, it is shown that the intersection of all subset regions arising from each pair yields an estimate of the directions of motion. These constraints were used in an algorithm that performs Hough-reminiscent voting in 2-dimensions to robustly recover motion. Furthermore, we showed that by summing the optical flow vectors at antipodal points, the camera translation may be constrained to lie on a plane. Two or more pairs of antipodal points will then give multiple such planes, and their intersection gives some estimate of the translation direction (rotation may be recovered via a second step). We demonstrate the use of our constraints with two robust and practical algorithms, one based on the RANSAC sampling strategy, and one based on Hough-like voting. The main drawback of the previous two approaches was that they were limited to scenarios where camera motions were small. For estimating larger, discrete camera motions, a different formulation of the problem is required. To this end, we introduce the antipodal-epipolar constraints on relative camera motion. By using antipodal points, the translational and rotational motions of a camera are geometrically decoupled, allowing them to be separately estimated as two problems in smaller dimensions. Two robust algorithms, based on RANSAC and Hough voting, are proposed to demonstrate these constraints. Experiments demonstrated that our constraints and algorithms work competitively with the state-of-the-art in noisy simulations and on real image sequences, with the advantage of improved robustness to outlier noise in the data. Furthermore, by breaking up the problem and solving them separately, more efficient algorithms were possible, leading to reduced sampling time for the RANSAC based schemes, and the development of efficient Hough voting algorithms which perform in constant time under increasing outlier probabilities. In addition to these contributions, we also investigated the problem of 'relaxed egomotion', where the accuracy of estimates is traded off for speed and less demanding computational requirements. We show that estimates which are inaccurate but still robust to outliers are of practical use as long as measurable bounds on the maximum error are maintained. In the context of the visual homing problem, we demonstrate algorithms that give coarse estimates of translation, but which still result in provably successful homing. Experiments involving simulations and homing in real robots demonstrated the robust performance of these methods in noisy, outlier-prone conditions.
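The differential antipodal constraint is easy to verify numerically: under the spherical motion-field model, summing the flow at antipodal directions cancels the rotational component and leaves a vector parallel to the tangential projection of the translation, so each antipodal pair gives a plane containing the translation, and two pairs intersect in its direction. A noise-free numpy sketch in our notation (not the thesis's voting implementation):

import numpy as np

rng = np.random.default_rng(0)
t_true = np.array([0.4, -0.2, 0.89])
t_true /= np.linalg.norm(t_true)
w_true = np.array([0.01, -0.03, 0.02])      # small rotation rate

def flow(r, Z):
    # Spherical motion field at unit viewing direction r, scene depth Z.
    return -(t_true - np.dot(t_true, r) * r) / Z - np.cross(w_true, r)

normals = []
for _ in range(2):                          # two antipodal point pairs
    r = rng.normal(size=3)
    r /= np.linalg.norm(r)
    s = flow(r, Z=rng.uniform(2, 5)) + flow(-r, Z=rng.uniform(2, 5))
    normals.append(np.cross(r, s))          # translation lies on this plane
t_est = np.cross(normals[0], normals[1])
t_est /= np.linalg.norm(t_est)
if np.dot(t_est, t_true) < 0:               # direction is recovered up to sign
    t_est = -t_est
print(np.allclose(t_est, t_true))           # True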
32

"Three dimensional motion tracking using micro inertial measurement unit and monocular visual system." 2011. http://library.cuhk.edu.hk/record=b5894605.

Abstract:
Lam, Kin Kwok.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 99-103).
Abstracts in English and Chinese.
Abstract --- p.ii
Abstract (in Chinese) --- p.iii
Acknowledgements --- p.iv
Table of Contents --- p.v
List of Figures --- p.viii
List of Tables --- p.xi
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Intrinsic Problem of Today's Pose Estimation Systems --- p.1
Chapter 1.2 --- Multi-sensors Data Fusion --- p.2
Chapter 1.3 --- Objectives and Contributions --- p.3
Chapter 1.4 --- Organization of the dissertation --- p.4
Chapter 2 --- Architecture of Sensing System --- p.5
Chapter 2.1 --- Hardware for Pose Estimation System --- p.5
Chapter 2.2 --- Software for Pose Estimation System --- p.6
Chapter 3 --- Inertial Measurement System --- p.7
Chapter 3.1 --- Basic knowledge of Inertial Measurement System --- p.7
Chapter 3.2 --- Strapdown Inertial Navigation --- p.8
Chapter 3.2.1 --- Tracking Orientation --- p.9
Chapter 3.2.2 --- Discussion of Attitude Representations --- p.14
Chapter 3.2.3 --- Tracking Position --- p.16
Chapter 3.3 --- Summary of Strapdown Inertial Navigation --- p.16
Chapter 4 --- Visual Tracking System --- p.17
Chapter 4.1 --- Background of Visual Tracking System --- p.17
Chapter 4.2 --- Basic knowledge of Camera Calibration and Model --- p.18
Chapter 4.2.1 --- Related Coordinate Frames --- p.18
Chapter 4.2.2 --- Pinhole Camera Model --- p.20
Chapter 4.2.3 --- Calibration for Nonlinear Model --- p.21
Chapter 4.3 --- Implementation of Process to Calibrate Camera --- p.22
Chapter 4.3.1 --- Image Capture and Corners Extraction --- p.22
Chapter 4.3.2 --- Camera Calibration --- p.23
Chapter 4.4 --- Perspective-n-Point Problem --- p.25
Chapter 4.5 --- Camera Pose Estimation Algorithms --- p.26
Chapter 4.5.1 --- Pose Estimation Using Quadrangular Targets --- p.27
Chapter 4.5.2 --- Efficient Perspective-n-Point Camera Pose Estimation --- p.31
Chapter 4.5.3 --- Linear N-Point Camera Pose Determination --- p.33
Chapter 4.5.4 --- Pose Estimation from Orthography and Scaling with Iterations --- p.36
Chapter 4.6 --- Experimental Results of Camera Pose Estimation Algorithms --- p.40
Chapter 4.6.1 --- Simulation Test --- p.40
Chapter 4.6.2 --- Real Images Test --- p.43
Chapter 4.6.3 --- Summary --- p.46
Chapter 5 --- Kalman Filter --- p.47
Chapter 5.1 --- Linear Dynamic System Model --- p.48
Chapter 5.2 --- Time Update --- p.48
Chapter 5.3 --- Measurement Update --- p.49
Chapter 5.3.1 --- Maximum a Posterior Probability --- p.49
Chapter 5.3.2 --- Batch Least-Square Estimation --- p.51
Chapter 5.3.3 --- Measurement Update in Kalman Filter --- p.54
Chapter 5.4 --- Summary of Kalman Filter --- p.56
Chapter 6 --- Extended Kalman Filter --- p.58
Chapter 6.1 --- Linearization of Nonlinear Systems --- p.58
Chapter 6.2 --- Extended Kalman Filter --- p.59
Chapter 7 --- Unscented Kalman Filter --- p.61
Chapter 7.1 --- Least-square Estimator Structure --- p.61
Chapter 7.2 --- Unscented Transform --- p.62
Chapter 7.3 --- Unscented Kalman Filter --- p.64
Chapter 8 --- Data Fusion Algorithm --- p.68
Chapter 8.1 --- Traditional Multi-Sensor Data Fusion --- p.69
Chapter 8.1.1 --- Measurement Fusion --- p.69
Chapter 8.1.2 --- Track-to-Track Fusion --- p.71
Chapter 8.2 --- Multi-Sensor Data Fusion using Extended Kalman Filter --- p.72
Chapter 8.2.1 --- Time Update Model --- p.73
Chapter 8.2.2 --- Measurement Update Model --- p.74
Chapter 8.3 --- Multi-Sensor Data Fusion using Unscented Kalman Filter --- p.75
Chapter 8.3.1 --- Time Update Model --- p.75
Chapter 8.3.2 --- Measurement Update Model --- p.76
Chapter 8.4 --- Simulation Test --- p.76
Chapter 8.5 --- Experimental Test --- p.80
Chapter 8.5.1 --- Rotational Test --- p.81
Chapter 8.5.2 --- Translational Test --- p.86
Chapter 9 --- Future Work --- p.93
Chapter 9.1 --- Zero Velocity Compensation --- p.93
Chapter 9.1.1 --- Stroke Segmentation --- p.93
Chapter 9.1.2 --- Zero Velocity Compensation (ZVC) --- p.94
Chapter 9.1.3 --- Experimental Results --- p.94
Chapter 9.2 --- Random Sample Consensus Algorithm (RANSAC) --- p.96
Chapter 10 --- Conclusion --- p.97
Bibliography --- p.99
33

"A novel sub-pixel edge detection algorithm: with applications to super-resolution and edge sharpening." 2013. http://library.cuhk.edu.hk/record=b5884269.

Abstract:
Lee, Hiu Fung.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2013.
Includes bibliographical references (leaves 80-82).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts also in Chinese.
34

"Detecting irregularity in videos using spatiotemporal volumes." 2007. http://library.cuhk.edu.hk/record=b5893341.

Abstract:
Li, Yun.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
Includes bibliographical references (leaves 68-72).
Abstracts in English and Chinese.
Abstract --- p.I
Abstract (in Chinese) --- p.III
Acknowledgments --- p.IV
List of Contents --- p.VI
List of Figures --- p.VII
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Visual Detection --- p.2
Chapter 1.2 --- Irregularity Detection --- p.4
Chapter 2 --- System Overview --- p.7
Chapter 2.1 --- Definition of Irregularity --- p.7
Chapter 2.2 --- Contributions --- p.8
Chapter 2.3 --- Review of previous work --- p.9
Chapter 2.3.1 --- Model-based Methods --- p.9
Chapter 2.3.2 --- Statistical Methods --- p.11
Chapter 2.4 --- System Outline --- p.14
Chapter 3 --- Background Subtraction --- p.16
Chapter 3.1 --- Related Work --- p.17
Chapter 3.2 --- Adaptive Mixture Model --- p.18
Chapter 3.2.1 --- Online Model Update --- p.20
Chapter 3.2.2 --- Background Model Estimation --- p.22
Chapter 3.2.3 --- Foreground Segmentation --- p.24
Chapter 4 --- Feature Extraction --- p.28
Chapter 4.1 --- Various Feature Descriptors --- p.29
Chapter 4.2 --- Histogram of Oriented Gradients --- p.30
Chapter 4.2.1 --- Feature Descriptor --- p.31
Chapter 4.2.2 --- Feature Merits --- p.33
Chapter 4.3 --- Subspace Analysis --- p.35
Chapter 4.3.1 --- Principal Component Analysis --- p.35
Chapter 4.3.2 --- Subspace Projection --- p.37
Chapter 5 --- Bayesian Probabilistic Inference --- p.39
Chapter 5.1 --- Estimation of PDFs --- p.40
Chapter 5.1.1 --- K-Means Clustering --- p.40
Chapter 5.1.2 --- Kernel Density Estimation --- p.42
Chapter 5.2 --- MAP Estimation --- p.44
Chapter 5.2.1 --- ML Estimation & MAP Estimation --- p.44
Chapter 5.2.2 --- Detection through MAP --- p.46
Chapter 5.3 --- Efficient Implementation --- p.47
Chapter 5.3.1 --- K-D Trees --- p.48
Chapter 5.3.2 --- Nearest Neighbor (NN) Algorithm --- p.49
Chapter 6 --- Experiments and Conclusion --- p.51
Chapter 6.1 --- Experiments --- p.51
Chapter 6.1.1 --- Outdoor Video Surveillance - Exp. 1 --- p.52
Chapter 6.1.2 --- Outdoor Video Surveillance - Exp. 2 --- p.54
Chapter 6.1.3 --- Outdoor Video Surveillance - Exp. 3 --- p.56
Chapter 6.1.4 --- Classroom Monitoring - Exp. 4 --- p.61
Chapter 6.2 --- Algorithm Evaluation --- p.64
Chapter 6.3 --- Conclusion --- p.66
Bibliography --- p.68
35

"Robust stereo motion and structure estimation scheme." Thesis, 2006. http://library.cuhk.edu.hk/record=b6074304.

Abstract:
Another important contribution of this thesis is a novel and highly robust estimator: Kernel Density Estimation Sample Consensus (KDESAC), which combines the Random Sample Consensus algorithm with Kernel Density Estimation (KDE). The main advantage of KDESAC is that no prior information and no scale estimators are required in the estimation of the parameters. The computational load of KDESAC is much lower than that of robust algorithms which estimate the scale in every sampling loop. Experiments on synthetic data show that the proposed method is more robust to heavily corrupted data than other algorithms: KDESAC can tolerate more than 80% outliers and multiple structures. Although Adaptive Scale Sample Consensus (ASSC) can obtain performance as good as KDESAC's, ASSC is much slower than KDESAC. KDESAC is also applied to the SFM problem and to multi-motion estimation with real data. The experiments demonstrate that KDESAC is robust and efficient.
Structure from motion (SFM), the problem of estimating 3D structure from 2D images, is one of the most popular and well-studied problems in computer vision. This thesis is a study within the area of SFM. The main objective of this work is to improve the robustness of SFM algorithms so as to make them capable of tolerating a great number of outliers in the correspondences. To improve robustness, a stereo image sequence is processed, so that random sampling algorithms can be employed in the structure and motion estimation. With this strategy, we employ Random Sample Consensus (RANSAC) in motion and structure estimation to exclude outliers. Since the RANSAC method needs prior information about the scale of the inliers, we propose an auto-scale RANSAC algorithm which determines the inliers by analyzing the probability density of the residuals. The experimental results demonstrate that SFM by the proposed auto-scale RANSAC is more robust and accurate than that by RANSAC.
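The scale-free flavour of such estimators can be sketched in a few lines: each randomly sampled hypothesis is scored by the kernel density of its residuals evaluated at zero, so no inlier scale has to be supplied. This toy line-fitting loop is our reconstruction of the idea, not the thesis's KDESAC; the bandwidth rule and the data model are illustrative.

import numpy as np

rng = np.random.default_rng(1)
# synthetic data: 30% inliers of y = 2x + 1, 70% gross outliers
n = 300
x = rng.uniform(-5, 5, n)
y = 2 * x + 1 + rng.normal(0, 0.05, n)
out = rng.random(n) < 0.7
y[out] = rng.uniform(-15, 15, out.sum())

def density_at_zero(res):
    # Gaussian kernel density of the residuals, evaluated at residual 0.
    h = 1.06 * np.std(res) * len(res) ** (-0.2)    # Silverman's rule
    return np.mean(np.exp(-0.5 * (res / h) ** 2)) / (h * np.sqrt(2 * np.pi))

best_score, best_model = -np.inf, None
for _ in range(500):                               # random sampling loop
    i, j = rng.choice(n, 2, replace=False)
    if x[i] == x[j]:
        continue
    slope = (y[j] - y[i]) / (x[j] - x[i])
    icept = y[i] - slope * x[i]
    score = density_at_zero(y - slope * x - icept) # no scale prior needed
    if score > best_score:
        best_score, best_model = score, (slope, icept)
print(best_model)   # close to (2, 1)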
Chan Tai.
"September 2006."
Adviser: Yun Hui Liu.
Source: Dissertation Abstracts International, Volume: 68-03, Section: B, page: 1716.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2006.
Includes bibliographical references (p. 113-120).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts in English and Chinese.
School code: 1307.
36

Sun, Jun. "Efficient computation of MRF for low-level vision problems." Master's thesis, 2012. http://hdl.handle.net/1885/156022.

Abstract:
Low-level computer vision problems, such as image restoration, stereo matching and image segmentation, have received considerable attention from computer vision researchers. These problems, though apparently distinct, share a common structure: given the observed image data, estimate a hidden label at each pixel position. Due to this similarity, one effective solution is to deal with low-level vision problems in a unified way: the Markov Random Field (MRF) framework, which includes MRF modeling and inference. To be specific, the relationship between observed image data and hidden labels is modeled by an MRF network, formulating a joint distribution; inference then efficiently finds a local optimum of the distribution, producing an estimate of the underlying labels. We study how to efficiently solve low-level vision problems within the MRF framework. To achieve this target, we mainly focus on two aspects: (1) optimizing the MRF structure; (2) improving the efficiency of inference. The MRF structure expresses how we describe the relationship between observed data and hidden labels, as well as the relationship among the hidden labels themselves. We aim to optimize the MRF structure for fast inference. Besides the structure, there are two important terms used in inference: the data term and the prior term. In this work, we also study how to generate a reliable and robust data term to increase the accuracy and efficiency of inference. In this thesis, a multi-spanning-tree decomposition is first proposed. The traditional 4-connected MRF is broken down into a set of spanning trees, which are loop-free; edges in the original grid are uniformly distributed over these spanning trees. Using the multi-spanning-tree structure, inference can be performed quickly and in parallel, and the inference results can be combined by an efficient median filter. Secondly, a deeper analysis of spanning-tree MRFs is presented. Tree structures are popularly utilized for inference, but how to select an optimal spanning tree has been widely overlooked by researchers. The problem is formulated as finding an optimal spanning tree to approximate the 4-connected grid, and is solved by minimizing the KL-divergence between the tree and grid distributions. It is also demonstrated that for different low-level vision problems, the selection criteria for optimal spanning trees differ. Besides the MRF structure analysis, we also optimize the data-term generation process. This work is specialized for image denoising: we use a non-local technique to create a reliable, low-dimensional label space from which the data term is generated. This data term helps to provide results comparable with state-of-the-art algorithms in very limited time. To sum up, we efficiently solve low-level vision problems by MRF inference. Although MRF-based low-level vision problems have been analyzed for a long time, some interesting work has been overlooked. After a careful scrutiny of the literature, we extend existing ideas and develop new algorithms, and empirically demonstrate that MRF computation can be improved in both speed and accuracy.
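The benefit of loop-free structure can be sketched on the simplest spanning tree, a chain, where min-sum dynamic programming gives the exact MAP labelling; the quadratic data term and truncated quadratic prior below are our illustrative choices, not the thesis's models.

import numpy as np

def chain_map(obs, labels, lam=2.0, trunc=4.0):
    # Exact MAP labelling of a chain MRF by min-sum dynamic programming.
    n, k = len(obs), len(labels)
    unary = (obs[:, None] - labels[None, :]) ** 2            # data term
    pair = np.minimum((labels[:, None] - labels[None, :]) ** 2, trunc)
    cost = unary[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):                                    # forward sweep
        tot = cost[:, None] + lam * pair                     # k x k transitions
        back[i] = np.argmin(tot, axis=0)
        cost = tot[back[i], np.arange(k)] + unary[i]
    path = [int(np.argmin(cost))]
    for i in range(n - 1, 0, -1):                            # backtrack
        path.append(back[i][path[-1]])
    return np.asarray(labels)[path[::-1]]

# usage: denoise a noisy step signal with 8 discrete levels
sig = np.r_[np.zeros(20), 4 * np.ones(20)] + np.random.default_rng(2).normal(0, 0.6, 40)
print(chain_map(sig, labels=np.arange(8.0)))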
37

"Calculating degenerate structures via convex optimization with applications in computer vision and pattern recognition." 2012. http://library.cuhk.edu.hk/record=b5549425.

Abstract:
In a wide range of computer vision and pattern recognition problems, the captured images and videos often live in high-dimensional observation spaces. Computing with them directly may suffer from computational infeasibility and numerical instability. On the other hand, data in the real world are often generated by a limited number of physical causes, and thus embed degenerate structures in nature. For instance, they can be modeled by a low-dimensional subspace, a union of subspaces, a manifold or even a manifold stratification. Discovering and harnessing such intrinsic structures not only brings semantic insight into the problems at hand, but also provides critical information for overcoming challenges encountered in practice.
Recent years have witnessed great development in both the theory and application of convex optimization. Efficient and elegant solutions have been found for NP-hard problems such as low-rank matrix recovery and sparse representation. In this thesis, we study the problem of discovering degenerate structures of high-dimensional inputs using these techniques. In particular, we focus on low-dimensional subspaces and their unions, and address their application in overcoming the challenges encountered in three practical scenarios: face image alignment, background subtraction and automatic plant identification.
In face image alignment, we propose a method that jointly brings multiple images of an unseen face into alignment with a pre-trained generic appearance model despite different poses, expressions and illumination conditions of the face in the images. The idea is to pursue an intrinsic affine subspace of the target face that is low-dimensional while at the same time lying close to the generic subspace. Compared with conventional appearance-based methods that rely on accurate appearance models, ours works well with only a generic one and performs much better on unseen faces even if they significantly differ from those used for training the generic model. The result is approximately as good as that in an idealistic case where a specific model for the target face is provided.
For background subtraction, we propose a background model that captures the changes caused by the background switching among a few configurations, such as traffic light states. The background is modeled as a union of low-dimensional subspaces, each characterizing one configuration of the background, and the proposed algorithm automatically switches among them and identifies violating elements as foreground pixels. Moreover, we propose a robust learning approach that can work with foreground-present training samples at the background modeling stage: it builds a correct background model with outlying foreground pixels automatically pruned out. This is practically important when foreground-free training samples are difficult to obtain, in scenarios such as traffic monitoring.
For automatic plant identification, we propose a novel and practical method that recognizes plants based on leaf shapes extracted from photographs. Different from existing studies, which are mostly focused on simple leaves, the proposed method is designed to recognize both simple and compound leaves. The key is that, instead of either measuring geometric features or matching shape features as in conventional methods, we describe leaves by counting the occurrences of certain shape patterns on them. The patterns are learned in such a way that they form a degenerate polytope (a special union of affine subspaces) in the feature space, and can simulate, to some extent, the "keys" used by botanists: each pattern reflects a common feature of several different species, and all the patterns together form a discriminative rule for recognition. Experiments conducted on a variety of datasets show that our algorithm significantly outperforms state-of-the-art methods in terms of recognition accuracy, efficiency and storage, and thus holds good promise for practical use.
In conclusion, the performed studies show that: 1) visual data with semantic meaning are often not random; although they can be high-dimensional, they typically embed degenerate structures in the observation space. 2) With appropriate assumptions made and clever computational tools developed, these structures can be calculated efficiently and stably. 3) The employment of these intrinsic structures helps overcome practical challenges and is critical for computer vision and pattern recognition algorithms to achieve good performance.
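A minimal numpy sketch of the union-of-subspaces idea: one PCA basis per background configuration, a new frame is assigned to the best-fitting subspace, and pixels with large residuals are flagged as foreground. Plain PCA stands in for the robust dictionary learning of the thesis, and the threshold and toy data are illustrative.

import numpy as np

def fit_subspaces(frame_groups, dim=3):
    # One low-dimensional PCA basis (mean + top axes) per configuration.
    bases = []
    for F in frame_groups:                  # F: (num_frames, num_pixels)
        mu = F.mean(axis=0)
        _, _, Vt = np.linalg.svd(F - mu, full_matrices=False)
        bases.append((mu, Vt[:dim]))
    return bases

def subtract(frame, bases, thresh=0.5):
    # Pick the best-fitting configuration, flag large residuals as foreground.
    best = None
    for mu, V in bases:
        r = frame - mu - (frame - mu) @ V.T @ V   # residual to this subspace
        if best is None or np.sum(r ** 2) < np.sum(best ** 2):
            best = r
    return np.abs(best) > thresh

# usage with fake data: two configurations (e.g. traffic light on/off)
rng = np.random.default_rng(3)
cfg_a = rng.normal(0.2, 0.02, (50, 100))
cfg_b = rng.normal(0.8, 0.02, (50, 100))
bases = fit_subspaces([cfg_a, cfg_b])
frame = cfg_b[0].copy()
frame[10:20] = 0.0                          # an object occludes pixels 10..19
print(np.where(subtract(frame, bases))[0])  # ~ [10 ... 19]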
Zhao, Cong.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 107-121).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
Dedication --- p.i
Acknowledgements --- p.ii
Abstract --- p.v
Abstract (in Chinese) --- p.viii
Publication List --- p.xi
Nomenclature --- p.xii
Contents --- p.xiv
List of Figures --- p.xviii
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Motivation --- p.1
Chapter 1.2 --- Background --- p.2
Chapter 1.2.1 --- Subspaces --- p.3
Chapter 1.2.2 --- Unions of Subspaces --- p.6
Chapter 1.2.3 --- Manifolds and Stratifications --- p.8
Chapter 1.3 --- Thesis Outline --- p.10
Chapter 2 --- Joint Face Image Alignment --- p.13
Chapter 2.1 --- Introduction --- p.14
Chapter 2.2 --- Related Works --- p.16
Chapter 2.3 --- Background --- p.18
Chapter 2.3.1 --- Active Appearance Model --- p.18
Chapter 2.3.2 --- Multi-Image Alignment using AAM --- p.20
Chapter 2.3.3 --- Limitations in Practice --- p.21
Chapter 2.4 --- The Proposed Method --- p.23
Chapter 2.4.1 --- Two Important Assumptions --- p.23
Chapter 2.4.2 --- The Subspace Pursuit Problem --- p.27
Chapter 2.4.3 --- Reformulation --- p.27
Chapter 2.4.4 --- Efficient Solution --- p.30
Chapter 2.4.5 --- Discussions --- p.32
Chapter 2.5 --- Experiments --- p.34
Chapter 2.5.1 --- Settings --- p.34
Chapter 2.5.2 --- Results and Discussions --- p.36
Chapter 2.6 --- Summary --- p.38
Chapter 3 --- Background Subtraction --- p.40
Chapter 3.1 --- Introduction --- p.41
Chapter 3.2 --- Related Works --- p.43
Chapter 3.3 --- The Proposed Method --- p.48
Chapter 3.3.1 --- Background Modeling --- p.48
Chapter 3.3.2 --- Background Subtraction --- p.49
Chapter 3.3.3 --- Foreground Object Detection --- p.52
Chapter 3.3.4 --- Background Modeling by Dictionary Learning --- p.53
Chapter 3.4 --- Robust Dictionary Learning --- p.54
Chapter 3.4.1 --- Robust Sparse Coding --- p.56
Chapter 3.4.2 --- Robust Dictionary Update --- p.57
Chapter 3.5 --- Experimentation --- p.59
Chapter 3.5.1 --- Local and Sudden Changes --- p.59
Chapter 3.5.2 --- Non-structured High-frequency Changes --- p.62
Chapter 3.5.3 --- Discussions --- p.65
Chapter 3.6 --- Summary --- p.66
Chapter 4 --- Plant Identification using Leaves --- p.67
Chapter 4.1 --- Introduction --- p.68
Chapter 4.2 --- Related Works --- p.70
Chapter 4.3 --- Review of IDSC Feature --- p.71
Chapter 4.4 --- The Proposed Method --- p.73
Chapter 4.4.1 --- Independent-IDSC Feature --- p.75
Chapter 4.4.2 --- Common Shape Patterns --- p.77
Chapter 4.4.3 --- Leaf Representation by Counts --- p.80
Chapter 4.4.4 --- Leaf Recognition by NN Classifier --- p.82
Chapter 4.5 --- Experiments --- p.82
Chapter 4.5.1 --- Settings --- p.82
Chapter 4.5.2 --- Performance --- p.83
Chapter 4.5.3 --- Shared Dictionaries vs. Shared Features --- p.88
Chapter 4.5.4 --- Pooling --- p.89
Chapter 4.6 --- Discussions --- p.90
Chapter 4.6.1 --- Time Complexity --- p.90
Chapter 4.6.2 --- Space Complexity --- p.91
Chapter 4.6.3 --- System Description --- p.92
Chapter 4.7 --- Summary --- p.92
Chapter 4.8 --- Acknowledgement --- p.94
Chapter 5 --- Conclusion and Future Work --- p.95
Chapter 5.1 --- Thesis Contributions --- p.95
Chapter 5.2 --- Future Work --- p.97
Chapter 5.2.1 --- Theory Side --- p.98
Chapter 5.2.2 --- Practice Side --- p.98
Chapter Appendix-I --- Joint Face Alignment Results --- p.100
Bibliography --- p.107
38

Mathebula, Solani David. "Quantitative analysis of the linear optical character of the anterior segment of the eye." Thesis, 2014. http://hdl.handle.net/10210/8950.

Abstract:
D.Phil. (Optometry)
An important issue in the quantitative analysis of optical systems is, for example, the question of how to calculate an average of a set of eyes: an average that itself has an optical character as a whole and is representative of, or central to, the optical characters of the eyes within that set. In the case of refraction, an average power is readily calculated as the arithmetic average of several dioptric power matrices. The question then is: how does one determine an average that represents the average optical character of a set of eyes, completely to first order? The exponential-mean-log transference has been proposed by Harris as the most promising solution to the question of the average eye. For such an average to be useful, it is necessary that the exponential-mean-log transference satisfies conditions of existence, uniqueness and symplecticity. The first-order optical nature of a centred optical system (or eye) is completely characterized by the 4×4 ray transference. The augmented ray transference can be represented as a 5×5 matrix and is usually partitioned into 2×2 and 2×1 submatrices: the dilation A, disjugacy B, divergence C, divarication D, transverse translation e and deflection π. These are the six fundamental first-order optical properties of the system; other optical properties of the system, called derived properties, can be obtained from them. Excluding decentred or tilted elements, everything that can happen to a ray is described by a 4×4 system matrix. The transference, then, defines the four fundamental optical properties A, B, C and D of the system…
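The exponential-mean-log transference itself takes only a few lines, assuming scipy is available; the 2×2 reduced transferences below (one meridian: a homogeneous gap followed by a thin element) are toy stand-ins for real 4×4 eye transferences, with illustrative powers and thicknesses.

import numpy as np
from scipy.linalg import expm, logm

def transference(F, t):
    # Toy 2x2 reduced transference: gap of reduced thickness t (metres)
    # followed by a thin element of power F (dioptres).
    lens = np.array([[1.0, 0.0], [-F, 1.0]])
    gap = np.array([[1.0, t], [0.0, 1.0]])
    return lens @ gap

# a small set of 'eyes' differing slightly in power and axial length
Ts = [transference(F, t) for F, t in [(60.0, 0.016), (61.5, 0.0155), (58.8, 0.0165)]]

# exponential of the arithmetic mean of the matrix logarithms
T_avg = expm(np.mean([logm(T).real for T in Ts], axis=0))
print(T_avg)
print(np.linalg.det(T_avg))   # ~1.0: the average transference stays symplectic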
39

Nguyen, Quang Anh. "Advanced methods and extensions for kernel-based object tracking." Phd thesis, 2010. http://hdl.handle.net/1885/150670.

Abstract:
In today's world, rapid developments in computing technology have generated a great deal of interest in automated video analysis systems such as smart cars, video indexing services and advanced surveillance networks. Among those, object tracking research plays a pivotal role in providing an unobtrusive means to detect, track and alert to the presence of objects of interest in a given scene with little to no supervisor interaction. The application fields for object tracking range from smart vehicles and advanced surveillance networks to sports analysis and perceptual user interfaces. The diversity of its applications also gives rise to a number of different tracking algorithms tailored to suit the corresponding scenarios and constraints. Along this line, the kernel-based tracker has emerged as one of the benchmark tracking algorithms due to its real-time performance, robustness to noise and tracking accuracy. In this thesis, we explore the possibility of further enhancing the original kernel-based tracker. We do this by first developing a probabilistic formulation of the mean-shift algorithm which, in turn, provides a means to estimate the target's severe transformations in size, shape and orientation. For changes in colour appearance due to poor lighting conditions of the scene, we opt to combine multiple low-complexity image features such as texture, contrast, brightness and colour to improve the tracking performance. To achieve this, we advocate the use of a graphical model to abstract the image features under study into a relational structure and, subsequently, make use of graph-spectral methods to combine them linearly and in a straightforward manner. Furthermore, we also present an on-line updating method to adjust the tracking model to incorporate new changes in the image features during the course of tracking. To cope with the problem of object occlusion in a high-density traffic area, we propose a geometric method to extend the mean-shift algorithm to a 3D setting by the use of multiple cameras with overlapping views of the scene. The methods presented in this thesis not only show significant performance improvements on real-world sequences over a number of benchmark algorithms, but also offer high generalisation in the spatial and feature domains for future development purposes.
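One iteration of the classic kernel-based tracker can be sketched as histogram back-projection followed by a weighted centroid shift; the bin count, window size and toy frame are our illustrative choices, not the probabilistic formulation developed in the thesis.

import numpy as np

def hist(patch, bins=16):
    h, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def mean_shift_step(frame, center, size, q, bins=16):
    # Shift the window to the sqrt(q/p)-weighted centroid of its pixels.
    y, x = center
    win = frame[y - size:y + size, x - size:x + size]
    p = hist(win, bins)
    b = np.minimum((win * bins).astype(int), bins - 1)   # bin index per pixel
    w = np.sqrt(q[b] / np.maximum(p[b], 1e-12))          # back-projection weights
    ys, xs = np.mgrid[-size:size, -size:size]
    dy = (w * ys).sum() / w.sum()
    dx = (w * xs).sum() / w.sum()
    return int(round(y + dy)), int(round(x + dx))

# usage: a bright blob sits off-centre; the window shifts toward it
frame = np.zeros((60, 60))
frame[24:34, 30:40] = 0.9
q = hist(frame[24:34, 30:40])                            # target model
print(mean_shift_step(frame, center=(30, 30), size=8, q=q))  # (28, 34)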
40

Tao, Trevor. "An extended Mumford-Shah model and improved region merging algorithm for image segmentation." Thesis, 2005. http://hdl.handle.net/2440/37749.

Abstract:
In this thesis we extend the Mumford-Shah model and propose a new region merging algorithm for image segmentation. The segmentation problem is to determine an optimal partition of an image into constituent regions such that individual regions are homogeneous within and adjacent regions have contrasting properties. By optimal, we mean one that minimizes a particular energy functional. In region merging, the image is initially divided into a very fine grid, with each pixel being a separate region. Regions are then recursively merged until it is no longer possible to decrease the energy functional. In 1994, Koepfler, Lopez and Morel developed a region merging algorithm for segmenting an image. They consider the piecewise constant Mumford-Shah model, where the energy functional consists of two terms, accuracy versus complexity, with the trade-off controlled by a scale parameter. They show that one can efficiently generate a hierarchy of segmentations from coarse to fine. This algorithm is complemented by a sound theoretical analysis of the piecewise constant model, due to Morel and Solimini. The primary motivation for extending the Mumford-Shah model stems from the fact that this model is only suitable for "cartoon" images, where each region is uncontaminated by any form of noise. Other shortcomings also need to be addressed. In the algorithm of Koepfler et al., it is difficult to determine the order in which the regions are merged, and a "schedule" is required in order to determine the number and fineness of segmentations in the hierarchy. Both of these difficulties complicate the theoretical analysis of Koepfler's algorithm. There is no definite method for selecting the "optimal" value of the scale parameter itself. Furthermore, the mathematical analysis is not well understood for more complex models. None of these issues are convincingly answered in the literature. This thesis aims to provide some answers to the above shortcomings by introducing new techniques for region merging algorithms and a better understanding of the theoretical analysis of both the mathematics and the algorithm's performance. A review of general segmentation techniques is provided early in this thesis. Also discussed are the development of an "extended" model to account for white noise contamination of images, and an improvement of Koepfler's original algorithm which eliminates the need for a schedule. The work of Morel and Solimini is generalized to the extended model. Also considered are an application to textured images and the issue of selecting the value of the scale parameter.
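The piecewise constant merging step is easiest to state in 1D, where the energy is the sum of squared deviations from region means plus the scale parameter λ per boundary: merging adjacent segments A and B raises the data term by |A||B|/(|A|+|B|)·(μA − μB)², so a boundary is removed whenever that cost is at most λ. The toy sketch below is our 1D simplification; Koepfler et al. merge on a 2D region adjacency graph.

import numpy as np

def merge_segments(sig, lam):
    # Greedy region merging for the 1D piecewise-constant Mumford-Shah model.
    segs = [[i, 1, float(v)] for i, v in enumerate(sig)]   # [start, size, mean]
    while len(segs) > 1:
        # data-term increase incurred by removing each boundary
        costs = [(a[1] * b[1] / (a[1] + b[1]) * (a[2] - b[2]) ** 2, k)
                 for k, (a, b) in enumerate(zip(segs, segs[1:]))]
        best, k = min(costs)
        if best > lam:                     # merging would raise the energy
            break
        a, b = segs[k], segs[k + 1]
        size = a[1] + b[1]
        segs[k:k + 2] = [[a[0], size, (a[1] * a[2] + b[1] * b[2]) / size]]
    return segs

sig = np.r_[np.zeros(15), 5 * np.ones(15)] + np.random.default_rng(4).normal(0, 0.3, 30)
for start, size, mean in merge_segments(sig, lam=2.0):     # two segments survive
    print(start, size, round(mean, 2))

Larger values of λ remove more boundaries, which is exactly how a coarse-to-fine hierarchy of segmentations arises.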
Thesis (Ph.D.)--School of Mathematical Sciences, 2005.
41

Zareian, Alireza. "Learning Structured Representations for Understanding Visual and Multimedia Data." Thesis, 2021. https://doi.org/10.7916/d8-94j1-yb14.

Abstract:
Recent advances in Deep Learning (DL) have achieved impressive performance in a variety of Computer Vision (CV) tasks, leading to an exciting wave of academic and industrial efforts to develop Artificial Intelligence (AI) facilities for every aspect of human life. Nevertheless, there are inherent limitations in the understanding ability of DL models, which limit the potential of AI in real-world applications, especially in the face of complex, multimedia input. Despite tremendous progress in solving basic CV tasks, such as object detection and action recognition, state-of-the-art CV models can merely extract a partial summary of visual content, which lacks a comprehensive understanding of what happens in the scene. This is partly due to the oversimplified definition of CV tasks, which often ignore the compositional nature of semantics and scene structure. It is even less studied how to understand the content of multiple modalities, which requires processing visual and textual information in a holistic and coordinated manner, and extracting interconnected structures despite the semantic gap between the two modalities. In this thesis, we argue that a key to improve the understanding capacity of DL models in visual and multimedia domains is to use structured, graph-based representations, to extract and convey semantic information more comprehensively. To this end, we explore a variety of ideas to define more realistic DL tasks in both visual and multimedia domains, and propose novel methods to solve those tasks by addressing several fundamental challenges, such as weak supervision, discovery and incorporation of commonsense knowledge, and scaling up vocabulary. More specifically, inspired by the rich literature of semantic graphs in Natural Language Processing (NLP), we explore innovative scene understanding tasks and methods that describe images using semantic graphs, which reflect the scene structure and interactions between objects. In the first part of this thesis, we present progress towards such graph-based scene understanding solutions, which are more accurate, need less supervision, and have more human-like common sense compared to the state of the art. In the second part of this thesis, we extend our results on graph-based scene understanding to the multimedia domain, by incorporating the recent advances in NLP and CV, and developing a new task and method from the ground up, specialized for joint information extraction in the multimedia domain. We address the inherent semantic gap between visual content and text by creating high-level graph-based representations of images, and developing a multitask learning framework to establish a common, structured semantic space for representing both modalities. In the third part of this thesis, we explore another extension of our scene understanding methodology, to open-vocabulary settings, in order to make scene understanding methods more scalable and versatile. We develop visually grounded language models that use naturally supervised data to learn the meaning of all words, and transfer that knowledge to CV tasks such as object detection with little supervision. Collectively, the proposed solutions and empirical results set a new state of the art for the semantic comprehension of visual and multimedia content in a structured way, in terms of accuracy, efficiency, scalability, and robustness.
42

Tao, Trevor. "An extended Mumford-Shah model and an improved region merging algorithm for image segmentation." 2005. http://hdl.handle.net/2440/37749.

Abstract:
In this thesis we extend the Mumford-Shah model and propose a new region merging algorithm for image segmentation. The segmentation problem is to determine an optimal partition of an image into constituent regions such that individual regions are homogeneous within and adjacent regions have contrasting properties. By optimal, we mean one that minimizes a particular energy functional. In region merging, the image is initially divided into a very fine grid, with each pixel being a separate region. Regions are then recursively merged until it is no longer possible to decrease the energy functional. In 1994, Koepfler, Lopez and Morel developed a region merging algorithm for segmenting an image. They consider the piecewise constant Mumford-Shah model, where the energy functional consists of two terms, accuracy versus complexity, with the trade-off controlled by a scale parameter. They show that one can efficiently generate a hierarchy of segmentations from coarse to fine. This algorithm is complemented by a sound theoretical analysis of the piecewise constant model, due to Morel and Solimini. The primary motivation for extending the Mumford-Shah model stems from the fact that this model is only suitable for "cartoon" images, where each region is uncontaminated by any form of noise. Other shortcomings also need to be addressed. In the algorithm of Koepfler et al., it is difficult to determine the order in which the regions are merged, and a "schedule" is required in order to determine the number and fineness of segmentations in the hierarchy. Both of these difficulties complicate the theoretical analysis of Koepfler's algorithm. There is no definite method for selecting the "optimal" value of the scale parameter itself. Furthermore, the mathematical analysis is not well understood for more complex models. None of these issues are convincingly answered in the literature. This thesis aims to provide some answers to the above shortcomings by introducing new techniques for region merging algorithms and a better understanding of the theoretical analysis of both the mathematics and the algorithm's performance. A review of general segmentation techniques is provided early in this thesis. Also discussed are the development of an "extended" model to account for white noise contamination of images, and an improvement of Koepfler's original algorithm which eliminates the need for a schedule. The work of Morel and Solimini is generalized to the extended model. Also considered are an application to textured images and the issue of selecting the value of the scale parameter.
Thesis (Ph.D.)--School of Mathematical Sciences, 2005.
43

Mukundan, R. "Image Based Attitude And Position Estimation Using Moment Functions." Thesis, 1995. http://etd.iisc.ernet.in/handle/2005/1733.

44

Saragih, Jason Mora. "The generative learning and discriminative fitting of linear deformable models." Phd thesis, 2008. http://hdl.handle.net/1885/146528.

45

"Modeling and rendering from multiple views." Thesis, 2006. http://library.cuhk.edu.hk/record=b6074293.

Abstract:
The first approach, described in the first part of this thesis, studies 3D face modeling from multi-views. Today human face modeling and animation techniques are widely used to generate virtual characters and models. Such characters and models are used in movies, computer games, advertising, news broadcasting and other activities. We propose an efficient method to estimate the poses, the global shape and the local structures of a human head recorded in multiple face images or a video sequence by using a generic wireframe face model. Based on this newly proposed method, we have successfully developed a pose invariant face recognition system and a pose invariant face contour extraction method.
The objective of this thesis is to model and render complex scenes or objects from multiple images taken from different viewpoints. Two approaches to achieve this objective were investigated in this thesis. The first one is for known objects with prior geometrical models, which can be deformed to match the objects recorded in multiple input images. The second one is for general scenes or objects without prior geometrical models.
The proposed algorithms in this thesis were tested on many real and synthetic data. The experimental results illustrate their efficiency and limitations.
The second approach, described in the second part of this thesis, investigates 3D modeling and rendering for general complex scenes. The entertainment industry touches hundreds of millions of people every day, and synthetic pictures and 3D reconstruction of real scenes, often mixed with actual film footage, are now commonplace in computer games, sports broadcasting, TV advertising and feature films. A series of techniques has been developed to complete this task. First, a new view-ordering algorithm was proposed to organize and order an unorganized image database. Second, a novel and efficient multiview feature matching approach was developed to calibrate and track all views. Finally, both match propagation based and Bayesian based methods were developed to produce 3D scene models for rendering.
Yao Jian.
"September 2006."
Adviser: Wai-Kuen Chan.
Source: Dissertation Abstracts International, Volume: 68-03, Section: B, page: 1849.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2006.
Includes bibliographical references (p. 170-181).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts in English and Chinese.
School code: 1307.
APA, Harvard, Vancouver, ISO and other styles
46

"Inter-modality image synthesis and recognition." 2012. http://library.cuhk.edu.hk/record=b5549527.

Full text source
Abstract:
Inter-modality image synthesis and recognition has been a hot topic in computer vision. In real-world applications, there are diverse image modalities, such as sketch images for law enforcement and near-infrared images for illumination-invariant face recognition. Because image data in some modalities are difficult to acquire, it is often useful to transform images from one modality to another or to match images across modalities. These techniques provide great flexibility for computer vision applications.
In this thesis we study three problems: face sketch synthesis, example-based image stylization, and face sketch recognition.
For face sketch synthesis, we expand the frontier to synthesis from uncontrolled face photos. Previous methods only work under well-controlled conditions. We propose a robust algorithm for synthesizing a face sketch from a face photo with lighting and pose variations. It synthesizes local sketch patches using a multiscale Markov Random Field (MRF) model. The robustness to lighting and pose variations is achieved with three components: shape priors specific to facial components to reduce artifacts and distortions, patch descriptors and robust metrics for selecting sketch patch candidates, and intensity compatibility and gradient compatibility to match neighboring sketch patches effectively. Experiments on the CUHK face sketch database and celebrity photos collected from the web show that our algorithm significantly improves on the state-of-the-art.
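A minimal sketch of the kind of patch-based MRF energy described above, assuming simplified descriptor distances and a crude stand-in for patch overlaps; the thesis's actual local-evidence, shape-prior and compatibility terms are richer.

```python
import numpy as np

def mrf_energy(photo_feats, candidates, labeling, neighbors, alpha=1.0):
    """Energy of one assignment of sketch-patch candidates to grid sites.

    photo_feats[i]  : descriptor of the i-th photo patch
    candidates[i][k]: (descriptor, border_pixels) of the k-th candidate
                      sketch patch for site i (border_pixels is a crude
                      stand-in for the pixels shared with neighbors)
    labeling[i]     : index of the candidate chosen at site i
    neighbors       : list of (i, j) pairs of adjacent sites
    """
    # Local evidence: the chosen sketch patch should explain the photo patch.
    unary = sum(
        np.sum((photo_feats[i] - candidates[i][labeling[i]][0]) ** 2)
        for i in range(len(photo_feats)))
    # Neighboring compatibility: adjacent sketch patches should agree
    # on the pixels where they overlap (intensity compatibility).
    pairwise = sum(
        np.sum((candidates[i][labeling[i]][1] -
                candidates[j][labeling[j]][1]) ** 2)
        for i, j in neighbors)
    return unary + alpha * pairwise

# Synthesis then amounts to choosing the labeling that minimizes this
# energy, e.g. with belief propagation on a multiscale grid.
```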
For example-based image stylization, we provide an effective approach to transferring artistic effects from a template image to photos. Most existing methods do not consider the content and style separately. We propose a style transfer algorithm via frequency band decomposition. An image is decomposed into low-frequency (LF), mid-frequency (MF), and high-frequency (HF) components, which describe the content, the main style, and the information along boundaries, respectively. The style is then transferred from the template to the photo in the MF and HF components, which is formulated as MRF optimization. Finally, a reconstruction step combines the LF component of the photo with the obtained style information to generate the artistic result. Compared to other algorithms, our method not only synthesizes the style but also preserves the image content well. Experiments demonstrate that our approach performs excellently in image stylization and personalized artwork.
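A minimal sketch of the frequency band decomposition step, with arbitrary Gaussian widths as placeholders; the wholesale band copy in the second function is the crudest stand-in for the MRF-based transfer the abstract describes.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decompose(img, sigma_low=8.0, sigma_mid=2.0):
    """Split an image into low-, mid- and high-frequency components
    so that lf + mf + hf reconstructs the input exactly."""
    lf = gaussian_filter(img, sigma_low)   # content
    mid = gaussian_filter(img, sigma_mid)
    mf = mid - lf                          # main style
    hf = img - mid                         # edges / fine detail
    return lf, mf, hf

def naive_style_transfer(photo, template):
    """Crude band swap; assumes photo and template have the same size."""
    lf_p, _, _ = decompose(photo)
    _, mf_t, hf_t = decompose(template)
    # The thesis transfers the MF/HF bands via MRF optimization;
    # copying the template's bands wholesale is only a placeholder.
    return lf_p + mf_t + hf_t
```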
For face sketch recognition, we propose a new direction based on learning face descriptors from data. Recent research has focused on transforming photos and sketches into the same modality for matching or developing advanced classification algorithms to reduce the modality gap between features extracted from photos and sketches. We propose a novel approach by reducing the modality gap at the feature extraction stage. A face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches. Guided by maximizing the mutual information between photos and sketches in the quantized feature spaces, the coupled encoding is achieved by the proposed coupled information-theoretic projection forest. Experiments on the largest face sketch database show that our approach significantly outperforms the state-of-the-art methods.
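The tree-growing objective can be pictured as maximizing the empirical mutual information between the quantized codes assigned to corresponding photo and sketch patches. A minimal sketch of that quantity follows; the flat joint-histogram estimator is an assumption for illustration, not the thesis's estimator.

```python
import numpy as np

def mutual_information(codes_photo, codes_sketch, n_bins):
    """Empirical mutual information between quantized photo codes and
    the codes of the corresponding sketch patches."""
    joint = np.zeros((n_bins, n_bins))
    for cp, cs in zip(codes_photo, codes_sketch):
        joint[cp, cs] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal over photo codes
    py = joint.sum(axis=0, keepdims=True)   # marginal over sketch codes
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))
```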
Zhang, Wei.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 121-137).
Abstract also in Chinese.
Abstract --- p.i
Acknowledgement --- p.v
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Multi-Modality Computer Vision --- p.1
Chapter 1.2 --- Face Sketches --- p.4
Chapter 1.2.1 --- Face Sketch Synthesis --- p.6
Chapter 1.2.2 --- Face Sketch Recognition --- p.7
Chapter 1.3 --- Example-based Image Stylization --- p.9
Chapter 1.4 --- Contributions and Summary of Approaches --- p.10
Chapter 1.5 --- Thesis Road Map --- p.13
Chapter 2 --- Literature Review --- p.14
Chapter 2.1 --- Related Works in Face Sketch Synthesis --- p.14
Chapter 2.2 --- Related Works in Example-based Image Stylization --- p.17
Chapter 2.3 --- Related Works in Face Sketch Recognition --- p.21
Chapter 3 --- Lighting and Pose Robust Sketch Synthesis --- p.27
Chapter 3.1 --- The Algorithm --- p.31
Chapter 3.1.1 --- Overview of the Method --- p.32
Chapter 3.1.2 --- Local Evidence --- p.34
Chapter 3.1.3 --- Shape Prior --- p.40
Chapter 3.1.4 --- Neighboring Compatibility --- p.42
Chapter 3.1.5 --- Implementation Details --- p.43
Chapter 3.1.6 --- Acceleration --- p.45
Chapter 3.2 --- Experimental Results --- p.47
Chapter 3.2.1 --- Lighting and Pose Variations --- p.49
Chapter 3.2.2 --- Celebrity Faces from the Web --- p.54
Chapter 3.3 --- Conclusion --- p.54
Chapter 4 --- Style Transfer via Band Decomposition --- p.58
Chapter 4.1 --- Introduction --- p.58
Chapter 4.2 --- Algorithm Overview --- p.63
Chapter 4.3 --- Image Style Transfer --- p.64
Chapter 4.3.1 --- Band Decomposition --- p.64
Chapter 4.3.2 --- MF and HF Component Processing --- p.67
Chapter 4.3.3 --- Reconstruction --- p.74
Chapter 4.4 --- Experiments --- p.76
Chapter 4.4.1 --- Comparison to State-of-the-Art --- p.76
Chapter 4.4.2 --- Extended Application: Personalized Artwork --- p.82
Chapter 4.5 --- Conclusion --- p.84
Chapter 5 --- Coupled Encoding for Sketch Recognition --- p.86
Chapter 5.1 --- Introduction --- p.86
Chapter 5.1.1 --- Related work --- p.89
Chapter 5.2 --- Information-Theoretic Projection Tree --- p.90
Chapter 5.2.1 --- Projection Tree --- p.91
Chapter 5.2.2 --- Mutual Information Maximization --- p.92
Chapter 5.2.3 --- Tree Construction with MMI --- p.94
Chapter 5.2.4 --- Randomized CITP Forest --- p.102
Chapter 5.3 --- Coupled Encoding Based Descriptor --- p.103
Chapter 5.4 --- Experiments --- p.106
Chapter 5.4.1 --- Descriptor Comparison --- p.108
Chapter 5.4.2 --- Parameter Exploration --- p.109
Chapter 5.4.3 --- Experiments on Benchmarks --- p.112
Chapter 5.5 --- Conclusions --- p.115
Chapter 6 --- Conclusion --- p.116
Bibliography --- p.121
APA, Harvard, Vancouver, ISO and other styles
47

Scoleri, Tony. "Fundamental numerical schemes for parameter estimation in computer vision." 2008. http://hdl.handle.net/2440/50726.

Full text source
Abstract:
An important research area in computer vision is parameter estimation. Given a mathematical model and a sample of image measurement data, key parameters are sought to encapsulate geometric properties of a relevant entity. An optimisation problem is often formulated in order to find these parameters. This thesis presents an elaboration of fundamental numerical algorithms for estimating parameters of multi-objective models of importance in computer vision applications. The work examines ways to solve unconstrained and constrained minimisation problems from the viewpoints of theory, computational methods, and numerical performance. The research starts by considering a particular form of multi-equation constraint function that characterises a wide class of unconstrained optimisation tasks. Increasingly sophisticated cost functions are developed within a consistent framework, ultimately resulting in the creation of a new iterative estimation method. The scheme operates in a maximum likelihood setting and yields near-optimal estimates of the parameters. Salient features of the method are that it has simple update rules and exhibits fast convergence. Then, to accommodate models with functional dependencies, two variants of this initial algorithm are proposed. These methods are improved again by reshaping the objective function in a way that presents the original estimation problem in a reduced form. This procedure leads to a novel algorithm with enhanced stability and convergence properties. To extend the capacity of these schemes to deal with constrained optimisation problems, several a posteriori correction techniques are proposed to impose the so-called ancillary constraints. This work culminates in two methods which can tackle ill-conditioned constrained functions. The combination of the previous unconstrained methods with these post-hoc correction schemes provides an array of powerful constrained algorithms. The practicality and performance of the methods are evaluated on two specific applications: planar homography matrix computation and trifocal tensor estimation. In the case of fitting a homography to image data, only the unconstrained algorithms are necessary. For the problem of estimating a trifocal tensor, significant work is done first on expressing sets of usable constraints, especially the ancillary constraints which are critical to ensure that the computed object conforms to the underlying geometry. Evidently, here the post-correction schemes must be incorporated in the computational mechanism. For both of these example problems, the performance of the unconstrained and constrained algorithms is compared to existing methods. Experiments reveal that the new methods match a state-of-the-art technique in accuracy but surpass it in execution speed.
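For context on the homography application, the sketch below shows the standard algebraic (DLT) estimate that iterative maximum-likelihood schemes of this kind typically start from; it is the textbook initialization, not the thesis's algorithm.

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct Linear Transform estimate of a 3x3 homography from point
    correspondences src[i] -> dst[i], each given as an (N, 2) array."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the
        # nine entries of H (up to scale).
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows)
    # Right singular vector of the smallest singular value minimizes
    # ||A h|| subject to ||h|| = 1.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]
```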
Thesis (Ph.D.) - University of Adelaide, School of Mathematical Sciences, Discipline of Pure Mathematics, 2008
APA, Harvard, Vancouver, ISO and other styles
48

"Motion and shape from apparent flow." 2013. http://library.cuhk.edu.hk/record=b5549772.

Full text source
Abstract:
Determining general camera motion and reconstructing the depth map of the imaged scene from captured video are important for computer vision and various robotics tasks, including visual control and autonomous navigation. A camera (or a cluster of cameras) is usually mounted on the end-effector of a robot arm when performing the above tasks. The determination of the relative geometry between the camera frame and the end-effector frame, commonly referred to as hand-eye calibration, is essential to proper operation in visual control. Similarly, determining the relative geometry of multiple cameras is important to various applications requiring the use of a multi-camera rig.
The relative motion between an observer and the imaged scene generally induces apparent flow in the video. The difficulty of the problem lies mainly in the fact that the flow pattern directly observable in the video is generally not the full flow field induced by the motion, but only the partial component of it that is orthogonal to the iso-brightness contour of the spatial image intensity profile. This partial flow field is known as the normal flow field. This thesis addresses several important problems in computer vision: determination of camera motion, recovery of the depth map, and hand-eye calibration, directly from the apparent flow (normal flow) pattern in the video data rather than from a full flow field interpolated from it. This approach has a number of significant advantages. It does not require interpolating the flow field and in turn does not demand that the imaged scene be smooth. In contrast to optical flow, no sophisticated optimization procedures that account for flow discontinuities are required; such techniques are generally computationally expensive. It also breaks the classical chicken-and-egg problem between scene depth and camera motion: no prior knowledge about the locations of the discontinuities is required for motion determination. In this thesis, several direct methods are proposed to determine camera motion using three different types of imaging systems, namely a monocular camera, a stereo camera, and a multi-camera rig.
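A minimal sketch of how a normal flow field can be extracted from two frames via the brightness constancy equation; the finite differences stand in for whatever derivative filters an actual implementation would use.

```python
import numpy as np

def normal_flow(I_prev, I_next):
    """Normal flow from brightness constancy: the component of image
    motion along the spatial gradient,
        u_n = -I_t / |grad I|,  direction = grad I / |grad I|.
    Returns the (x, y) components of the normal flow at every pixel."""
    Ix = np.gradient(I_prev, axis=1)   # spatial derivatives
    Iy = np.gradient(I_prev, axis=0)
    It = I_next - I_prev               # temporal derivative
    mag = np.sqrt(Ix ** 2 + Iy ** 2) + 1e-8
    u_n = -It / mag                    # signed normal-flow magnitude
    nx, ny = Ix / mag, Iy / mag        # unit direction, orthogonal to
                                       # the iso-brightness contour
    return u_n * nx, u_n * ny
```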
This thesis begins with the Apparent Flow Positive Depth (AFPD) constraint to determine the motion parameters using all observable normal flows from a monocular camera. The constraint presents itself as an optimization problem to estimate the motion parameters. An iterative process in a constrained dual coarse-to-fine voting framework on the motion parameter space is used to exploit the constraint.
Due to the finite video sampling rate, the extracted normal flow field is generally more accurate in its direction component than in its magnitude. This thesis proposes two constraints: one related to the direction component of the normal flow field, the Apparent Flow Direction (AFD) constraint, and the other to the magnitude component of the field, the Apparent Flow Magnitude (AFM) constraint. The first constraint presents itself as a system of linear inequalities that bind the direction of the motion parameters; the second uses the globality of the rotational magnitude across all image positions to constrain the motion parameters further. A two-stage iterative process in a coarse-to-fine framework on the motion parameter space is used to exploit the two constraints.
Without an interpolation step, however, normal flow is only raw information extracted locally, and it generally suffers from flow extraction error arising from the finiteness of the image resolution and the video sampling rate. This thesis explores a remedy to the problem: increasing the visual field of the imaging system by rigidly fixing a number of cameras together to form an approximate spherical eye. With a substantially widened visual field, the normal flow data points are available in much greater number, which can be used to combat the local flow extraction error at each image point. More importantly, the directions of the translation and rotation components of a general motion can be separately estimated with the use of the novel Apparent Flow Separation (AFS) and Extended Apparent Flow Separation (EAFS) constraints.
Instead of using a monocular camera or a spherical imaging system, stereo vision contributes another visual cue for determining the magnitude of translation and the depth map without the arbitrary scale ambiguity. The conventional approach in stereo vision is to determine feature correspondences across the two input images. However, correspondence establishment is often difficult. This thesis explores two direct methods to recover the complete camera motion from the stereo system without explicit point-to-point correspondence matching. The first method extends the AFD and AFM constraints to a stereo camera and provides a robust geometrical method to determine the translation magnitude. The second method, which requires the stereo image pair to have a largely overlapped field of view, provides a closed-form solution requiring no iterative computation. Once the motion parameters are determined, the depth map can be reconstructed without difficulty. The depth map resulting from normal flows is generally sparse in nature. We can interpolate the depth map and then utilize it as an initial estimate in a conventional TV-L₁ framework. The result is not only better reconstruction performance, but also faster computation time.
Calibration of hand-eye geometry is usually based on feature correspondences. This thesis presents an alternative method that uses normal flows generated from an active camera system to perform self-calibration. To make the method more robust to noise, the strategy is to use the direction component of the flow field, which is more immune to noise, to recover the direction part of the hand-eye geometry first. Outliers are then detected using some intrinsic properties of the flow field together with the partially recovered hand-eye geometry. The final solution is refined using a robust method. The method can also be used to determine the relative geometry of multiple cameras without demanding overlap in their visual fields.
Hui, Tak Wai.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2013.
Includes bibliographical references (leaves 159-165).
Abstracts in English and Chinese.
Acknowledgements --- p.i
Abstract --- p.ii
Lists of Figures --- p.xiii
Lists of Tables --- p.xix
Chapter Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Background --- p.1
Chapter 1.2 --- Motivation --- p.4
Chapter 1.3 --- Research Objectives --- p.6
Chapter 1.4 --- Thesis Outline --- p.7
Chapter Chapter 2 --- Literature Review --- p.10
Chapter 2.1 --- Introduction --- p.10
Chapter 2.2 --- Recovery of Optical Flows --- p.10
Chapter 2.3 --- Egomotion Estimation Based on Optical Flow Field --- p.14
Chapter 2.3.1 --- Bilinear Constraint --- p.14
Chapter 2.3.2 --- Subspace Method --- p.15
Chapter 2.3.3 --- Partial Search Method --- p.16
Chapter 2.3.4 --- Fixation --- p.17
Chapter 2.3.5 --- Region Alignment --- p.17
Chapter 2.3.6 --- Linearity and Divergence Properties of Optical Flows --- p.18
Chapter 2.3.7 --- Constraint Lines and Collinear Points --- p.18
Chapter 2.3.8 --- Multi-Camera Rig --- p.19
Chapter 2.3.9 --- Discussion --- p.21
Chapter 2.4 --- Determining Egomotion Using Direct Methods --- p.22
Chapter 2.4.1 --- Introduction --- p.22
Chapter 2.4.2 --- Classical Methods --- p.23
Chapter 2.4.3 --- Pattern Matching --- p.24
Chapter 2.4.4 --- Search Subspace Method --- p.25
Chapter 2.4.5 --- Histogram-Based Method --- p.26
Chapter 2.4.6 --- Multi-Camera Rig --- p.26
Chapter 2.4.7 --- Discussion --- p.27
Chapter 2.5 --- Determining Egomotion Using Feature Correspondences --- p.28
Chapter 2.6 --- Hand-Eye Calibration --- p.30
Chapter 2.7 --- Summary --- p.31
Chapter Chapter 3 --- Determining Motion from Monocular Camera Using Merely the Positive Depth Constraint --- p.32
Chapter 3.1 --- Introduction --- p.32
Chapter 3.2 --- Related Works --- p.33
Chapter 3.3 --- Background --- p.34
Chapter 3.3 --- Apparent Flow Positive Depth (AFPD) Constraint --- p.39
Chapter 3.4 --- Numerical Solution to AFPD Constraint --- p.40
Chapter 3.5 --- Constrained Coarse-to-Fine Searching --- p.40
Chapter 3.6 --- Experimental Results --- p.43
Chapter 3.7 --- Conclusion --- p.47
Chapter Chapter 4 --- Determining Motion from Monocular Camera Using Direction and Magnitude of Normal Flows Separately --- p.48
Chapter 4.1 --- Introduction --- p.48
Chapter 4.2 --- Related Works --- p.50
Chapter 4.3 --- Apparent Flow Direction (AFD) Constraint --- p.51
Chapter 4.3.1 --- The Special Case: Pure Translation --- p.51
Chapter 4.3.1.1 --- Locus of Translation Using Full Flow as a Constraint --- p.51
Chapter 4.3.1.2 --- Locus of Translation Using Normal Flow as a Constraint --- p.53
Chapter 4.3.2 --- The Special Case: Pure Rotation --- p.54
Chapter 4.3.2.1 --- Locus of Rotation Using Full Flow as a Constraint --- p.54
Chapter 4.3.2.2 --- Locus of Rotation Using Normal Flow as a Constraint --- p.54
Chapter 4.3.3 --- Solving the System of Linear Inequalities for the Two Special Cases --- p.55
Chapter 4.3.5 --- Ambiguities of AFD Constraint --- p.59
Chapter 4.4 --- Apparent Flow Magnitude (AFM) Constraint --- p.60
Chapter 4.5 --- Putting the Two Constraints Together --- p.63
Chapter 4.6 --- Experimental Results --- p.65
Chapter 4.6.1 --- Simulation --- p.65
Chapter 4.6.2 --- Video Data --- p.67
Chapter 4.6.2.1 --- Pure Translation --- p.67
Chapter 4.6.2.2 --- General Motion --- p.68
Chapter 4.7 --- Conclusion --- p.72
Chapter Chapter 5 --- Determining Motion from Multi-Cameras with Non-Overlapping Visual Fields --- p.73
Chapter 5.1 --- Introduction --- p.73
Chapter 5.2 --- Related Works --- p.75
Chapter 5.3 --- Background --- p.76
Chapter 5.3.1 --- Image Sphere --- p.77
Chapter 5.3.2 --- Planar Case --- p.78
Chapter 5.3.3 --- Projective Transformation --- p.79
Chapter 5.4 --- Constraint from Normal Flows --- p.80
Chapter 5.5 --- Approximation of Spherical Eye by Multiple Cameras --- p.81
Chapter 5.6 --- Recovery of Motion Parameters --- p.83
Chapter 5.6.1 --- Classification of a Pair of Normal Flows --- p.84
Chapter 5.6.2 --- Classification of a Triplet of Normal Flows --- p.86
Chapter 5.6.3 --- Apparent Flow Separation (AFS) Constraint --- p.87
Chapter 5.6.3.1 --- Constraint to Direction of Translation --- p.87
Chapter 5.6.3.2 --- Constraint to Direction of Rotation --- p.88
Chapter 5.6.3.3 --- Remarks about the AFS Constraint --- p.88
Chapter 5.6.4 --- Extension of Apparent Flow Separation Constraint (EAFS) --- p.89
Chapter 5.6.4.1 --- Constraint to Direction of Translation --- p.90
Chapter 5.6.4.2 --- Constraint to Direction of Rotation --- p.92
Chapter 5.6.5 --- Solution to the AFS and EAFS Constraints --- p.94
Chapter 5.6.6 --- Apparent Flow Magnitude (AFM) Constraint --- p.96
Chapter 5.7 --- Experimental Results --- p.98
Chapter 5.7.1 --- Simulation --- p.98
Chapter 5.7.2 --- Real Video --- p.103
Chapter 5.7.2.1 --- Using Feature Correspondences --- p.108
Chapter 5.7.2.2 --- Using Optical Flows --- p.108
Chapter 5.7.2.3 --- Using Direct Methods --- p.109
Chapter 5.8 --- Conclusion --- p.111
Chapter Chapter 6 --- Motion and Shape from Binocular Camera System: An Extension of AFD and AFM Constraints --- p.112
Chapter 6.1 --- Introduction --- p.112
Chapter 6.2 --- Related Works --- p.112
Chapter 6.3 --- Recovery of Camera Motion Using Search Subspaces --- p.113
Chapter 6.4 --- Correspondence-Free Stereo Vision --- p.114
Chapter 6.4.1 --- Determination of Full Translation Using Two 3D Lines --- p.114
Chapter 6.4.2 --- Determination of Full Translation Using All Normal Flows --- p.115
Chapter 6.4.3 --- Determination of Full Translation Using a Geometrical Method --- p.117
Chapter 6.5 --- Experimental Results --- p.119
Chapter 6.5.1 --- Synthetic Image Data --- p.119
Chapter 6.5.2 --- Real Scene --- p.120
Chapter 6.6 --- Conclusion --- p.122
Chapter Chapter 7 --- Motion and Shape from Binocular Camera System: A Closed-Form Solution for Motion Determination --- p.123
Chapter 7.1 --- Introduction --- p.123
Chapter 7.2 --- Related Works --- p.124
Chapter 7.3 --- Background --- p.125
Chapter 7.4 --- Recovery of Camera Motion Using a Linear Method --- p.126
Chapter 7.4.1 --- Region-Correspondence Stereo Vision --- p.126
Chapter 7.3.2 --- Combined with Epipolar Constraints --- p.127
Chapter 7.4 --- Refinement of Scene Depth --- p.131
Chapter 7.4.1 --- Using Spatial and Temporal Constraints --- p.131
Chapter 7.4.2 --- Using Stereo Image Pairs --- p.134
Chapter 7.5 --- Experiments --- p.136
Chapter 7.5.1 --- Synthetic Data --- p.136
Chapter 7.5.2 --- Real Image Sequences --- p.137
Chapter 7.6 --- Conclusion --- p.143
Chapter Chapter 8 --- Hand-Eye Calibration Using Normal Flows --- p.144
Chapter 8.1 --- Introduction --- p.144
Chapter 8.2 --- Related Works --- p.144
Chapter 8.3 --- Problem Formulation --- p.145
Chapter 8.3 --- Model-Based Brightness Constraint --- p.146
Chapter 8.4 --- Hand-Eye Calibration --- p.147
Chapter 8.4.1 --- Determining the Rotation Matrix R --- p.148
Chapter 8.4.2 --- Determining the Direction of Position Vector T --- p.149
Chapter 8.4.3 --- Determining the Complete Position Vector T --- p.150
Chapter 8.4.4 --- Extrinsic Calibration of a Multi-Camera Rig --- p.151
Chapter 8.5 --- Experimental Results --- p.151
Chapter 8.5.1 --- Synthetic Data --- p.151
Chapter 8.5.2 --- Real Image Data --- p.152
Chapter 8.6 --- Conclusion --- p.153
Chapter Chapter 9 --- Conclusion and Future Work --- p.154
Related Publications --- p.158
Bibliography --- p.159
Appendix --- p.166
Chapter A --- Apparent Flow Direction Constraint --- p.166
Chapter B --- Ambiguity of AFD Constraint --- p.168
Chapter C --- Relationship between the Angle Subtended by any two Flow Vectors in Image Plane and the Associated Flow Vectors in Image Sphere --- p.169
APA, Harvard, Vancouver, ISO and other styles
49

Xie, Yiran. "Stereo matching using higher-order graph cuts." Master's thesis, 2012. http://hdl.handle.net/1885/150887.

Full text source
Abstract:
Stereo matching is one of the fundamental tasks in early vision. While the human brain recognizes objects and estimates depth easily, it is difficult to design algorithms that perform well on a computer because of variations in illumination, occlusion, and textureless regions. Like most early vision problems, stereo matching can be formulated as an energy minimization problem in which the optimal depth is the one with the lowest energy, and graph cuts is an efficient and effective minimization tool that avoids poor local minima. Conventional energy functions are defined on Markov Random Fields (MRFs) with a 4-connected grid structure derived from the image; however, this structure is incapable of expressing complex relationships between groups of pixels. This thesis explores several aspects of the stereo matching problem through higher-order structure and higher-order graph cuts. The first problem I address relates to the evaluation of five state-of-the-art segmentation approaches. Their different contributions to segment-based stereo matching have been quantitatively measured and analyzed. This work aims to help researchers choose the segmentation approach most suitable for their stereo matching application. The second part of the thesis proposes a novel approach to dense stereo matching. This method features sub-segmentation and adopts a higher-order potential to enforce label consistency inside segments as a soft constraint. Moreover, several successful techniques have been combined. Experiments show that this approach obtains state-of-the-art results while remaining efficient. In the last part of the thesis, a novel two-layer MRF framework is presented in which stereo matching and surface boundary estimation are combined. Both properties are inferred simultaneously and globally so that they can benefit each other. This work has a direct application in phosphene-vision-based human indoor navigation. Experiments show that the proposed framework achieves significantly better performance than other popular methods at all resolutions.
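A minimal sketch of the conventional pairwise energy on a 4-connected grid that the thesis's higher-order potentials extend; the truncated absolute data term and Potts smoothness term are common textbook choices, assumed here for illustration.

```python
import numpy as np

def stereo_energy(left, right, disparity, lam=10.0, trunc=20.0):
    """Energy of an integer disparity map on a 4-connected grid:
    truncated absolute data term plus a Potts smoothness term.
    Higher-order (segment-consistency) terms are omitted here."""
    h, w = left.shape
    data = 0.0
    for y in range(h):
        for x in range(w):
            d = int(disparity[y, x])
            xr = max(0, x - d)   # matching pixel in the right image
            data += min(abs(float(left[y, x]) - float(right[y, xr])),
                        trunc)
    # Potts pairwise term: penalize label changes between neighbors.
    smooth = np.sum(disparity[:, 1:] != disparity[:, :-1])
    smooth += np.sum(disparity[1:, :] != disparity[:-1, :])
    return data + lam * smooth

# Graph cuts (e.g. alpha-expansion) searches for the disparity map
# minimizing this energy instead of evaluating it exhaustively.
```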
APA, Harvard, Vancouver, ISO and other styles
50

Li, Hanxi. "Totally corrective boosting algorithm and application to face recognition." Phd thesis, 2011. http://hdl.handle.net/1885/151395.

Full text source
Abstract:
Boosting is one of the most well-known learning methods for building highly accurate classifiers or regressors from a set of weak classifiers. Much effort has been devoted to the understanding of boosting algorithms, yet questions remain about why boosting succeeds. In this thesis, we study boosting algorithms from a new perspective. We started our research by empirically comparing the LPBoost and AdaBoost algorithms. The results and the corresponding analysis show that, besides the minimum margin, which is directly and globally optimized in LPBoost, the margin distribution plays a more important role. Inspired by this observation, we theoretically prove that the Lagrange dual problems of AdaBoost, LogitBoost and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems. By looking at the dual problems of these boosting algorithms, we show that the success of boosting can be understood in terms of maintaining a better margin distribution by maximizing margins and, at the same time, controlling the margin variance. We further point out that AdaBoost approximately maximizes the average margin, instead of the minimum margin. The duality formulation also enables us to develop column-generation based optimization algorithms, which are totally corrective. The new algorithm, termed AdaBoost-CG, exhibits almost identical classification results to those of standard stage-wise additive boosting algorithms, but with much faster convergence rates. Therefore, fewer weak classifiers are needed to build the ensemble using our proposed optimization technique. The significance of the margin distribution motivates us to design a new column-generation based algorithm that directly maximizes the average margin while minimizing the margin variance. We term this novel method MDBoost and show its superiority over other boosting-like algorithms. Moreover, consideration of the primal and dual problems together leads to important new insights into the characteristics of boosting algorithms. We then propose a general framework that can be used to design new boosting algorithms. A wide variety of machine learning problems essentially minimize a regularized risk functional. We show that the proposed boosting framework, termed AnyBoostTc, can accommodate various loss functions and different regularizers in a totally corrective optimization way. A large body of totally corrective boosting algorithms can actually be solved very efficiently, with no sophisticated convex optimization solvers needed, by solving the primal rather than the dual. We also demonstrate that some boosting algorithms like AdaBoost can be interpreted in our framework, even though their optimization is not totally corrective. We conclude our study by applying the totally corrective boosting algorithm to a long-standing computer vision problem: face recognition. Linear regression face recognizers, constrained by two categories of locality, are selected and combined within both the traditional and the totally corrective boosting frameworks. To our knowledge, this is the first time that linear-representation classifiers have been boosted for face recognition. The instance-based weak classifiers bring some advantages, which are theoretically or empirically proved in our work. Benefiting from the robust weak learner and the advanced learning framework, our algorithms achieve the best reported recognition rates on face recognition benchmark datasets.
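A minimal sketch of the stage-wise AdaBoost update that the totally corrective variants re-optimize globally; the weak-learner pool interface is an assumption for illustration.

```python
import numpy as np

def adaboost(X, y, weak_learners, rounds):
    """Stage-wise AdaBoost over a fixed pool of weak classifiers.
    Each weak learner maps X -> {-1, +1}; labels y are in {-1, +1}.
    Totally corrective variants re-optimize all ensemble weights each
    round instead of this greedy, one-weight-at-a-time update."""
    n = len(y)
    w = np.full(n, 1.0 / n)      # sample weights
    ensemble = []                # (alpha, learner) pairs
    for _ in range(rounds):
        # Pick the weak classifier with the lowest weighted error.
        err, h = min(((np.sum(w * (h(X) != y)), h) for h in weak_learners),
                     key=lambda t: t[0])
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1.0 - err) / err)
        # Re-weight samples: mistakes gain weight, correct ones lose it.
        w *= np.exp(-alpha * y * h(X))
        w /= w.sum()
        ensemble.append((alpha, h))
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))
```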
APA, Harvard, Vancouver, ISO and other styles