Dissertations / Theses on the topic 'Face and Object Recognition'

To see the other types of publications on this topic, follow the link: Face and Object Recognition.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Face and Object Recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Gathers, Ann D. "DEVELOPMENTAL FMRI STUDY: FACE AND OBJECT RECOGNITION." Lexington, Ky. : [University of Kentucky Libraries], 2005. http://lib.uky.edu/ETD/ukyanne2005d00276/etd.pdf.

Full text
Abstract:
Thesis (Ph. D.)--University of Kentucky, 2005.
Title from document title page (viewed on November 4, 2005). Document formatted into pages; contains xi, 152 p. : ill. Includes abstract and vita. Includes bibliographical references (p. 134-148).
APA, Harvard, Vancouver, ISO, and other styles
2

Nilsson, Linus. "Object Tracking and Face Recognition in Video Streams." Thesis, Umeå universitet, Institutionen för datavetenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-58076.

Full text
Abstract:
The goal of this project was to improve an existing face recognition system for video streams by using adaptive object tracking to track faces between frames. The knowledge of which faces do and do not occur in subsequent frames was used to filter out false faces and to better identify real ones. The recognition ability was tested by measuring how many faces were found and how many of them were correctly identified in two short video files. The tests also looked at the number of false face detections. The results were compared to a reference implementation that did not use object tracking. Two identification modes were tested: the default and strict modes. In the default mode, whichever person is most similar to a given image patch is accepted as the answer. In strict mode, the similarity also has to be above a certain threshold. The first video file had a fairly high image quality. It had only frontal faces, one at a time. The second video file had a slightly lower image quality. It had up to two faces at a time, in a larger variety of angles. The second video was therefore a more difficult case. The results show that the number of detected faces increased by 6-21% in the two video files, for both identification modes, compared to the reference implementation. Meanwhile, the number of false detections remained low. In the first video file, there were fewer than 0.009 false detections per frame. In the second video file, there were fewer than 0.08 false detections per frame. The number of faces that were correctly identified increased by 8-22% in the two video files in default mode. In the first video file, there was also a large improvement in strict mode, as it went from recognising 13% to 85% of all faces. In the second video file, however, neither implementation managed to identify anyone in strict mode. The conclusion is that object tracking is a good tool for improving the accuracy of face recognition in video streams.
Anyone implementing face recognition for video streams should consider using object tracking as a central component.
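The default and strict identification modes described in the abstract can be sketched as a nearest-neighbour match with an optional similarity threshold. This is an illustrative sketch, not the thesis's implementation; the cosine metric and the `identify` helper are assumptions.

```python
import numpy as np

def identify(probe, gallery, threshold=None):
    """Return the best-matching identity for a probe feature vector.

    gallery: dict mapping identity -> feature vector.
    Default mode (threshold=None): the most similar identity is always
    accepted. Strict mode: the similarity must also exceed the
    threshold, otherwise None ("unknown") is returned.
    """
    best_id, best_sim = None, -1.0
    for identity, feat in gallery.items():
        # Cosine similarity, chosen here purely for illustration.
        sim = float(np.dot(probe, feat) /
                    (np.linalg.norm(probe) * np.linalg.norm(feat)))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    if threshold is not None and best_sim < threshold:
        return None  # strict mode: best match is rejected as too weak
    return best_id
```

Strict mode trades recall for precision: with a high threshold, a system may identify almost no one (as in the second, harder video), while default mode always commits to some answer.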
APA, Harvard, Vancouver, ISO, and other styles
3

Banarse, D. S. "A generic neural network architecture for deformation invariant object recognition." Thesis, Bangor University, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.362146.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Collin, Charles Alain. "Effects of spatial frequency overlap on face and object recognition." Thesis, McGill University, 2000. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=36896.

Full text
Abstract:
There has recently been much interest in how limitations in spatial frequency range affect face and object perception. This work has mainly focussed on determining which bands of frequencies are most useful for visual recognition. However, a fundamental question not yet addressed is how spatial frequency overlap (i.e., the range of spatial frequencies shared by two images) affects complex image recognition. Aside from the basic theoretical interest this question holds, it also bears on research about effects of display format (e.g., line-drawings, Mooney faces, etc.) and studies examining the nature of mnemonic representations of faces and objects. Examining the effects of spatial frequency overlap on face and object recognition is the main goal of this thesis.
A second question that is examined concerns the effect of calibration of stimuli on recognition of spatially filtered images. Past studies using non-calibrated presentation methods have inadvertently introduced aberrant frequency content to their stimuli. The effect this has on recognition performance has not been examined, leading to doubts about the comparability of older and newer studies. Examining the impact of calibration on recognition is an ancillary goal of this dissertation.
Seven experiments examining the above questions are reported here. Results suggest that spatial frequency overlap had a strong effect on face recognition and a lesser effect on object recognition. Indeed, contrary to much previous research it was found that the band of frequencies occupied by a face image had little effect on recognition, but that small variations in overlap had significant effects. This suggests that the overlap factor is important in understanding various phenomena in visual recognition. Overlap effects likely contribute to the apparent superiority of certain spatial bands for different recognition tasks, and to the inferiority of line drawings in face recognition. Results concerning the mnemonic representation of faces and objects suggest that these are both encoded in a format that retains spatial frequency information, and do not support certain proposed fundamental differences in how these two stimulus classes are stored. Data on calibration generally shows non-calibration having little impact on visual recognition, suggesting moderate confidence in results of older studies.
APA, Harvard, Vancouver, ISO, and other styles
5

Higgs, David Robert. "Parts-based object detection using multiple views /." Link to online version, 2005. https://ritdml.rit.edu/dspace/handle/1850/1000.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Mian, Ajmal Saeed. "Representations and matching techniques for 3D free-form object and face recognition." University of Western Australia. School of Computer Science and Software Engineering, 2007. http://theses.library.uwa.edu.au/adt-WU2007.0046.

Full text
Abstract:
[Truncated abstract] The aim of visual recognition is to identify objects in a scene and estimate their pose. Object recognition from 2D images is sensitive to illumination, pose, clutter and occlusions. Object recognition from range data on the other hand does not suffer from these limitations. An important paradigm of recognition is model-based whereby 3D models of objects are constructed offline and saved in a database, using a suitable representation. During online recognition, a similar representation of a scene is matched with the database for recognizing objects present in the scene . . . The tensor representation is extended to automatic and pose invariant 3D face recognition. As the face is a non-rigid object, expressions can significantly change its 3D shape. Therefore, the last part of this thesis investigates representations and matching techniques for automatic 3D face recognition which are robust to facial expressions. A number of novelties are proposed in this area along with their extensive experimental validation using the largest available 3D face database. These novelties include a region-based matching algorithm for 3D face recognition, a 2D and 3D multimodal hybrid face recognition algorithm, fully automatic 3D nose ridge detection, fully automatic normalization of 3D and 2D faces, a low cost rejection classifier based on a novel Spherical Face Representation, and finally, automatic segmentation of the expression insensitive regions of a face.
APA, Harvard, Vancouver, ISO, and other styles
7

Mian, Ajmal Saeed. "Representations and matching techniques for 3D free-form object and face recognition /." Connect to this title, 2006. http://theses.library.uwa.edu.au/adt-WU2007.0046.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Holub, Alex David Perona Pietro. "Discriminative vs. generative object recognition : objects, faces, and the web /." Diss., Pasadena, Calif. : California Institute of Technology, 2007. http://resolver.caltech.edu/CaltechETD:etd-05312007-204007.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Vilaplana, Besler Verónica. "Region-based face detection, segmentation and tracking. framework definition and application to other objects." Doctoral thesis, Universitat Politècnica de Catalunya, 2010. http://hdl.handle.net/10803/33330.

Full text
Abstract:
One of the central problems in computer vision is the automatic recognition of object classes. In particular, the detection of the class of human faces is a problem that generates special interest due to the large number of applications that require face detection as a first step. In this thesis we approach the problem of face detection as a joint detection and segmentation problem, in order to precisely localize faces with pixel accurate masks. Even though this is our primary goal, in finding a solution we have tried to create a general framework as independent as possible of the type of object being searched. For that purpose, the technique relies on a hierarchical region-based image model, the Binary Partition Tree, where objects are obtained by the union of regions in an image partition. In this work, this model is optimized for the face detection and segmentation tasks. Different merging and stopping criteria are proposed and compared through a large set of experiments. In the proposed system the intra-class variability of faces is managed within a learning framework. The face class is characterized using a set of descriptors measured on the tree nodes, and a set of one-class classifiers. The system is formed by two strong classifiers. First, a cascade of binary classifiers simplifies the search space, and afterwards, an ensemble of more complex classifiers performs the final classification of the tree nodes. The system is extensively tested on different face data sets, producing accurate segmentations and proving to be quite robust to variations in scale, position, orientation, lighting conditions and background complexity. We show that the technique proposed for faces can be easily adapted to detect other object classes. Since the construction of the image model does not depend on any object class, different objects can be detected and segmented using the appropriate object model on the same image model. 
New object models can be easily built by selecting and training a suitable set of descriptors and classifiers. Finally, a tracking mechanism is proposed. It combines the efficiency of the mean-shift algorithm with the use of regions to track and segment faces through a video sequence, where both the face and the camera may move. The method is extended to deal with other deformable objects, using a region-based graph-cut method for the final object segmentation at each frame. Experiments show that both mean-shift based trackers produce accurate segmentations even in difficult scenarios such as those with similar object and background colors and fast camera and object movements.
APA, Harvard, Vancouver, ISO, and other styles
10

Gunn, Steve R. "Dual active contour models for image feature extraction." Thesis, University of Southampton, 1996. https://eprints.soton.ac.uk/250089/.

Full text
Abstract:
Active contours are now a very popular technique for shape extraction, achieved by minimising a suitably formulated energy functional. Conventional active contour formulations suffer difficulty in the appropriate choice of an initial contour and parameter values. Recent approaches have aimed to resolve these problems, but can compromise other performance aspects. To relieve the initialisation problem, an evolutionary dual active contour has been developed, which is combined with a local shape model to improve the parameterisation. One contour expands from inside the target feature, the other contracts from the outside. The two contours are inter-linked to provide a balanced technique with an ability to reject weak local energy minima. Additionally, a dual active contour configuration using dynamic programming has been developed to locate a global energy minimum, complementing recent approaches via simulated annealing and genetic algorithms. These differ from conventional evolutionary approaches, where energy minimisation may not converge to extract the target shape, in contrast with the guaranteed convergence of a global approach. The new techniques are demonstrated to successfully extract target shapes in synthetic and real images, with superior performance to previous approaches. The new technique employing dynamic programming is deployed to extract the inner face boundary, along with a conventional normal-driven contour to extract the outer face boundary. Application to a database of 75 subjects showed that the outer contour was extracted successfully for 96% of the subjects and the inner contour for 82%. This application highlights the advantages that the new dual active contour approaches can confer on automatic shape extraction.
APA, Harvard, Vancouver, ISO, and other styles
11

Fasel, Ian Robert. "Learning real-time object detectors : probabilistic generative approaches /." Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2006. http://wwwlib.umi.com/cr/ucsd/fullcit?p3216357.

Full text
Abstract:
Thesis (Ph. D.)--University of California, San Diego, 2006.
Title from first page of PDF file (viewed July 24, 2006). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 87-91).
APA, Harvard, Vancouver, ISO, and other styles
12

Clausen, Sally. "I never forget a face! : memory for faces and individual differences in spatial ability and gender." Honors in the Major Thesis, University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1394.

Full text
Abstract:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Sciences
Psychology
APA, Harvard, Vancouver, ISO, and other styles
13

Kramer, Annika. "Model based methods for locating, enhancing and recognising low resolution objects in video." Thesis, Curtin University, 2009. http://hdl.handle.net/20.500.11937/585.

Full text
Abstract:
Visual perception is our most important sense which enables us to detect and recognise objects even in low detail video scenes. While humans are able to perform such object detection and recognition tasks reliably, most computer vision algorithms struggle with wide angle surveillance videos that make automatic processing difficult due to low resolution and poor detail objects. Additional problems arise from varying pose and lighting conditions as well as non-cooperative subjects. All these constraints pose problems for automatic scene interpretation of surveillance video, including object detection, tracking and object recognition. Therefore, the aim of this thesis is to detect, enhance and recognise objects by incorporating a priori information and by using model based approaches. Motivated by the increasing demand for automatic methods for object detection, enhancement and recognition in video surveillance, different aspects of the video processing task are investigated with a focus on human faces. In particular, the challenge of fully automatic face pose and shape estimation by fitting a deformable 3D generic face model under varying pose and lighting conditions is tackled. Principal Component Analysis (PCA) is utilised to build an appearance model that is then used within a particle filter based approach to fit the 3D face mask to the image. This recovers face pose and person-specific shape information simultaneously. Experiments demonstrate the use in different resolution and under varying pose and lighting conditions. Following that, a combined tracking and super resolution approach enhances the quality of poor detail video objects. A 3D object mask is subdivided such that every mask triangle is smaller than a pixel when projected into the image and then used for model based tracking. The mask subdivision then allows for super resolution of the object by combining several video frames.
This approach achieves better results than traditional super resolution methods without the use of interpolation or deblurring. Lastly, object recognition is performed in two different ways. The first recognition method is applied to characters and used for license plate recognition. A novel character model is proposed to create different appearances which are then matched with the image of unknown characters for recognition. This allows for simultaneous character segmentation and recognition, and high recognition rates are achieved for low resolution characters down to only five pixels in size. While this approach is only feasible for objects with a limited number of different appearances, like characters, the second recognition method is applicable to any object, including human faces. Therefore, a generic 3D face model is automatically fitted to an image of a human face and recognition is performed on a mask level rather than image level. This approach requires neither an initial pose estimation nor the selection of feature points; the face alignment is provided implicitly by the mask fitting process.
APA, Harvard, Vancouver, ISO, and other styles
14

Parkhi, Omkar Moreshwar. "Features and methods for improving large scale face recognition." Thesis, University of Oxford, 2015. https://ora.ox.ac.uk/objects/uuid:7704244a-b327-4e5c-a58e-7bfe769ed988.

Full text
Abstract:
This thesis investigates vector representations for face recognition, and uses these representations for a number of tasks in image and video datasets. First, we look at different representations for faces in images and videos. The objective is to learn compact yet effective representations for describing faces. We first investigate the use of "Fisher Vector" descriptors for this task. We show that these descriptors are perfectly suited for face representation tasks. We also investigate various approaches to effectively reduce their dimension while improving their performance further. These "Fisher Vector" features are also amenable to extreme compression and work equally well when compressed by over 2000 times as compared to their non-compressed counterparts. These features achieved state-of-the-art results on challenging public benchmarks until the re-introduction of Convolutional Neural Networks (CNNs) in the community. Second, we investigate the use of "Very Deep" architectures for face representation tasks. For training these networks, we collected one of the largest annotated public datasets of celebrity faces with minimum manual intervention. We bring out specific details of these network architectures and their training objective functions essential to their performance, and achieve state-of-the-art results on challenging datasets. Having developed these representations, we propose a method for labeling faces in the challenging environment of broadcast videos using their associated textual data, such as subtitles and transcripts. We show that our CNN representation is well suited for this task. We also propose a scheme to automatically differentiate the primary cast of a TV serial or movie from that of the background characters.
We modify existing methods of collecting supervision from textual data and show that the careful alignment of video and textual data results in significant improvement in the amount of training data collected automatically, which has a direct positive impact on the performance of labeling mechanisms. We provide extensive evaluations on different benchmark datasets achieving, again, state-of-the-art results. Further, we show that both the shallow as well as the deep methods have excellent capabilities in switching modalities from photos to paintings and vice versa. We propose a system to retrieve paintings for similar looking people given a picture and investigate the use of facial attributes for this task. Finally, we show that an on-the-fly real time search system can be built to search through thousands of hours of video data starting from a text query. We propose product quantization schemes for making face representations memory efficient. We also present the demo system based on this design for the British Broadcasting Corporation (BBC) to search through their archive. All of these contributions have been designed with a keen eye on their application in the real world. As a result, most chapters have an associated code release and a working online demonstration.
APA, Harvard, Vancouver, ISO, and other styles
15

Reiss, Jason Edward. "Object substitution masking : what is the neural fate of the unreportable target? /." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 200 p, 2007. http://proquest.umi.com/pqdweb?did=1397916081&sid=9&Fmt=2&clientId=8331&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Šajboch, Antonín. "Sledování a rozpoznávání lidí na videu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255441.

Full text
Abstract:
The master's thesis deals with detecting and tracking people in video. To achieve optimal recognition, a convolutional neural network is used to extract a feature vector from the detected face region; the extracted vector is then classified. Because the recognition process must run in real time, the methods were selected accordingly. A new face dataset was created from video recordings captured in the faculty area; the videos and the dataset were used in experiments to verify the accuracy of the system. The recognition accuracy is about 85%. The proposed system can be used, for example, to register people, count passages, or report the presence of an unknown person in a building.
APA, Harvard, Vancouver, ISO, and other styles
17

Moore, Viviene M. "The effects of age of acquisition in processing people's faces and names." Thesis, Durham University, 1998. http://etheses.dur.ac.uk/4836/.

Full text
Abstract:
Word frequency and age of acquisition (AoA) influence word and object recognition and naming. High frequency and early acquired items are processed faster than low frequency and/or late acquired items. The high correlation between word frequency and AoA makes these effects difficult to distinguish. However, this difficulty can be avoided by investigating the effects of AoA in the domain of recognising and naming famous faces and names. Face processing is a suitable domain because the functional models of face processing were developed by analogy to word and object processing models. Nine experiments on the effects of AoA on face and name processing are reported. Experiment 1 investigated the influence of variables on naming famous faces. The variables were regressed on the speed and accuracy of face naming. Only familiarity and AoA significantly predicted successful naming. A factorial analysis and full replication revealed a consistent advantage for name production to early acquired celebrities' faces (Experiments 2 & 3). Furthermore, this advantage was apparent from the first presentation (Experiment 4). Faster face and name recognition occurred for early acquired than late acquired celebrities (Experiments 5 & 8). Early acquired names were read aloud faster than late acquired names (Experiment 7). Conversely, semantic classifications were made faster to late acquired celebrities' faces (Experiment 6), but there was no effect in the same task to written names (Experiment 9). An effect of AoA for celebrities, whose names are acquired later in life than object names, is problematic for the developmental account of AoA. Effects of AoA in recognition tasks are problematic for theorists who propose that speech output is the locus of AoA. A mechanism is proposed to account for the empirical findings. The data also present a challenge for computer modelling to simulate the combined effects of AoA and cumulative frequency.
APA, Harvard, Vancouver, ISO, and other styles
18

Holm, Linus. "Predictive eyes precede retrieval : visual recognition as hypothesis testing." Doctoral thesis, Umeå : Department of Psychology, Umeå University, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1179.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Bichsel, Martin. "Strategies of robust object recognition for the automatic identification of human faces /." [S.l.] : [s.n.], 1991. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=9467.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Papageorgiou, Constantine P. "A Trainable System for Object Detection in Images and Video Sequences." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/5566.

Full text
Abstract:
This thesis presents a general, trainable system for object detection in static images and video sequences. The core system finds a certain class of objects in static images of completely unconstrained, cluttered scenes without using motion, tracking, or handcrafted models and without making any assumptions on the scene structure or the number of objects in the scene. The system uses a set of training data of positive and negative example images as input, transforms the pixel images to a Haar wavelet representation, and uses a support vector machine classifier to learn the difference between in-class and out-of-class patterns. To detect objects in out-of-sample images, we do a brute force search over all the subwindows in the image. This system is applied to face, people, and car detection with excellent results. For our extensions to video sequences, we augment the core static detection system in several ways -- 1) extending the representation to five frames, 2) implementing an approximation to a Kalman filter, and 3) modeling detections in an image as a density and propagating this density through time according to measured features. In addition, we present a real-time version of the system that is currently running in a DaimlerChrysler experimental vehicle. As part of this thesis, we also present a system that, instead of detecting full patterns, uses a component-based approach. We find it to be more robust to occlusions, rotations in depth, and severe lighting conditions for people detection than the full body version. We also experiment with various other representations including pixels and principal components and show results that quantify how the number of features, color, and gray-level affect performance.
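The brute-force subwindow search described in the abstract can be sketched as a multi-scale sliding-window scan. This is a minimal illustration; the `classifier` callable, the window size and the integer-stride rescaling are placeholder assumptions standing in for the thesis's Haar wavelet representation and support vector machine classifier.

```python
import numpy as np

def sliding_window_detect(image, classifier, win=24, step=4, scales=(1.0, 0.5)):
    """Brute-force multi-scale window scan over a grayscale image.

    classifier: callable taking a (win, win) patch and returning a score.
    Returns (row, col, scale, score) for every window scoring above 0,
    with row/col expressed in original-image coordinates.
    """
    detections = []
    for scale in scales:
        # Coarse rescale by integer striding (a crude stand-in for
        # proper image resampling, kept simple for illustration).
        stride = max(1, int(round(1 / scale)))
        scaled = image[::stride, ::stride]
        h, w = scaled.shape
        for r in range(0, h - win + 1, step):
            for c in range(0, w - win + 1, step):
                patch = scaled[r:r + win, c:c + win]
                score = classifier(patch)
                if score > 0:
                    detections.append((r * stride, c * stride, scale, score))
    return detections
```

In a full system the per-patch score would come from an SVM over wavelet features, and overlapping detections would then be merged by non-maximum suppression; the scan structure itself is unchanged.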
APA, Harvard, Vancouver, ISO, and other styles
21

Rajnoha, Martin. "Určování podobnosti objektů na základě obrazové informace." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2021. http://www.nusl.cz/ntk/nusl-437979.

Full text
Abstract:
Monitoring of public areas and their automatic real-time processing have become increasingly significant due to the changing security situation in the world. However, the problem is the analysis of low-quality records, where even state-of-the-art methods fail in some cases. This work investigates an important area of image similarity: biometric identification based on face images. The work deals primarily with face super-resolution from a sequence of low-resolution images, and compares this approach to single-frame methods, which are still considered the most accurate. A new dataset was created for this purpose, designed specifically for multi-frame face super-resolution from a low-resolution input sequence, and comparable in size with the leading world datasets. The results were evaluated both by a survey of human perception and by defined objective metrics. The hypothesis that multi-frame methods achieve better results than single-frame methods was confirmed by a comparison of both. The architectures, source code and dataset were released, establishing a basis for future research in this field.
APA, Harvard, Vancouver, ISO, and other styles
22

Wang, Zeng. "Laser-based detection and tracking of dynamic objects." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:c7f2da08-fa1e-4121-b06b-31aad16ecddd.

Full text
Abstract:
In this thesis, we present three main contributions to laser-based detection and tracking of dynamic objects, from both a model-based point of view and a model-free point of view, with an emphasis on applications to autonomous driving. A segmentation-based detector is first proposed to provide an end-to-end detection of the classes car, pedestrian and bicyclist in 3D laser data amongst significant background clutter. We postulate that, for the particular classes considered, solving a binary classification task outperforms approaches that tackle the multi-class problem directly. This is confirmed using custom and third-party datasets of urban street scenes. The sliding-window approach to object detection, while ubiquitous in the computer vision community, is largely neglected in laser-based object detectors, possibly due to its perceived computational inefficiency. We give this opinion a second thought in this thesis, and demonstrate that, by fully exploiting the sparsity of the problem, exhaustive window searching in 3D can be made efficient. We prove the mathematical equivalence between sparse convolution and voting, and devise an efficient algorithm to compute exactly the detection scores at all window locations, processing a complete Velodyne scan containing 100K points in less than half a second. Its superior performance is demonstrated on the KITTI dataset, and is commensurate with state-of-the-art vision approaches. A new model-free approach to detection and tracking of moving objects with a 2D lidar is then proposed, aiming at dynamic objects of arbitrary shapes and classes. Objects are modelled by a set of rigidly attached sample points along their boundaries, whose positions are initialised with and updated by raw laser measurements, allowing a flexible, nonparametric representation. Dealing with raw laser points poses a significant challenge to data association.
We propose a hierarchical approach, and present a new variant of the well-known Joint Compatibility Branch and Bound algorithm to handle large numbers of measurements. The system is systematically calibrated on real world data containing 7.5K labelled object examples and validated on 6K test cases. Its performance is demonstrated over an existing industry standard targeted at the same problem domain as well as a classical approach to model-free tracking.
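The claimed equivalence between sparse convolution and voting is easy to check numerically on a toy 2-D grid: scoring every window densely gives exactly the same result as letting each occupied cell scatter votes, while the voting pass touches only the nonzero cells. Grid, weights and sizes below are illustrative, not the thesis's 3-D laser pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

# A mostly-empty 2D feature grid (stand-in for a voxelised laser scan).
grid = np.zeros((40, 40))
occupied = rng.integers(0, 40, size=(30, 2))
grid[occupied[:, 0], occupied[:, 1]] = 1.0

w = rng.normal(size=(5, 5))  # linear detector weights for a 5x5 window

# Dense evaluation: score every window location explicitly.
H, W = grid.shape
dense = np.zeros((H - 4, W - 4))
for r in range(H - 4):
    for c in range(W - 4):
        dense[r, c] = np.sum(grid[r : r + 5, c : c + 5] * w)

# Voting: each occupied cell adds its weighted contribution to every
# window whose score it influences -- only nonzero cells do any work.
votes = np.zeros_like(dense)
for r, c in np.argwhere(grid != 0.0):
    for dr in range(5):
        for dc in range(5):
            wr, wc = r - dr, c - dc  # window whose (dr, dc) cell is (r, c)
            if 0 <= wr < H - 4 and 0 <= wc < W - 4:
                votes[wr, wc] += grid[r, c] * w[dr, dc]
```

`dense` and `votes` agree to machine precision, which is the identity the efficient 3-D algorithm exploits.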
APA, Harvard, Vancouver, ISO, and other styles
23

Morris, Ryan L. "Hand/Face/Object." Kent State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=kent155655052646378.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Hanafi, Marsyita. "Face recognition from face signatures." Thesis, Imperial College London, 2012. http://hdl.handle.net/10044/1/10566.

Full text
Abstract:
This thesis presents techniques for detecting and recognizing faces under various imaging conditions. In particular, it presents a system that combines several methods for face detection and recognition. Initially, the faces in the images are located using the Viola-Jones method, and each detected face is represented by a subimage. Then, an eye and mouth detection method is used to identify the coordinates of the eyes and mouth, which are used to update the subimages so that they contain only the face area. After that, a method based on Bayesian estimation and a fuzzy membership function is used to identify the actual faces in both subimages (obtained from the first and second steps). Then, a face similarity measure is used to locate the oval shape of the face in both subimages. The similarity measures of the two faces are compared and the one with the highest value is selected. In the recognition task, the Trace transform method is used to extract the face signatures from the oval face region. These signatures are evaluated on the BANCA and FERET databases in authentication tasks. The signatures with discriminating ability are selected and used to construct a classifier. However, this classifier was shown to be weak, a problem tackled by constructing a boosted assembly of classifiers with a Gentle AdaBoost algorithm. The proposed methodologies are evaluated using a family album database.
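The abstract's final step, boosting weak classifiers with Gentle AdaBoost, can be sketched generically. The weak learner here is a weighted regression stump and the data is a synthetic two-feature toy set, not actual Trace-transform face signatures:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "signature" data: two features; class +1 is shifted by +1 on both.
X = np.r_[rng.normal(1.0, 1.0, (100, 2)), rng.normal(-1.0, 1.0, (100, 2))]
y = np.r_[np.ones(100), -np.ones(100)]

def fit_stump(X, y, w):
    """Weighted regression stump: the Gentle AdaBoost weak learner,
    minimising weighted squared error over (feature, threshold) pairs."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            hi = X[:, j] > t
            wh, wl = w[hi].sum(), w[~hi].sum()
            a = (w[hi] * y[hi]).sum() / wh if wh > 0 else 0.0
            b = (w[~hi] * y[~hi]).sum() / wl if wl > 0 else 0.0
            err = (w * (y - np.where(hi, a, b)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, a, b)
    return best[1:]

def gentle_adaboost(X, y, rounds=10):
    w = np.full(len(y), 1.0 / len(y))
    F = np.zeros(len(y))
    stumps = []
    for _ in range(rounds):
        j, t, a, b = fit_stump(X, y, w)
        f = np.where(X[:, j] > t, a, b)
        F += f
        w *= np.exp(-y * f)  # Gentle AdaBoost weight update
        w /= w.sum()
        stumps.append((j, t, a, b))
    return stumps, F

stumps, F = gentle_adaboost(X, y)
accuracy = float(np.mean(np.sign(F) == y))
```

The ensemble's training accuracy comfortably exceeds that of typical individual stumps, which is the point of boosting a weak classifier.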
APA, Harvard, Vancouver, ISO, and other styles
25

Helmer, Scott. "Embodied object recognition." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/42481.

Full text
Abstract:
The ability to localize and categorize objects via imagery is central to many potential applications, including autonomous vehicles, mobile robotics, and surveillance. In this thesis we employ a probabilistic approach to show how utilizing multiple images of the same scene can improve detection. We cast the task of object detection as finding the set of objects that maximize the posterior probability given a model of the categories and a prior for their spatial arrangements. We first present an approach to detection that leverages depth data from binocular stereo by factoring classification into two terms: an independent appearance-based object classifier, and a term for the 3D shape. We overcome the missing data and the limited fidelity of stereo by focusing on the size of the object and the presence of discontinuities. We go on to demonstrate that even with off-the-shelf stereo algorithms we can significantly improve detection on two household objects, mugs and shoes, in the presence of significant background clutter and textural variation. We also present a novel method for object detection, both in 2D and in 3D, from multiple images with known extrinsic camera parameters. We show that by also inferring the 3D position of the objects we can improve object detection by incorporating size priors and reasoning about the 3D geometry of a scene. We also show that integrating information across multiple viewpoints allows us to boost weak classification responses, overcome occlusion, and reduce false positives. We demonstrate the efficacy of our approach, over single viewpoint detection, on a dataset containing mugs, bottles, bowls, and shoes in a variety of challenging scenarios.
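The multi-view fusion idea above, combining weak per-view evidence for one hypothesised 3D object, can be illustrated with a naive-Bayes-style log-odds combination. The likelihood ratios and prior below are made-up numbers for illustration, not values from the thesis:

```python
import numpy as np

# Likelihood ratios p(observation | object) / p(observation | background)
# from four viewpoints of one hypothesised 3D object location. Each view
# is only weakly positive on its own (illustrative numbers).
ratios = np.array([1.4, 1.6, 1.3, 1.8])
prior = 0.1                             # prior P(object) at this location

prior_odds = prior / (1 - prior)
post_odds = prior_odds * ratios.prod()  # views treated as cond. independent
posterior = post_odds / (1 + post_odds)

one_view_odds = prior_odds * ratios[0]
single_view = one_view_odds / (1 + one_view_odds)
```

No single view is convincing, but the fused posterior is substantially higher than any single-view posterior, which is how weak classification responses get boosted across viewpoints.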
APA, Harvard, Vancouver, ISO, and other styles
26

Wells, William Mercer. "Statistical object recognition." Thesis, Massachusetts Institute of Technology, 1993. http://hdl.handle.net/1721.1/12606.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1993.
Includes bibliographical references (p. 169-177).
by William Mercer Wells, III.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
27

Figueroa, Flores Carola. "Visual Saliency for Object Recognition, and Object Recognition for Visual Saliency." Doctoral thesis, Universitat Autònoma de Barcelona, 2021. http://hdl.handle.net/10803/671964.

Full text
Abstract:
For humans, the recognition of objects is an almost instantaneous, precise and extremely adaptable process. Furthermore, we have the innate capability to learn new object classes from only a few examples. The human brain lowers the complexity of the incoming data by filtering out part of the information and only processing those things that capture our attention. This, mixed with our biological predisposition to respond to certain shapes or colors, allows us to recognize in a simple glance the most important or salient regions of an image. This mechanism can be observed by analyzing on which parts of images subjects place attention; where they fix their eyes when an image is shown to them. The most accurate way to record this behavior is to track eye movements while displaying images. Computational saliency estimation aims to identify to what extent regions or objects stand out with respect to their surroundings to human observers. Saliency maps can be used in a wide range of applications including object detection, image and video compression, and visual tracking. The majority of research in the field has focused on automatically estimating saliency maps given an input image. Instead, in this thesis, we set out to incorporate saliency maps in an object recognition pipeline: we want to investigate whether saliency maps can improve object recognition results. In this thesis, we identify several problems related to visual saliency estimation. First, to what extent the estimation of saliency can be exploited to improve the training of an object recognition model when scarce training data is available. To solve this problem, we design an image classification network that incorporates saliency information as input. This network processes the saliency map through a dedicated network branch and uses the resulting characteristics to modulate the standard bottom-up visual characteristics of the original image input. We will refer to this technique as saliency-modulated image classification (SMIC). In extensive experiments on standard benchmark datasets for fine-grained object recognition, we show that our proposed architecture can significantly improve performance, especially on datasets with scarce training data. Next, we address the main drawback of the above pipeline: SMIC requires an explicit saliency algorithm that must be trained on a saliency dataset. To solve this, we implement a hallucination mechanism that allows us to incorporate the saliency estimation branch in an end-to-end trained neural network architecture that only needs the RGB image as input. A side-effect of this architecture is the estimation of saliency maps. In experiments, we show that this architecture can obtain similar results on object recognition as SMIC, but without the requirement of ground-truth saliency maps to train the system. Finally, we evaluate the accuracy of the saliency maps that occur as a side-effect of object recognition. For this purpose, we use a set of benchmark datasets for saliency evaluation based on eye-tracking experiments. Surprisingly, the estimated saliency maps are very similar to the maps that are computed from human eye-tracking experiments. Our results show that these saliency maps can obtain competitive results on saliency benchmarks. On one synthetic saliency dataset this method even obtains state-of-the-art results without ever having seen an actual saliency image during training.
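The saliency-modulated classification (SMIC) idea, a dedicated branch turning the saliency map into multiplicative gates on the bottom-up feature maps, can be sketched with plain arrays. Shapes, the per-channel gating projection, and all values are illustrative assumptions, not the thesis's architecture:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative stand-ins: a conv feature map from the RGB branch and a
# single-channel saliency map at the same spatial resolution.
features = rng.normal(size=(16, 8, 8))  # (channels, height, width)
saliency = rng.random((8, 8))           # values in [0, 1]

# A tiny "saliency branch": one learned scalar per channel projects the
# saliency map to a modulation gate, squashed to (0, 1) by a sigmoid.
gate_w = rng.normal(size=(16,))
gates = 1.0 / (1.0 + np.exp(-gate_w[:, None, None] * saliency[None]))

modulated = features * gates            # multiplicative modulation

pooled = modulated.mean(axis=(1, 2))    # global average pool -> classifier input
```

The essential mechanism is only the last two lines: salient locations scale the bottom-up features up or down before classification.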
Universitat Autònoma de Barcelona. Programa de Doctorat en Informàtica
APA, Harvard, Vancouver, ISO, and other styles
28

Zhou, Shaohua. "Unconstrained face recognition." College Park, Md. : University of Maryland, 2004. http://hdl.handle.net/1903/1800.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2004.
Thesis research directed by: Electrical Engineering. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
29

Ustun, Bulend. "3d Face Recognition." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/2/12609075/index.pdf.

Full text
Abstract:
In this thesis, the effect of the registration process is evaluated, along with several methods proposed for 3D face recognition. Input faces are in point-cloud form and contain noise due to the nature of scanner technologies, so they are filtered and smoothed before the registration step. In order to register the faces, an average face model is obtained from all the images in the database; all faces are registered to this average model and stored in the database. Registration is performed using a rigid registration technique called ICP (Iterative Closest Point), probably the most popular technique for registering two 3D shapes. Furthermore, several variants of ICP are implemented and evaluated in terms of accuracy, time and number of iterations needed for convergence. At the recognition step, several methods, namely Eigenface, Fisherface, NMF (Non-negative Matrix Factorization) and ICA (Independent Component Analysis), are tested on registered and non-registered faces and their performance is evaluated.
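The registration step named in the abstract, ICP, is compact enough to sketch in full. This is a generic textbook version (brute-force nearest neighbours, SVD-based rigid alignment, a random toy cloud), not the thesis's implementation or its ICP variants:

```python
import numpy as np

rng = np.random.default_rng(4)

def best_rigid_transform(A, B):
    """Least-squares rotation R and translation t mapping points A onto B
    (the SVD/Kabsch solution used inside each ICP iteration)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(src, dst, iters=30):
    cur = src.copy()
    for _ in range(iters):
        # Closest-point correspondences by brute force (a k-d tree in practice).
        d = np.linalg.norm(cur[:, None] - dst[None], axis=2)
        R, t = best_rigid_transform(cur, dst[d.argmin(axis=1)])
        cur = cur @ R.T + t
    return cur

# A toy point cloud and a rotated + translated copy standing in for the
# average face model it should be registered to.
pts = rng.random((60, 3))
th = 0.15
Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th),  np.cos(th), 0.0],
               [0.0, 0.0, 1.0]])
target = pts @ Rz.T + np.array([0.1, -0.05, 0.05])

aligned = icp(pts, target)
err = np.linalg.norm(aligned[:, None] - target[None], axis=2).min(axis=1).mean()
```

After a few iterations the cloud snaps onto the target; the ICP variants the thesis compares mostly differ in the correspondence and weighting steps inside this loop.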
APA, Harvard, Vancouver, ISO, and other styles
30

Wong, Vincent. "Human face recognition /." Online version of thesis, 1994. http://hdl.handle.net/1850/11882.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Lee, Colin K. "Infrared face recognition." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2004. http://library.nps.navy.mil/uhtbin/hyperion/04Jun%5FLee%5FColin.pdf.

Full text
Abstract:
Thesis (M.S. in Electrical Engineering)--Naval Postgraduate School, June 2004.
Thesis advisor(s): Monique P. Fargues, Gamani Karunasiri. Includes bibliographical references (p. 135-136). Also available online.
APA, Harvard, Vancouver, ISO, and other styles
32

Furesjö, Fredrik. "Multiple cue object recognition." Licentiate thesis, KTH, Numerical Analysis and Computer Science, NADA, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277.

Full text
Abstract:

Nature is rich in examples of how vision can be successfully used for sensing and perceiving the world, and how the gathered information can be utilized to perform a variety of different objectives. The key to successful vision is the internal representations of the visual agent, which enable the agent to perceive properties of the world. Humans perceive a multitude of properties of the world through the visual sense, such as motion, shape, texture, and color. In addition, we perceive the world as structured into objects which are clustered into different classes - categories. For such a rich perception of the world, many different internal representations, combinable in different ways, are necessary. So far, much work in computer vision has focused on finding new and, from some perspective, better descriptors, while comparatively little work has been done on how to combine different representations.

In this thesis a purposive approach to object recognition is taken, in the context of a visual agent. From this viewpoint the situatedness of the agent, in the form of its context and task, becomes central. Further, a multiple-feature representation of objects is proposed, since a single feature might neither be pertinent to the task at hand nor robust in a given context.

The first contribution of this thesis is an evaluation of single-feature object representations that have previously been used in computer vision for object recognition. In the evaluation, different interest operators combined with different photometric descriptors are tested together with a shape representation and a statistical representation of the whole appearance. Further, a color representation, inspired by human color perception, is presented and used in combination with the shape descriptor to increase the robustness of object recognition in cluttered scenes.

The last part of this thesis, which contains the second contribution, presents a vision system for object recognition based on the multiple-feature object representation, together with an architecture of the agent that utilizes the proposed representation. By taking a system perspective on object recognition, we consider the representation's performance under a given context and task. The scenario considered here is derived from a fetch task performed by a service robot.
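One common way to combine multiple cues of the kind discussed above is a reliability-weighted log-linear opinion pool over per-cue similarity scores. The cue names, scores and weights below are invented for illustration and are not taken from the thesis:

```python
import numpy as np

# Per-cue similarity scores between a query view and 4 object models
# (illustrative numbers: row 0 = shape, row 1 = colour, row 2 = texture).
scores = np.array([
    [0.70, 0.40, 0.55, 0.30],  # shape
    [0.20, 0.80, 0.60, 0.50],  # colour
    [0.60, 0.30, 0.90, 0.40],  # texture
])
reliability = np.array([0.5, 0.2, 0.3])  # task/context-dependent cue weights

# Normalise each cue to a distribution over models, then take a
# reliability-weighted geometric mean (log-linear opinion pool).
p = scores / scores.sum(axis=1, keepdims=True)
fused = np.exp((reliability[:, None] * np.log(p)).sum(axis=0))
fused /= fused.sum()

best_model = int(fused.argmax())
```

The point of the pool is that no single cue has to be decisive: a model that is merely good under several cues can beat one that is best under only one.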

APA, Harvard, Vancouver, ISO, and other styles
33

Karlsen, Mats-Gøran. "Android object recognition framework." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-19219.

Full text
Abstract:
This thesis is a continuation of the author's specialization project, where the ultimate goal is to build an object recognition framework suitable for mobile devices in real-world environments, where control over parameters such as illumination, distance, noise and the availability of consistent network architectures is limited. Based on shortcomings related to object recognition performance and architectural issues, the author's goal was to increase the flexibility, usability and performance of the framework. Literature on frameworks was reviewed in order to discover useful techniques for development and documentation. Together with a re-introduction to the implemented recognition scheme, an evaluation of the original framework artefact was performed with regard to the goals of this thesis. The results from the evaluation aided in finding an approach that balanced trade-offs between flexibility, usability, correctness and performance. By using proven framework development and documentation tactics from the literature study, the author created a new iteration of the framework, improving upon the previous solution. The result is a stand-alone artefact containing a hierarchy of software packages which divide functionality and offer customization using a combination of inheritance and components. The introduction of components hides domain knowledge and allows for easier reuse. In order to improve recognition performance and framework flexibility, the author added external server support for image information extraction, as well as support for the use of different feature detectors and descriptor extractors. Because of time constraints, the author did not test these new feature detectors' and descriptor extractors' suitability or performance; this testing can now be performed by the customer. In order to ensure proper correctness, a lower bound on the image resolution is set at 600x600 pixels. Using properly built models, correct recognition is achievable in about 90% of cases.
The added support for server-side information extraction improves object recognition performance by 42% in ideal conditions using the lower-bound images. This improvement is still not enough to meet the performance criteria and, combined with other issues, leaves the framework short of being ready for building production-environment applications.
APA, Harvard, Vancouver, ISO, and other styles
34

Furesjö, Fredrik. "Multiple cue object recognition /." Stockholm : KTH Numerical Analysis and Computer Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Fergus, Robert. "Visual object category recognition." Thesis, University of Oxford, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.425029.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Lavoie, Matt J. "Three dimensional object recognition." Honors in the Major Thesis, University of Central Florida, 1991. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/3.

Full text
Abstract:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Arts and Sciences
Computer Sciences
APA, Harvard, Vancouver, ISO, and other styles
37

Meng, Meng. "Human Object Interaction Recognition." Thesis, Lille 1, 2017. http://www.theses.fr/2017LIL10008/document.

Full text
Abstract:
In this thesis, we have investigated human-object interaction recognition using the skeleton data and local depth information provided by RGB-D sensors. There are two main applications we address: human-object interaction recognition and abnormal activity recognition. First, we propose a spatio-temporal modeling of human-object interaction videos for on-line and off-line recognition. In the spatial modeling of human-object interactions, we propose a low-level feature and an object-related distance feature, which are adopted for on-line human-object interaction recognition and abnormal gait detection. Then, we propose an object feature, a rough description of the object's shape and size, as a new feature to model human-object interactions. This object feature is fused with the low-level feature for on-line human-object interaction recognition. In the temporal modeling of human-object interactions, we propose a shape analysis framework based on the low-level feature and the object-related distance feature for full-sequence off-line recognition. Experiments carried out on two representative benchmarks demonstrate that the proposed method is effective and discriminative for human-object interaction analysis. Second, we extend the study to abnormal gait detection by using the on-line framework of human-object interaction classification. The experiments, conducted following state-of-the-art settings on the benchmark, show the effectiveness of the proposed method. Finally, we collected a multi-view human-object interaction dataset involving abnormal and normal human behaviors with RGB-D sensors. We test our model on the new dataset and evaluate the potential of the proposed approach.
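The object-related distance feature mentioned in the abstract is described only roughly; one plausible reading, per-joint distances from the skeleton to the object centroid normalised by skeleton scale, can be sketched as follows (synthetic joints, hypothetical joint index for the hand):

```python
import numpy as np

rng = np.random.default_rng(5)

# One frame of a toy interaction: 15 skeleton joints and an object
# centroid in 3D camera coordinates (synthetic values; joint 7 is
# assumed to play the role of the interacting hand).
joints = rng.normal(size=(15, 3))
obj_centroid = joints[7] + np.array([0.05, 0.0, 0.02])  # object near the "hand"

# Object-related distance feature: per-joint Euclidean distance to the
# object, scale-normalised by the skeleton's own spread so the feature
# is comparable across subjects and distances to the camera.
dists = np.linalg.norm(joints - obj_centroid, axis=1)
scale = np.linalg.norm(joints - joints.mean(axis=0), axis=1).mean()
feature = dists / scale
```

Stacking this 15-dimensional vector over frames yields one candidate spatio-temporal descriptor of how the object moves relative to the body.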
APA, Harvard, Vancouver, ISO, and other styles
38

Baker, Jonathan D. (Jonathan Daniel). "Multiresolution statistical object recognition." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/37721.

Full text
Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.
Includes bibliographical references (leaves 105-108).
by Jonathan D. Baker.
M.S.
APA, Harvard, Vancouver, ISO, and other styles
39

Cox, David Daniel. "Reverse engineering object recognition." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/42042.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2007.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Page 95 blank.
Includes bibliographical references (p. 83-94).
Any given object in the world can cast an effectively infinite number of different images onto the retina, depending on its position relative to the viewer, the configuration of light sources, and the presence of other objects in the visual field. In spite of this, primates can robustly recognize a multitude of objects in a fraction of a second, with no apparent effort. The computational mechanisms underlying these amazing abilities are poorly understood. This thesis presents a collection of work from human psychophysics, monkey electrophysiology, and computational modelling in an effort to reverse-engineer the key computational components that enable this amazing ability in the primate visual system.
by David Daniel Cox.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
40

Wallenberg, Marcus. "Embodied Visual Object Recognition." Doctoral thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-132762.

Full text
Abstract:
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. This is especially challenging due to the high dimensionality of image data. In cases where end-to-end learning from pixels to output is needed, mechanisms designed to make inputs tractable are often necessary for less computationally capable embodied systems. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. Therefore, the way in which attention mechanisms should be introduced into feature extraction and estimation algorithms must be carefully considered when constructing a recognition system. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, problem-specific feature selection, efficient estimator training and attentional modulation in convolutional neural networks. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations.
Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. In order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. Feature selection and efficient discriminant sampling for decision tree-based estimators have also been implemented. Finally, attentional multi-layer modulation of convolutional neural networks for recognition in cluttered scenes has been evaluated. Several of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
APA, Harvard, Vancouver, ISO, and other styles
41

Matas, J. "Colour-based object recognition." Thesis, University of Surrey, 1995. http://epubs.surrey.ac.uk/843934/.

Full text
Abstract:
This thesis studies the use of colour information for object recognition. A new representation for objects with multiple colours, the colour adjacency graph (CAG), is proposed. Each node of the CAG represents a single chromatic component of the image, defined as a set of pixels forming a unimodal cluster in the chromatic scattergram. Edges encode information about the adjacency of colour components and their reflectance ratio. The CAG is related to both the histogram and region adjacency graph representations, and is shown to preserve and combine the best features of these two approaches while avoiding their drawbacks. The proposed approach is tested on a range of difficult object recognition and localisation problems involving complex imagery of non-rigid 3D objects under varied viewing conditions, with excellent results.
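As a loose illustration of the adjacency part of a CAG only (assuming colour components have already been clustered into a label image; the chromatic clustering and the reflectance-ratio edge attributes of the actual thesis are not reproduced here):

```python
import numpy as np

def colour_adjacency_edges(labels):
    """Edges of a colour adjacency graph: pairs of distinct colour
    labels that touch horizontally or vertically in a label image."""
    labels = np.asarray(labels)
    edges = set()
    # Horizontally and vertically neighbouring label pairs.
    h = np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1)
    v = np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1)
    for a, b in np.concatenate([h, v]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges

# 2x3 label image: colour 0 touches 1, and 1 touches 2, but 0 never touches 2.
img = [[0, 1, 2],
       [0, 1, 2]]
edges = colour_adjacency_edges(img)
```

In the full representation each edge would also carry the reflectance ratio between the two adjacent colour components.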
APA, Harvard, Vancouver, ISO, and other styles
42

Johnson, Taylor Christine. "Object Recognition and Classification." Thesis, The University of Arizona, 2012. http://hdl.handle.net/10150/243970.

Full text
Abstract:
Object recognition and classification is a common problem facing computers, and computer algorithms still fall short in many respects when identifying objects. A very common approach to classification problems is the neural network. Neural networks are modelled after the human brain and the neuron firings that occur when an individual looks at an image and identifies the objects in it. In this work we propose a probabilistic neural network that takes into account the regional properties of an image of either an ant or an egg, as determined by edge segmentation and an extraction of geometric features specific to the object. To do this, the algorithm calculates the regional properties of a black-and-white representation of the object and then passes these properties to the probabilistic neural network, which calculates the probability of the object being an ant or an egg.
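The abstract does not give the network's formulation; a minimal Parzen-window sketch of the classic probabilistic neural network decision rule, with made-up two-dimensional "regional property" vectors standing in for the thesis's geometric features, might look like:

```python
import numpy as np

def pnn_predict(x, train_X, train_y, sigma=1.0):
    """Probabilistic neural network (Parzen-window) classifier:
    average a Gaussian kernel over each class's training examples
    and return the class with the highest density at x."""
    x = np.asarray(x, dtype=float)
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    scores = {}
    for c in np.unique(train_y):
        d2 = np.sum((train_X[train_y == c] - x) ** 2, axis=1)
        scores[c] = np.mean(np.exp(-d2 / (2.0 * sigma ** 2)))
    return max(scores, key=scores.get)

# Toy regional-property vectors: class 0 ("ant") vs class 1 ("egg").
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]
label = pnn_predict([0.15, 0.15], X, y)
```

Normalising the per-class densities would yield the class probabilities the abstract mentions.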
APA, Harvard, Vancouver, ISO, and other styles
43

Lee, Yeongseon. "Bayesian 3D multiple people tracking using multiple indoor cameras and microphones." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29668.

Full text
Abstract:
Thesis (Ph.D)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Russell M. Mersereau; Committee Member: Biing Hwang (Fred) Juang; Committee Member: Christopher E. Heil; Committee Member: Georgia Vachtsevanos; Committee Member: James H. McClellan. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
44

Whitney, Hannah L. "Object agnosia and face processing." Thesis, University of Southampton, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.548326.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Qu, Yawe, and Mingxi Yang. "Online Face Recognition Game." Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-248.

Full text
Abstract:
The purpose of this project is to test and improve people's ability to recognise faces. Although there are some tests on the internet with the same purpose, the problem is that people may feel bored and give up before finishing them; consequently they benefit neither from the testing nor from the training. To solve this problem, this project combines face recognition with an online game. The game is meant to provide entertainment while people play, so that more people will take the test and improve their face recognition abilities.
The game takes place in an imaginary face recognition lab. The player takes the main role and is asked to solve a number of problems across several scenarios, most of which call on the player's face recognition skills. At the end, the player receives an evaluation of her/his face recognition skills.
APA, Harvard, Vancouver, ISO, and other styles
46

Batur, Aziz Umit. "Illumination-robust face recognition." Diss., Georgia Institute of Technology, 2003. http://hdl.handle.net/1853/15440.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Graham, Daniel B. "Pose-varying face recognition." Thesis, University of Manchester, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.488288.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Zhou, Mian. "Gabor-boosting face recognition." Thesis, University of Reading, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.494814.

Full text
Abstract:
In the past decade, automatic face recognition has received much attention from both the commercial and public sectors as an efficient and resilient recognition technique in biometrics. This thesis describes a highly accurate appearance-based algorithm for grey-scale, front-view face recognition, Gabor-Boosting face recognition, drawing on computer vision, pattern recognition, image processing and machine learning. The strong performance of the Gabor-Boosting face recognition algorithm comes from combining three leading-edge techniques: the Gabor wavelet transform, AdaBoost and the Support Vector Machine (SVM). The Gabor wavelet transform is used to extract features which describe texture variations of human faces. The AdaBoost algorithm is used to select the most significant features, which represent different individuals. The SVM constructs a classifier with high recognition accuracy. Within the AdaBoost algorithm, a novel weak learner, Potsu, is designed. The Potsu weak learner is fast due to its simple perceptron prototype, and accurate due to the large number of training examples available. More importantly, the Potsu weak learner is the only weak learner which satisfies the requirements of AdaBoost. The Potsu weak learners also demonstrate superior performance over other weak learners, such as FLD. The Gabor-Boosting face recognition algorithm is extended into the multi-class classification domain, in which a multi-class weak learner called mPotsu is developed. The experiments show that performance is improved by applying loosely controlled face recognition in multi-class classification.
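A hedged sketch of the Gabor feature-extraction stage only (the filter parameters, sizes and function names below are illustrative assumptions; the Potsu weak learner, AdaBoost selection and SVM stages of the thesis are not reproduced):

```python
import numpy as np

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0):
    """Real part of a Gabor filter: a Gaussian envelope multiplied
    by a cosine carrier oriented at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * xr / lam))

def gabor_features(patch, n_orient=4):
    """Responses of a small Gabor bank on a fixed-size square patch;
    in the full pipeline a booster would select the most
    discriminative responses."""
    patch = np.asarray(patch, dtype=float)
    thetas = [k * np.pi / n_orient for k in range(n_orient)]
    return np.array([np.sum(gabor_kernel(size=patch.shape[0], theta=t) * patch)
                     for t in thetas])

# Toy 9x9 patch standing in for a face-image region.
patch = np.ones((9, 9))
feats = gabor_features(patch)
```

Real systems convolve a bank of such kernels at multiple scales and orientations over the whole face image rather than a single patch.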
APA, Harvard, Vancouver, ISO, and other styles
49

Abi, Antoun Ramzi. "Pose-Tolerant Face Recognition." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/244.

Full text
Abstract:
Automatic face recognition performance has been steadily improving over years of active research; however, it remains significantly affected by a number of external factors such as illumination, pose, expression, occlusion and resolution that can severely alter the appearance of a face and negatively impact recognition scores. The focus of this thesis is the pose problem, which remains largely overlooked in most real-world applications. Specifically, we focus on one-to-one matching scenarios where a query face image of a random pose is matched against a set of “mugshot-style” near-frontal gallery images. We argue that in this scenario, a 3D face-modeling geometric approach is essential in tackling the pose problem. For this purpose, we utilize a recent technique for efficient synthesis of 3D face models called the 3D General Elastic Model (3DGEM). It solved the pose synthesis problem from a single frontal image, but could not solve the pose correction problem because of missing face data due to self-occlusion. In this thesis, we extend the formulation of 3DGEM and cast this task as an occlusion-removal problem. We propose a sparse feature extraction approach using subspace modeling and ℓ1-minimization to find a representation of the geometrically 3D-corrected faces that we show is stable under varying pose and resolution. We then show how pose-tolerance can be achieved either in the feature space or in the reconstructed image space. We present two different algorithms that capitalize on the robustness of the sparse features extracted from the pose-corrected faces to achieve high matching rates that are minimally impacted by the variation in pose. We also demonstrate high verification rates upon matching non-frontal to non-frontal faces. Furthermore, we show that our pose-correction framework lends itself very conveniently to the task of super-resolution.
By building a multiresolution subspace, we apply the same sparse feature extraction technique to achieve single-image super-resolution with high magnification rates. We discuss how our layered framework can potentially solve both the pose and resolution problems in a unified and systematic approach. The modularity of our framework also keeps it flexible, upgradable and expandable to handle other external factors such as illumination or expression. We run extensive tests on the MPIE dataset to validate our findings.
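The thesis's exact sparse feature extraction is not given in the abstract; a generic ℓ1-regularized coding step of the kind it describes can be sketched with ISTA (iterative soft-thresholding; the dictionary, dimensions and parameters here are toy assumptions):

```python
import numpy as np

def ista_l1(D, y, lam=0.1, n_iter=200):
    """Sparse code for y over dictionary D by approximately solving
    min_x 0.5*||D x - y||^2 + lam*||x||_1 with ISTA
    (gradient step followed by soft-thresholding)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = x - D.T @ (D @ x - y) / L                           # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)   # soft threshold
    return x

# Toy dictionary: y is built from a single atom, so the sparse code
# should concentrate on that atom's coefficient.
rng = np.random.default_rng(0)
D = rng.normal(size=(8, 5))
D /= np.linalg.norm(D, axis=0)
y = 2.0 * D[:, 1]
x = ista_l1(D, y)
```

In the thesis's setting, D would instead be learned from subspace modeling of the 3D-corrected face images.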
APA, Harvard, Vancouver, ISO, and other styles
50

Lincoln, Michael C. "Pose-independent face recognition." Thesis, University of Essex, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.250063.

Full text
APA, Harvard, Vancouver, ISO, and other styles