Theses on the topic "Computer vision, object detection, action recognition"
Browse the top 50 theses for your research on the topic "Computer vision, object detection, action recognition".
Anwer, Rao Muhammad. "Color for Object Detection and Action Recognition". Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/120224.
Recognizing object categories in real-world images is a challenging problem in computer vision. The deformable part-based framework is currently the most successful approach for object detection. Generally, HOG features are used for image representation within the part-based framework. For action recognition, the bag-of-words framework has been shown to provide promising results. Within the bag-of-words framework, local image patches are described by the SIFT descriptor. In contrast to object detection and action recognition, combining color and shape has been shown to provide the best performance for object and scene recognition. In the first part of this thesis, we analyze the problem of person detection in still images. Standard person detection approaches rely on intensity-based features for image representation while ignoring color. Channel-based descriptors are among the most commonly used approaches in object recognition. This inspires us to evaluate incorporating color information using the channel-based fusion approach for the task of person detection. In the second part of the thesis, we investigate the problem of object detection in still images. Due to its high dimensionality, channel-based fusion increases the computational cost. Moreover, channel-based fusion has been found to obtain inferior results for object categories where one of the visual cues varies significantly. On the other hand, late fusion is known to provide improved results for a wide range of object categories. A consequence of the late fusion strategy is the need for a pure color descriptor. Therefore, we propose to use color attributes as an explicit color representation for object detection. Color attributes are compact and computationally efficient. Consequently, color attributes are combined with traditional shape features, providing excellent results for the object detection task. Finally, we focus on the problem of action detection and classification in still images.
We investigate the potential of color for action classification and detection in still images. We also evaluate different fusion approaches for combining color and shape information for action recognition. Additionally, an analysis is performed to validate the contribution of color for action recognition. Our results clearly demonstrate that combining color and shape information significantly improves the performance of both action classification and detection in still images.
Friberg, Oscar. "Recognizing Semantics in Human Actions with Object Detection". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-212579.
Two-stream convolutional networks are currently the most successful approach to human action recognition; they split spatial and temporal information into a spatial stream and a temporal stream. The spatial stream receives individual RGB frames for recognition, while the temporal stream receives a sequence of optical flow. Previous work has attempted to extend the two-stream convolutional network framework, for example by complementing the two networks with a third network that receives extra information. In this thesis we look for ways to extend two-stream convolutional networks by introducing a semantic stream based on object detection. We make two main contributions. First, we show that the semantic stream, together with the spatial and temporal streams, can yield small improvements in human action recognition in video on standard benchmarks. Second, we look for divergence-enhancing techniques that force the semantic stream to complement the other two streams by modifying the loss function during training. We see small improvements from using these techniques to increase divergence.
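As an illustration of how a third stream might be combined with the other two, the sketch below averages per-stream class probabilities (late fusion). This is only one plausible fusion scheme and is not taken from the thesis; the function names `softmax` and `fuse_streams`, the weights, and the example logits are all assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(stream_logits, weights=None):
    """Weighted average of per-stream class probabilities (late fusion).

    stream_logits: one list of class scores per stream, e.g.
    [spatial, temporal, semantic]. weights are hypothetical and
    default to equal weighting.
    """
    if weights is None:
        weights = [1.0] * len(stream_logits)
    total_w = sum(weights)
    probs = [softmax(scores) for scores in stream_logits]
    n_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / total_w
        for c in range(n_classes)
    ]

# Made-up scores for three classes from the three streams.
fused = fuse_streams([[2.0, 0.5, 0.1], [0.3, 1.5, 0.2], [1.0, 1.0, 0.0]])
predicted_class = max(range(len(fused)), key=lambda c: fused[c])
```

Averaging probabilities rather than raw scores keeps the streams on a common scale; the weights could be tuned on validation data.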
Kalogeiton, Vasiliki. "Localizing spatially and temporally objects and actions in videos". Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/28984.
Ranalli, Lorenzo. "Studio ed implementazione di un modello di Action Recognition. Classificazione delle azioni di gioco e della tipologia di colpi durante un match di Tennis". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Liu, Chang. "Human motion detection and action recognition". HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1108.
Ta, Anh Phuong. "Inexact graph matching techniques : application to object detection and human action recognition". Lyon, INSA, 2010. http://theses.insa-lyon.fr/publication/2010ISAL0099/these.pdf.
Object detection and human activity recognition are two active areas of computer vision, with applications in robotics, video surveillance, medical image analysis, human-computer interaction, and content-based video annotation and retrieval. Building such systems remains very difficult because of variations within object and action classes, differing viewpoints, illumination changes, camera motion, dynamic backgrounds and occlusions. In this thesis, we address the problem of detecting objects and activities in video. Despite their different goals, the underlying problems share many properties, for example the need to handle non-rigid transformations. By describing an object model or a video as a set of local features, we formulate recognition as a graph matching problem, in which the nodes represent local features and the edges represent the relations to be verified between those features. Inexact graph matching is known to be NP-hard, so we focused our effort on approximate solutions. To this end, the problem is recast as the optimization of an energy function that contains one term related to the distance between local descriptors and other terms related to the spatial (and/or temporal) relations between them. Based on this energy, two different solutions were proposed and validated for the two target applications: object recognition from images and activity recognition in video. In addition, we proposed a new descriptor to improve bag-of-words models, which are widely used in computer vision.
Our experiments on two standard datasets, as well as on our own datasets, show that the proposed methods give good results compared with the state of the art in both domains.
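The energy formulation described in this abstract can be sketched roughly as follows: a descriptor-distance term per matched node plus a term penalizing changes in pairwise geometry. The exact terms used in the thesis are not given here, so the function `matching_energy`, the weight `lam`, and the use of Euclidean distances are assumptions made for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def matching_energy(model, scene, assignment, edges, lam=1.0):
    """Energy of one candidate graph matching (lower is better).

    model, scene: lists of nodes, each a dict with a 'desc' (descriptor)
    and a 'pos' (spatial or spatio-temporal position).
    assignment: dict mapping model node index -> scene node index.
    edges: list of (i, j) model node pairs whose relation is checked.
    lam: hypothetical weight balancing the two terms.
    """
    # Descriptor term: how different matched local features look.
    unary = sum(
        euclidean(model[i]["desc"], scene[a]["desc"])
        for i, a in assignment.items()
    )
    # Relation term: how much the distance between connected nodes changes.
    pairwise = 0.0
    for i, j in edges:
        d_model = euclidean(model[i]["pos"], model[j]["pos"])
        d_scene = euclidean(scene[assignment[i]]["pos"],
                            scene[assignment[j]]["pos"])
        pairwise += abs(d_model - d_scene)
    return unary + lam * pairwise

# Toy example: the scene is the model translated by (5, 5).
model = [{"desc": [1.0, 0.0], "pos": (0.0, 0.0)},
         {"desc": [0.0, 1.0], "pos": (1.0, 0.0)}]
scene = [{"desc": [1.0, 0.0], "pos": (5.0, 5.0)},
         {"desc": [0.0, 1.0], "pos": (6.0, 5.0)}]
good = matching_energy(model, scene, {0: 0, 1: 1}, edges=[(0, 1)])
bad = matching_energy(model, scene, {0: 1, 1: 0}, edges=[(0, 1)])
```

An approximate solver would search over assignments for a low-energy one; this sketch only evaluates a given assignment.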
Dittmar, George William. "Object Detection and Recognition in Natural Settings". PDXScholar, 2013. https://pdxscholar.library.pdx.edu/open_access_etds/926.
Texto completoHiggs, David Robert. "Parts-based object detection using multiple views /". Link to online version, 2005. https://ritdml.rit.edu/dspace/handle/1850/1000.
Texto completoPan, Xiang. "Approaches for edge detection, pose determination and object representation in computer vision". Thesis, Heriot-Watt University, 1994. http://hdl.handle.net/10399/1378.
Texto completoTonge, Ashwini Kishor. "Object Recognition Using Scale-Invariant Chordiogram". Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984116/.
Texto completoCase, Isaac. "Automatic object detection and tracking in video /". Online version of thesis, 2010. http://hdl.handle.net/1850/12332.
Texto completoClark, Daniel S. "Object detection and tracking using a parts-based approach /". Link to online version, 2005. https://ritdml.rit.edu/dspace/handle/1850/1167.
Texto completoNaha, Shujon. "Zero-shot Learning for Visual Recognition Problems". IEEE, 2015. http://hdl.handle.net/1993/31806.
Texto completoOctober 2016
Liu, X. (Xin). "Human motion detection and gesture recognition using computer vision methods". Doctoral thesis, Oulun yliopisto, 2019. http://urn.fi/urn:isbn:9789526222011.
Abstract. Gestures are present in most everyday human activities. Automatic gesture analysis is needed to improve interaction between devices and people, with the goal of interaction as natural as that between humans. From a computer vision perspective, a gesture analysis system consists of detecting human motion and recognizing gestures. This doctoral thesis advances gesture analysis research from two perspectives in particular: 1) Detection: segmenting human motion from a video sequence. 2) Understanding: extracting and recognizing gesture markers. The first part of the thesis presents two motion detection methods based on sparse signal reconstruction. The foreground pixels of a video (human motion) are usually not randomly distributed but have mutually dependent properties in the spatial and temporal domains. Based on this observation, a spatio-temporal sparse reconstruction model is presented, which clusters foreground pixels on the basis of spatial coherence and temporal continuity. In addition, a pixel is assumed to be a multi-channel signal (RGB color values). When a pixel is similar to its neighboring pixels, their color channel values are also similar. Building on this observation, a channel-fusing lasso regularization was developed, which makes it possible to explore the smoothness of multi-channel signals. The second part of the thesis presents two methods for recognizing human gestures. The methods can be used to address problems in the temporal dynamics of gestures (variation in gesture speed), which is of primary importance for correctly interpreting the detected gestures. In the first method, a gesture is described as the trajectory of a skeleton model on a Riemannian manifold, using a metric that is tolerant of time warping. In addition, sparse coding for skeleton trajectories is presented.
The sparse coding is based on label information, with the aim of ensuring the mutual independence of the codebook. The second method starts from the observation that a gesture is a temporal sequence of clearly definable phases. To combine the phases, a low-rank matrix decomposition model is proposed so that the hidden states can be better fitted to a hidden Markov model (HMM).
Prokaj, Jan. "DETECTING CURVED OBJECTS AGAINST CLUTTERED BACKGROUNDS". Master's thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2847.
M.S.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science MS
IACONO, MASSIMILIANO. "Object detection and recognition with event driven cameras". Doctoral thesis, Università degli studi di Genova, 2020. http://hdl.handle.net/11567/1005981.
Irhebhude, Martins. "Object detection, recognition and re-identification in video footage". Thesis, Loughborough University, 2015. https://dspace.lboro.ac.uk/2134/19600.
Garcia, Rui Pedro Figueiredo. "Object recognition for a service robot". Master's thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/17393.
The continuous evolution of technology and the rapid development of robotic applications have made it possible to create autonomous robots that can assist or even replace humans in daily routines and monotonous jobs. Nowadays, with the aging of the world population, service robots are expected to be used increasingly to assist elderly or disabled people. For this, a service robot has to be capable of avoiding obstacles while navigating in known and unknown environments, recognizing and manipulating objects, and understanding commands from humans. The objective of this dissertation is the development of a vision system, capable of detecting and recognizing household objects, for the service robot CAMBADA@Home. The proposed approach implements two methods for object detection: the first is based on color histograms and the second uses feature detection and description algorithms (SIFT and SURF). It uses depth and color information, where the 3D data is used to detect objects resting on horizontal planes. Experimental results obtained with the CAMBADA@Home robot are presented and discussed in order to evaluate the robustness of the proposed system.
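To make the color-histogram idea concrete, the sketch below builds a coarse normalized RGB histogram for a region and compares two regions by histogram intersection. It is a generic illustration, not the dissertation's implementation; the bin count, the function names and the choice of histogram intersection as the similarity measure are assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of an iterable of (r, g, b) pixels.

    Channel values are expected in 0..255. The histogram has
    bins_per_channel ** 3 entries that sum to 1 for a non-empty region.
    """
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..n-1.
        ri, gi, bi = r * n // 256, g * n // 256, b * n // 256
        hist[(ri * n + gi) * n + bi] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy regions: a mostly red object model, a red candidate, a blue candidate.
model_hist = color_histogram([(250, 10, 10)] * 90 + [(240, 240, 240)] * 10)
red_hist = color_histogram([(255, 0, 0)] * 100)
blue_hist = color_histogram([(0, 0, 255)] * 100)
red_score = histogram_intersection(model_hist, red_hist)
blue_score = histogram_intersection(model_hist, blue_hist)
```

A candidate region would be accepted when its score against a stored object histogram exceeds some threshold chosen on validation data.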
Olafsson, Björgvin. "Partially Observable Markov Decision Processes for Faster Object Recognition". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632.
Solmon, Joanna Browne. "Using GIST Features to Constrain Search in Object Detection". PDXScholar, 2014. https://pdxscholar.library.pdx.edu/open_access_etds/1957.
Taurone, Francesco. "3D Object Recognition from a Single Image via Patch Detection by a Deep CNN". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/18669/.
Li, Ying. "Efficient and Robust Video Understanding for Human-robot Interaction and Detection". The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu152207324664654.
Solini, Arianna. "Applicazione di Deep Learning e Computer Vision ad un Caso d'uso aziendale: Progettazione, Risoluzione ed Analisi". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Thaung, Ludwig. "Advanced Data Augmentation : With Generative Adversarial Networks and Computer-Aided Design". Thesis, Linköpings universitet, Datorseende, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-170886.
NICORA, ELENA. "Efficient Projections for Salient Motion Detection and Representation". Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1091835.
Piemontese, Cristiano. "Progettazione e implementazione di una applicazione didattica interattiva per il riconoscimento di oggetti basata sull'algoritmo SIFT". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10883/.
Anguzza, Umberto. "A method to develop a computer-vision based system for the automaticac dairy cow identification and behaviour detection in free stall barns". Doctoral thesis, Università di Catania, 2013. http://hdl.handle.net/10761/1334.
Sharma, Vinay. "Simultaneous object detection and segmentation using top-down and bottom-up processing". Columbus, Ohio : Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1196372113.
Maurice, Camille. "Reconnaissance d'actions humaines dans des vidéos, en particulier lors d'interaction avec des objets". Thesis, Toulouse 3, 2020. http://www.theses.fr/2020TOU30188.
In this thesis we study the recognition of actions of daily life. Typically, different actions take place in the same location and involve various objects. This problem is difficult because of the variety of actions, the resemblance between some of them, and background clutter. Many computer vision approaches address this problem, and their performance often depends on the setting of certain hyper-parameters, such as the initial learning rate and the mini-batch size in deep learning approaches. Based on this observation, we begin with a comparative study of hyper-parameter optimization tools from the literature applied to a computer vision problem. We then propose a first Bayesian approach for online action recognition based on high-level 3D primitives: observations of the human skeleton and surrounding objects. Its parameters are optimized with the tool that emerges from our comparative study. The performance of this first approach is compared to a state-of-the-art deep learning network, and a certain complementarity emerges that we propose to exploit through a fusion mechanism. Finally, following recent advances in graph convolutional networks, we propose a light and modular approach based on the construction of spatio-temporal graphs of the skeleton and objects. The validity of the different approaches is evaluated, in raw performance and with respect to under-represented actions, on different public datasets containing sequences of everyday actions. Our approaches show interesting results compared to the literature, especially regarding imbalanced data and under-represented classes in datasets.
Lin, Chung-Ching. "Detecting and tracking moving objects from a moving platform". Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/49014.
Petit, Antoine. "Robust visual detection and tracking of complex objects : applications to space autonomous rendez-vous and proximity operations". Phd thesis, Université Rennes 1, 2013. http://tel.archives-ouvertes.fr/tel-00931604.
Azizpour, Hossein. "Visual Representations and Models: From Latent SVM to Deep Learning". Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192289.
QC 20160908
Harzallah, Hedi. "Contribution à la détection et à la reconnaissance d'objets dans les images". Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00628027/en/.
Abou Bakr, Nachwa. "Reconnaissance et modélisation des actions de manipulation". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM010.
This thesis addresses the problem of recognition, modelling and description of human activities. We describe results on three problems: (1) the use of transfer learning for simultaneous visual recognition of objects and object states, (2) the recognition of manipulation actions from state transitions, and (3) the interpretation of a series of actions and states as events in a predefined story to construct a narrative description. These results have been developed using food preparation activities as an experimental domain. We start by recognising food classes, such as tomatoes and lettuce, and food states, such as sliced and diced, during meal preparation. We adapt the VGG network architecture to jointly learn the representations of food items and food states using transfer learning. We model actions as the transformation of object states. We use recognised object properties (state and type) to detect corresponding manipulation actions by tracking object transformations in the video. Experimental performance evaluation for this approach is provided using the 50 salads and EPIC-Kitchen datasets. We use the resulting action descriptions to construct narrative descriptions for complex activities observed in videos from the 50 salads dataset.
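The idea of modelling actions as transformations of object states can be sketched as a lookup over observed state changes. The table below is invented for illustration: the "whole" state, the action labels and the function `detect_actions` are assumptions, and per-frame object and state recognition is taken as given.

```python
# Hypothetical mapping from (previous state, new state) to an action label.
TRANSITION_TO_ACTION = {
    ("whole", "sliced"): "slice",
    ("whole", "diced"): "dice",
    ("sliced", "diced"): "dice",
}

def detect_actions(observations):
    """Infer actions from per-frame (frame, object_type, state) triples.

    An action is emitted whenever the recognized state of an object
    differs from its previously recognized state.
    """
    last_state = {}
    actions = []
    for frame, obj, state in observations:
        previous = last_state.get(obj)
        if previous is not None and previous != state:
            label = TRANSITION_TO_ACTION.get((previous, state), "unknown")
            actions.append((frame, label, obj))
        last_state[obj] = state
    return actions

observations = [
    (0, "tomato", "whole"),
    (10, "tomato", "whole"),
    (20, "tomato", "sliced"),
    (25, "lettuce", "whole"),
    (40, "tomato", "diced"),
]
actions = detect_actions(observations)
```

In practice, noisy per-frame predictions would need temporal smoothing before transitions are trusted; that step is omitted here.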
Li, Yunming. "Machine vision algorithms for mining equipment automation". Thesis, Queensland University of Technology, 2000.
Azaza, Aymen. "Context, motion and semantic information for computational saliency". Doctoral thesis, Universitat Autònoma de Barcelona, 2018. http://hdl.handle.net/10803/664359.
The main objective of this thesis is to highlight the salient object in an image or in a video sequence. We address three important (but, in our opinion, insufficiently investigated) aspects of saliency detection. Firstly, we extend previous research on saliency by explicitly modelling the information provided by the context, and we show the importance of explicit context modelling for saliency estimation. Several important works in saliency are based on the usage of object proposals. However, these methods focus on the saliency of the object proposal itself and ignore the context. To introduce context in such saliency approaches, we couple every object proposal with its direct context. This allows us to evaluate the importance of the immediate surround (context) for its saliency. We propose several saliency features which are computed from the context proposals, including features based on omni-directional and horizontal context continuity. Secondly, we investigate the usage of top-down methods (high-level semantic information) for the task of saliency prediction, since most computational methods are bottom-up or only include a few semantic classes. We propose to consider a wider group of object classes. These objects represent important semantic information which we exploit in our saliency prediction approach. Thirdly, we develop a method to detect video saliency by computing saliency from supervoxels and optical flow. In addition, we apply the context features developed in this thesis to video saliency detection. The method combines shape and motion features with our proposed context features. To summarize, we show that extending object proposals with their direct context improves saliency detection in both image and video data. We also evaluate the importance of semantic information in saliency estimation. Finally, we propose a new motion feature to detect saliency in video data.
The three proposed novelties are evaluated on standard saliency benchmark datasets and are shown to improve on the state of the art.
Peloušek, Jan. "Sledování obličejových rysů v reálném čase". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-218823.
Lamberti, Lorenzo. "A deep learning solution for industrial OCR applications". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19777/.
Lee, Yeongseon. "Bayesian 3D multiple people tracking using multiple indoor cameras and microphones". Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29668.
Committee Chair: Rusell M. Mersereau; Committee Member: Biing Hwang (Fred) Juang; Committee Member: Christopher E. Heil; Committee Member: Georgia Vachtsevanos; Committee Member: James H. McClellan. Part of the SMARTech Electronic Thesis and Dissertation Collection.
Becattini, Federico. "Object and action annotation in visual media beyond categories". Doctoral thesis, 2018. http://hdl.handle.net/2158/1121033.
Moria, Kawther. "Computer vision-based detection of fire and violent actions performed by individuals in videos acquired with handheld devices". Thesis, 2016. http://hdl.handle.net/1828/7423.
Graduate
Kim, Jaechul. "Region detection and matching for object recognition". 2013. http://hdl.handle.net/2152/21261.
Mohammed, Hussein Adnan. "Object detection and recognition in complex scenes". Master's thesis, 2014. http://hdl.handle.net/10400.1/8368.
Contour-based object detection and recognition in complex scenes is one of the most difficult problems in computer vision. Object contours in complex scenes can be fragmented, occluded and deformed. Instances of the same class can have a wide range of variations. Clutter and background edges can account for more than 90% of all image edges. Nevertheless, our biological vision system is able to perform this task effortlessly. On the other hand, the performance of state-of-the-art computer vision algorithms is still limited in terms of both speed and accuracy. The work in this thesis presents a simple, efficient and biologically motivated method for contour-based object detection and recognition in complex scenes. Edge segments are extracted from training and testing images using a simple contour-following algorithm at each pixel. Then a descriptor is calculated for each segment using Shape Context, including an offset distance relative to the centre of the object. A Bayesian criterion is used to determine the discriminative power of each segment in a query image by means of a nearest-neighbour lookup, and the most discriminative segments vote for potential bounding boxes. The generated hypotheses are validated using the k nearest-neighbour method in order to eliminate false object detections. Furthermore, meaningful model segments are extracted by finding edge fragments that appear frequently in training images of the same class. Only 2% of the training segments are employed in the models. These models are used as a second approach to validate the hypotheses, using a distance-based measure based on nearest-neighbour lookups of each segment of the hypotheses. A review of shape coding in the visual cortex of primates is provided. The shape-related roles of each region in the ventral pathway of the visual cortex are described.
A further step towards a fully biological model for contour-based object detection and recognition is performed by implementing a model for meaningful segment extraction and binding on the basis of two biological principles: proximity and alignment. Evaluation on a challenging benchmark is performed for both k nearest-neighbour and model-segment validation methods. Recall rates of the proposed method are compared to the results of recent state-of-the-art algorithms at 0.3 and 0.4 false positive detections per image.
Erasmus Mundus action 2, Lot IIY 2011 Scholarship Program.
"Intelligent surveillance system employing object detection, recognition, segmentation, and object-based coding". 2013. http://library.cuhk.edu.hk/record=b5879094.
Surveillance is the process of monitoring behaviour, activities, or changing information, usually of people, for the purpose of managing, directing or protecting, by means of electronic equipment such as closed-circuit television (CCTV) cameras, or by intercepting electronically transmitted information from a distance, such as Internet traffic or phone calls. Potential surveillance applications include homeland security, crime prevention, traffic control, and remote monitoring of children, the elderly and patients. Surveillance technology provides a shield against terrorism and abnormal events, and cheap modern electronics make it possible to implement with CCTV cameras. But unless the feeds from those cameras are constantly monitored, they only provide an illusion of security. Finding enough observers to watch thousands of screens is simply impractical, yet modern automated systems can solve the problem with a surprising degree of intelligence.
Surveillance with intelligence is necessary and important to accurately manage the information from millions of sensors around the clock. Generally, intelligent surveillance includes: 1. information acquisition, using a single camera, multiple collaborating cameras, or thermal or depth cameras; 2. video analysis, such as object detection, recognition, tracking, re-identification and segmentation; 3. storage and transmission, such as coding, classification, and footage. In this thesis, we build an intelligent surveillance system in which three cameras work collaboratively to estimate the position of the object of interest (OOI) in 3D space, and to investigate and track it. In order to identify the OOI, a Cascade Head-Shoulder Detector is proposed to find the face region for recognition. The object can be segmented out and compressed by arbitrarily shaped object coding (ASOC).
In the first part, we discuss how to make the multiple cameras work together. In our system, two stationary cameras, like human eyes, watch the whole scene of the surveillance region to observe abnormal events. If an alarm is triggered by an abnormal event, a PTZ camera is assigned to deal with it, for example by tracking or investigating the object. With calibrated cameras, the 3D information of the object can be estimated and communicated among the three cameras.
In the second part, a cascade head-shoulder detector (CHSD) is proposed to detect the frontal head-shoulder region in surveillance videos. High-level object analysis, e.g., recognition and abnormal behaviour analysis, is performed on the detected region. In the detector, we propose a cascading structure that fuses two powerful features, the Haar-like feature and the HOG feature, which have been used to detect faces and pedestrians efficiently. With the Haar-like feature, the CHSD can reject most non-head-shoulder regions in the early stages with limited computation. The detected region can be used for recognition and segmentation.
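The early-rejection behaviour of a cascade can be illustrated with a short sketch: cheap stages run first and a window is discarded as soon as one stage scores below its threshold, so expensive stages only see the survivors. The stage functions below are placeholders standing in for Haar-like and HOG based classifiers; their names, the window representation and the thresholds are assumptions rather than details of the CHSD.

```python
def run_cascade(window, stages):
    """Evaluate a candidate window against an ordered list of stages.

    stages: list of (score_fn, threshold) pairs, cheapest first.
    Returns (accepted, stages_passed). Evaluation stops at the first
    stage whose score falls below its threshold.
    """
    for stage_index, (score_fn, threshold) in enumerate(stages):
        if score_fn(window) < threshold:
            return False, stage_index
    return True, len(stages)

# Placeholder stage scores; a real system would compute image features.
def cheap_haar_like_score(window):
    return window["contrast"]

def costly_hog_score(window):
    return window["gradient_match"]

stages = [(cheap_haar_like_score, 0.5), (costly_hog_score, 0.7)]

background = {"contrast": 0.1, "gradient_match": 0.9}
candidate = {"contrast": 0.8, "gradient_match": 0.9}
background_result = run_cascade(background, stages)
candidate_result = run_cascade(candidate, stages)
```

Because most windows in an image are background, ordering stages from cheapest to most expensive keeps the average cost per window low.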
In the third part, the face region is extracted from the detected head-shoulder region by training a body model. Continuously adaptive mean shift (CAMshift) is proposed for refining the face region. Face recognition is a very challenging problem in surveillance environments because the face image suffers from several concurrent factors, such as varying pose with out-of-focus blur under non-uniform lighting conditions. Based on this observation, we propose a face recognition method using the overlapping local phase feature (OLPF) and an adaptive Gaussian mixture model (AGMM). The OLPF is not only invariant to blurring but also robust to pose variations, and the AGMM can robustly model the various faces. Experiments conducted on a standard dataset and real data demonstrate that the proposed method consistently outperforms state-of-the-art face recognition methods.
In the fourth part, we propose an automatic human body segmentation system. We first initialize graph cut using the detected face/body and optimize the graph by max-flow/min-cut. A coarse-to-fine segmentation strategy is then employed to deal with imperfectly detected objects. Background contrast removal (BCR) and self-adaptive initialization level set (SAILS) are proposed to solve the tough problems that exist in the general graph cut model, such as errors at object boundaries with high contrast and similar colors in the object and background. Experimental results demonstrate that our body segmentation system works very well on live videos and standard sequences with complex backgrounds.
In the last part, we concentrate on how to intelligently compress the video content. In recent decades, video coding research has achieved great progress, as in H.264/AVC and the next-generation HEVC, whose compression performance exceeds that of previous standards by more than 50%. But compared with MPEG-4, the capability of coding arbitrarily shaped objects is absent from these later standards. Despite the provision of slice group structures and flexible macroblock ordering (FMO) in the current H.264/AVC, it cannot deal with arbitrarily shaped regions accurately and efficiently. To overcome this limitation of H.264/AVC, we propose arbitrarily shaped object coding (ASOC) based on the H.264/AVC framework, which includes binary alpha coding, motion compensation, and texture coding. In our ASOC, we adopt (1) an improved binary alpha coding with a novel motion estimation to facilitate the prediction of binary alpha blocks, (2) an arbitrarily shaped integer transform derived from the 4×4 ICT in H.264/AVC to code texture, and (3) associated coding techniques to make ASOC more compatible with the new framework. We extend ASOC to HD video and evaluate it objectively and subjectively. Experimental results show that our ASOC significantly outperforms previous object-coding methods and performs close to H.264/AVC.
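For context on the 4×4 ICT mentioned above, the following sketch applies the standard H.264/AVC forward core transform, Y = C X Cᵀ, to a full square block using integer arithmetic only. It omits the scaling and quantization steps that follow the transform in a real encoder, and it does not attempt to reproduce the arbitrarily shaped variant derived in the thesis; the helper names are assumptions for illustration.

```python
# Forward core transform matrix of the H.264/AVC 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute Y = CF * X * CF^T for a 4x4 residual block X (no scaling)."""
    return matmul(matmul(CF, block), transpose(CF))


# A flat block has no detail, so only the DC coefficient is non-zero.
flat = [[1] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat)  # coeffs[0][0] == 16, all others 0
```

Because every entry of the matrix is a small integer, the transform needs only additions and shifts, which is one reason integer transforms of this kind are attractive as a starting point for shape-adaptive texture coding.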
Detailed summary in vernacular field only.
Liu, Qiang.
"November 2012."
Thesis (Ph.D.)--Chinese University of Hong Kong, 2013.
Includes bibliographical references (leaves 123-135).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
Abstracts in English and Chinese.
Dedication --- p.ii
Acknowledgments --- p.iii
Abstract --- p.vii
Publications --- p.x
Nomenclature --- p.xii
Contents --- p.xviii
List of Figures --- p.xxii
List of Tables --- p.xxiii
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Motivation and objectives --- p.1
Chapter 1.2 --- A brief review of camera calibration --- p.2
Chapter 1.3 --- Object detection --- p.5
Chapter 1.3.1 --- Face detection --- p.5
Chapter 1.3.2 --- Pedestrian detection --- p.7
Chapter 1.4 --- Recognition --- p.8
Chapter 1.5 --- Segmentation --- p.10
Chapter 1.5.1 --- Thresholding-based methods --- p.11
Chapter 1.5.2 --- Clustering-based methods --- p.11
Chapter 1.5.3 --- Histogram-based methods --- p.12
Chapter 1.5.4 --- Region-growing methods --- p.12
Chapter 1.5.5 --- Level set methods --- p.13
Chapter 1.5.6 --- Graph cut methods --- p.13
Chapter 1.5.7 --- Neural network-based methods --- p.14
Chapter 1.6 --- Object-based video coding --- p.14
Chapter 1.7 --- Organization of thesis --- p.16
Chapter 2 --- Cameras Calibration --- p.18
Chapter 2.1 --- Introduction --- p.18
Chapter 2.2 --- Basic Equations --- p.21
Chapter 2.2.1 --- Parameters of Camera Model --- p.22
Chapter 2.2.2 --- Two-view homography induced by a Plane --- p.22
Chapter 2.3 --- Pair-wise pose estimation --- p.23
Chapter 2.3.1 --- Homography estimation --- p.24
Chapter 2.3.2 --- Calculation of n and λ --- p.24
Chapter 2.3.3 --- (R,t) Estimation --- p.25
Chapter 2.4 --- Distortion analysis and correction --- p.27
Chapter 2.5 --- Feature detection and matching --- p.28
Chapter 2.6 --- 3D point estimation and evaluation --- p.30
Chapter 2.7 --- Conclusion --- p.34
Chapter 3 --- Cascade Head-Shoulder Detector --- p.35
Chapter 3.1 --- Introduction --- p.35
Chapter 3.2 --- Cascade head-shoulder detection --- p.36
Chapter 3.2.1 --- Initial feature rejecter --- p.37
Chapter 3.2.2 --- Haar-like rejecter --- p.39
Chapter 3.2.3 --- HOG feature classifier --- p.40
Chapter 3.2.4 --- Cascade of classifiers --- p.45
Chapter 3.3 --- Experimental results and analysis --- p.46
Chapter 3.3.1 --- CHSD training --- p.46
Chapter 3.4 --- Conclusion --- p.49
Chapter 4 --- A Robust Face Recognition in Surveillance --- p.50
Chapter 4.1 --- Introduction --- p.50
Chapter 4.2 --- Cascade head-shoulder detection --- p.53
Chapter 4.2.1 --- Body model training --- p.53
Chapter 4.2.2 --- Face region refinement --- p.54
Chapter 4.3 --- Face recognition --- p.56
Chapter 4.3.1 --- Overlapping local phase feature (OLPF) --- p.56
Chapter 4.3.2 --- Fixed Gaussian Mixture Model (FGMM) --- p.59
Chapter 4.3.3 --- Adaptive Gaussian mixture model --- p.61
Chapter 4.4 --- Experimental verification --- p.62
Chapter 4.4.1 --- Preprocessing --- p.62
Chapter 4.4.2 --- Face recognition --- p.63
Chapter 4.5 --- Conclusion --- p.66
Chapter 5 --- Human Body Segmentation --- p.68
Chapter 5.1 --- Introduction --- p.68
Chapter 5.2 --- Proposed automatic human body segmentation system --- p.70
Chapter 5.2.1 --- Automatic human body detection --- p.71
Chapter 5.2.2 --- Object Segmentation --- p.73
Chapter 5.2.3 --- Self-adaptive initialization level set --- p.79
Chapter 5.2.4 --- Object Updating --- p.86
Chapter 5.3 --- Experimental results --- p.87
Chapter 5.3.1 --- Evaluation using real-time videos and standard sequences --- p.87
Chapter 5.3.2 --- Comparison with Other Methods --- p.87
Chapter 5.3.3 --- Computational complexity analysis --- p.91
Chapter 5.3.4 --- Extensions --- p.93
Chapter 5.4 --- Conclusion --- p.93
Chapter 6 --- Arbitrarily Shaped Object Coding --- p.94
Chapter 6.1 --- Introduction --- p.94
Chapter 6.2 --- Arbitrarily shaped object coding --- p.97
Chapter 6.2.1 --- Shape coding --- p.97
Chapter 6.2.2 --- Lossy alpha coding --- p.99
Chapter 6.2.3 --- Motion compensation --- p.102
Chapter 6.2.4 --- Texture coding --- p.105
Chapter 6.3 --- Performance evaluation --- p.108
Chapter 6.3.1 --- Objective evaluations --- p.108
Chapter 6.3.2 --- Extension on HD sequences --- p.112
Chapter 6.3.3 --- Subjective evaluations --- p.115
Chapter 6.4 --- Conclusions --- p.119
Chapter 7 --- Conclusions and future work --- p.120
Chapter 7.1 --- Contributions --- p.120
Chapter 7.1.1 --- 3D object positioning --- p.120
Chapter 7.1.2 --- Automatic human body detection --- p.120
Chapter 7.1.3 --- Human face recognition --- p.121
Chapter 7.1.4 --- Automatic human body segmentation --- p.121
Chapter 7.1.5 --- Arbitrarily shaped object coding --- p.121
Chapter 7.2 --- Future work --- p.122
Bibliography --- p.123
Hwang, Sung Ju. "Reading between the lines : object localization using implicit cues from image tags". Thesis, 2010. http://hdl.handle.net/2152/ETD-UT-2010-05-1514.
Russa, Hélder Filipe de Sousa. "Computer Vision: Object recognition with deep learning applied to fashion items detection in images". Master's thesis, 2017. https://repositorio-aberto.up.pt/handle/10216/107862.
Russa, Hélder Filipe de Sousa. "Computer Vision: Object recognition with deep learning applied to fashion items detection in images". Dissertation, 2017. https://repositorio-aberto.up.pt/handle/10216/107862.
"Computer Vision from Spatial-Multiplexing Cameras at Low Measurement Rates". Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.45490.
Dissertation/Thesis
Doctoral Dissertation Electrical Engineering 2017
Larsson, Stefan and Filip Mellqvist. "Automatic Number Plate Recognition for Android". Thesis, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-72573.
BALLAN, LAMBERTO. "Object and event recognition in multimedia archives using local visual features". Doctoral thesis, 2011. http://hdl.handle.net/2158/485661.