Dissertations / Theses on the topic 'Video analysis'

Consult the top 50 dissertations / theses for your research on the topic 'Video analysis.'

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses across a wide variety of disciplines and organise your bibliography correctly.

1

Lidén, Jonas. "Distributed Video Content Analysis." Thesis, Umeå universitet, Institutionen för datavetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-99062.

Full text
Abstract:
Video Content Analysis (VCA) is usually computationally intensive and time-consuming. In this thesis, the efficiency of VCA is increased by implementing a distributed VCA architecture. Automatic speech recognition is used as a case study to evaluate how the efficiency of VCA can be increased by distributing the workload across several machines. The system is to be run on standard desktop computers and needs to support a variety of operating systems. The developed distributed system is compared to a serial system in use today. The results show increased performance, at the cost of a small increase in error rate. Two types of load-balancing algorithm, static and dynamic, are evaluated in order to increase system throughput. It is concluded that the dynamic algorithm outperforms the static algorithm when running on a heterogeneous set of machines, and that the differences are negligible when running on a homogeneous set of machines.
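As a rough illustration of the distinction evaluated here (a sketch, not the thesis's implementation; jobs and process are placeholders for the speech-recognition work units): static balancing splits the jobs up front, while dynamic balancing lets idle workers pull the next job from a shared queue, which is what lets a fast machine simply take more work on a heterogeneous cluster.

```python
import queue
import threading

def run_static(jobs, process, workers=4):
    """Static balancing: pre-partition jobs into one fixed chunk per worker."""
    chunks = [jobs[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=lambda c=c: [process(j) for j in c])
               for c in chunks]
    for t in threads: t.start()
    for t in threads: t.join()

def run_dynamic(jobs, process, workers=4):
    """Dynamic balancing: idle workers pull the next job from a shared queue."""
    q = queue.Queue()
    for j in jobs:
        q.put(j)
    def worker():
        while True:
            try:
                j = q.get_nowait()
            except queue.Empty:
                return
            process(j)
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads: t.start()
    for t in threads: t.join()
```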
2

Ren, Reede. "Audio-visual football video analysis, from structure detection to attention analysis." Thesis, Connect to e-thesis. Move to record for print version, 2008. http://theses.gla.ac.uk/77/.

Full text
Abstract:
Ph.D. thesis submitted to the Faculty of Information and Mathematical Sciences, Department of Computing Science, University of Glasgow, 2008. Includes bibliographical references. Print version also available.
3

Pérez, Rúa Juan Manuel. "Hierarchical motion-based video analysis with applications to video post-production." Thesis, Rennes 1, 2017. http://www.theses.fr/2017REN1S125/document.

Full text
Abstract:
The manuscript presented here contains the findings and conclusions of our research in dynamic visual scene analysis. To be precise, we consider the ubiquitous monocular-camera computer vision set-up, and the natural unconstrained videos that can be produced by it. In particular, we focus on important problems that are of general interest for the computer vision literature, and of special interest for the film industry, in the context of the video post-production pipeline. The tackled problems can be grouped in two main categories, according to whether or not they are driven by user interaction: user-assisted video processing tools and unsupervised tools for video analysis. This division is rather schematic, but it is in fact related to the ways the proposed methods are used inside the video post-production pipeline. These groups correspond to the main parts that form this manuscript, which are in turn divided into chapters presenting our proposed methods. However, a single thread ties together all of our findings: a hierarchical analysis of motion composition in dynamic scenes. We explain our exact contributions, together with our main motivations and results, in the following sections. We depart from a hypothesis that links the ability to consider a hierarchical structure of scene motion with a deeper level of dynamic scene understanding. This hypothesis is inspired by a plethora of scientific research in biological and psychological vision. More specifically, we refer to the biological vision research that established the presence of motion-related sensory units in the visual cortex. The discovery of these specialized brain units motivated psychological vision researchers to investigate how animal locomotion (obstacle avoidance, path planning, self-localization) and other higher-level tasks are directly influenced by motion-related percepts. Interestingly, the perceptual responses that take place in the visual cortex are activated not only by motion itself, but also by occlusions, dis-occlusions, motion composition, and moving edges. Furthermore, psychological vision research has linked the brain's ability to understand motion composition from visual information to high-level scene understanding, such as object segmentation and recognition.
4

Touliatou, Georgia. "Diegetic stories in a video mediation : a narrative analysis of four videos." Thesis, University of Surrey, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.397132.

Full text
5

Park, Dong-Jun. "Video event detection framework on large-scale video data." Diss., University of Iowa, 2011. https://ir.uiowa.edu/etd/2754.

Full text
Abstract:
Detection of events and actions in video entails substantial processing of very large, even open-ended, video streams. Video data presents a unique challenge for the information retrieval community because properly representing video events is challenging. We propose a novel approach to analyze temporal aspects of video data. We consider video data as a sequence of images that form a 3-dimensional spatiotemporal structure, and perform multiview orthographic projection to transform the video data into 2-dimensional representations. The projected views allow a unique way to represent video events and capture the temporal aspect of video data. We extract local salient points from the 2D projection views and apply a detection-via-similarity approach on a wide range of events against real-world surveillance data. We demonstrate that our example-based detection framework is competitive and robust. We also investigate synthetic-example-driven retrieval as a basis for query-by-example.
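The projection idea lends itself to a compact sketch. Assuming a decoded grayscale clip held in a (frames, height, width) NumPy array (the variable names and sizes are illustrative, not from the dissertation), the three orthographic views fall out of collapsing one axis at a time; in this reading, the top and side views are the ones that encode motion over time.

```python
import numpy as np

video = np.random.rand(120, 240, 320)  # stand-in for a decoded T x H x W clip

front = video.mean(axis=0)  # H x W: scene appearance averaged over time
top   = video.mean(axis=1)  # T x W: temporal evolution of each image column
side  = video.mean(axis=2)  # T x H: temporal evolution of each image row
```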
6

Bales, Michael Ryan. "Illumination compensation in video surveillance analysis." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/39535.

Full text
Abstract:
Problems in automated video surveillance analysis caused by illumination changes are explored, and solutions are presented. Controlled experiments are first conducted to measure the responses of color targets to changes in lighting intensity and spectrum. Surfaces of dissimilar color are found to respond significantly differently. Illumination compensation model error is reduced by 70% to 80% by individually optimizing model parameters for each distinct color region, and applying a model tuned for one region to a chromatically different region increases error by a factor of 15. A background model, called BigBackground, is presented to extract large, stable, chromatically self-similar background features by identifying the dominant colors in a scene. The stability and chromatic diversity of these features make them useful reference points for quantifying illumination changes. The model is observed to cover as much as 90% of a scene, and pixels belonging to the model are 20% more stable on average than non-member pixels. Several illumination compensation techniques are developed to exploit BigBackground, and are compared with several compensation techniques from the literature. Techniques are compared in terms of foreground/background classification, and are applied to an object tracking pipeline with kinematic and appearance-based correspondence mechanisms. Compared with other techniques, BigBackground-based techniques improve foreground classification by 25% to 43%, improve tracking accuracy by an average of 20%, and better preserve object appearance for appearance-based trackers. All algorithms are implemented in C or C++ to support the consideration of runtime performance. In terms of execution speed, the BigBackground-based illumination compensation technique is measured to run on par with the simplest compensation technique used for comparison, and consistently achieves twice the frame rate of the two next-fastest techniques.
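A rough, single-frame sketch of the dominant-colour idea behind BigBackground (the actual model is built and maintained over time; the bin count and top_k here are illustrative, not the dissertation's parameters):

```python
import numpy as np

def dominant_color_mask(frame, bins=8, top_k=4):
    """Return a boolean mask of pixels whose coarsely quantised colour is
    among the top_k most frequent colours in the frame, i.e. candidates
    for large, chromatically self-similar background regions."""
    q = frame.astype(np.int32) // (256 // bins)            # quantise each channel
    codes = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    counts = np.bincount(codes.ravel(), minlength=bins ** 3)
    dominant = np.argsort(counts)[-top_k:]                 # most common colours
    return np.isin(codes, dominant)
```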
7

Almquist, Mathias, and Viktor Almquist. "Analysis of 360° Video Viewing Behaviour." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-144405.

Full text
Abstract:
In this thesis we study users' viewing motions when watching 360° videos in order to provide information that can be used to optimize future view-dependent streaming protocols. More specifically, we develop an application that plays a sequence of 360° videos on an Oculus Rift Head Mounted Display and records the orientation and rotation velocity of the headset during playback. The application is used during an extensive user study in order to collect more than 21 hours of viewing data which is then analysed to expose viewing patterns, useful for optimizing 360° streaming protocols.
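The described data collection reduces to logging a timestamped head pose at the display rate. A minimal sketch, with sample_pose standing in for whatever the headset SDK exposes (no actual Oculus API calls are shown, and the log format is an assumption):

```python
import csv
import time

def record_viewing(sample_pose, path="viewing_log.csv", hz=60, seconds=600):
    """Log (timestamp, yaw, pitch, roll, angular_velocity) rows at ~hz;
    sample_pose() is an assumed callback into the headset SDK."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["t", "yaw", "pitch", "roll", "ang_vel"])
        t0 = time.time()
        while time.time() - t0 < seconds:
            writer.writerow([round(time.time() - t0, 4), *sample_pose()])
            time.sleep(1.0 / hz)
```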
8

Gu, Lifang. "Video analysis in MPEG compressed domain." University of Western Australia. School of Computer Science and Software Engineering, 2003. http://theses.library.uwa.edu.au/adt-WU2003.0016.

Full text
Abstract:
The amount of digital video has been increasing dramatically due to the technology advances in video capturing, storage, and compression. The usefulness of vast repositories of digital information is limited by the effectiveness of the access methods, as shown by the Web explosion. The key issues in addressing the access methods are those of content description and of information space navigation. While textual documents in digital form are somewhat self-describing (i.e., they provide explicit indices, such as words and sentences that can be directly used to categorise and access them), digital video does not provide such an explicit content description. In order to access video material in an effective way, without looking at the material in its entirety, it is therefore necessary to analyse and annotate video sequences, and provide an explicit content description targeted to the user needs. Digital video is a very rich medium, and the characteristics in which users may be interested are quite diverse, ranging from the structure of the video to the identity of the people who appear in it, their movements and dialogues and the accompanying music and audio effects. Indexing digital video, based on its content, can be carried out at several levels of abstraction, beginning with indices like the video program name and name of subject, down to much lower-level aspects of video like the location of edits and motion properties of video. Manual video indexing requires the sequential examination of the entire video clip. This is a time-consuming, subjective, and expensive process. As a result, there is an urgent need for tools to automate the indexing process. In response to such needs, various video analysis techniques from the research fields of image processing and computer vision have been proposed to parse, index and annotate the massive amount of digital video data. However, most of these video analysis techniques have been developed for uncompressed video. Since most video data are stored in compressed formats for efficiency of storage and transmission, it is necessary to perform decompression on compressed video before such analysis techniques can be applied. Two consequences of having to first decompress before processing are incurring computation time for decompression and requiring extra auxiliary storage. To save on the computational cost of decompression and lower the overall size of the data which must be processed, this study attempts to make use of features available in compressed video data and proposes several video processing techniques operating directly on compressed video data. Specifically, techniques of processing MPEG-1 and MPEG-2 compressed data have been developed to help automate the video indexing process. This includes the tasks of video segmentation (shot boundary detection), camera motion characterisation, and highlights extraction (detection of skin-colour regions, text regions, moving objects and replays) in MPEG compressed video sequences. The approach of performing analysis on the compressed data has the advantages of dealing with a much reduced data size and is therefore suitable for computationally-intensive low-level operations. Experimental results show that most analysis tasks for video indexing can be carried out efficiently in the compressed domain. Once intermediate results, which are dramatically reduced in size, are obtained from the compressed-domain analysis, partial decompression can be applied to enable high-resolution processing to extract high-level semantic information.
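One classic compressed-domain shortcut of the kind described is shot-boundary detection on DC images, the tiny thumbnails that can be read from MPEG I-frames without full decoding. A hedged sketch, assuming the DC images have already been extracted from the stream (the threshold is arbitrary, not a value from the thesis):

```python
import numpy as np

def detect_cuts(dc_images, threshold=0.3):
    """Flag frames whose DC-image intensity histogram changes sharply
    relative to the previous frame - a simple cut detector."""
    cuts, prev = [], None
    for i, dc in enumerate(dc_images):
        hist, _ = np.histogram(dc, bins=64, range=(0, 255))
        hist = hist / max(hist.sum(), 1)            # normalise to a distribution
        if prev is not None and 0.5 * np.abs(hist - prev).sum() > threshold:
            cuts.append(i)                          # total-variation distance test
        prev = hist
    return cuts
```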
9

Gu, Lifang. "Video analysis in MPEG compressed domain /." Connect to this title, 2002. http://theses.library.uwa.edu.au/adt-2003.0016.

Full text
10

Li, Hao. "Advanced video analysis for surveillance applications." Thesis, University of Bristol, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.555815.

Full text
Abstract:
This thesis addresses the issues of applying advanced video analytics for surveillance applications. A video surveillance system can be defined as a technological tool that assists humans by providing an extended perception and capability of capturing interesting activities in the monitored scene. The prime components of video surveillance systems include moving object detection, object tracking, and anomaly detection. Moving object detection extracts the foreground silhouettes of moving objects. The object tracking component then applies the foreground information to create correspondences between tracks in the previous frame and objects in the current frame. The most challenging part of the system concerns the use of extracted scene information from the moving objects and object tracking for anomaly detection. The thesis proposes novel approaches for each of the main components above. They include: 1) an efficient foreground detection algorithm based on block-based detection and improved pixel-based Gaussian Mixture Model (GMM) refinement that can selectively update pixel information in each image region; 2) an adaptive object tracker that combines the merits of Kalman, mean-shift and particle filtering; 3) a feature clustering algorithm, which can automatically choose the optimal number of clusters in the training data for scene pattern classification; 4) a statistical scene modeller based on Bayesian theory and GMM, which combines object-based and local region-based information for enhanced anomaly detection. In addition, a layered feedback system architecture is proposed for using high- level detection results for improving low-level detection performance. Compared with common open-loop approaches, this increases the system reliability at the expense of using little extra computation. Moreover, considering the capability of real-time operation, robustness, and detection accuracy, which are key factors of video surveillance systems, appropriate trade-offs between complexity and detection performance are introduced in the relevant phases of the system, such as in moving object detection and in object tracking. The performance of the proposed system is evaluated with various video datasets. Both qualitative and quantitative measures are applied, for example visual comparison and precision-recall curves. The proposed moving object detection achieves an average of 52% and 38% improvement in terms of false positive detected pixels compared with a Gaussian Model (GM) and a GMM respectively. The object tracking component reduces the computation by 10% compared to a mean-shift filter while maintaining better tracking results. The proposed anomaly detection algorithm also outperforms previously proposed approaches. These results demonstrate the effectiveness of the proposed video surveillance system framework.
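GMM-based foreground detection of the kind the thesis improves upon is available off the shelf, so a baseline takes a few lines of OpenCV. This shows the stock MOG2 model only, not the block-based refinement proposed here, and the file name is a placeholder:

```python
import cv2

cap = cv2.VideoCapture("surveillance.mp4")          # placeholder input path
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                  # per-pixel foreground mask
    # downstream stages (tracking, anomaly detection) would consume mask
cap.release()
```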
11

Plakas, Konstantinos. "Video sequence analysis for subsea robotics." Thesis, Heriot-Watt University, 2001. http://hdl.handle.net/10399/1186.

Full text
12

Chan, Stephen Chi Yee. "Video analysis for content-based applications." Thesis, University of Southampton, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.395362.

Full text
13

Almquist, Mathias, and Viktor Almquist. "Analysis of 360° Video Viewing Behaviours." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-144907.

Full text
Abstract:
In this thesis we study users' viewing motions when watching 360° videos in order to provide information that can be used to optimize future view-dependent streaming protocols. More specifically, we develop an application that plays a sequence of 360° videos on an Oculus Rift Head Mounted Display and records the orientation and rotation velocity of the headset during playback. The application is used during an extensive user study in order to collect more than 21 hours of viewing data which is then analysed to expose viewing patterns, useful for optimizing 360° streaming protocols.
14

Whitmore, Jean. "Video Magnification for Structural Analysis Testing." DigitalCommons@CalPoly, 2018. https://digitalcommons.calpoly.edu/theses/1863.

Full text
Abstract:
The goal of this thesis is to allow a user to see minute motion of an object at different frequencies, using a computer program, to aid in vibration testing analysis without the use of complex setups of accelerometers or expensive laser vibrometers. MIT’s phase-based video motion processing was modified to enable modal determination of structures in the field using a cell phone camera. The algorithm was modified by implementing a stabilization algorithm and by permitting the magnification filter to operate on multiple frequency ranges, enabling visualization of the natural frequencies of structures in the field. To support multiple frequency ranges, a new function was developed that applies the magnification filter at each relevant frequency range within the original video. The stabilization algorithm was intended to allow the camera to be hand-held instead of requiring a tripod mount. Two stabilization methods were tested: fixed-point video stabilization and image registration. Neither method removed the global motion from the hand-held video, even after masking was implemented, which resulted in poor results. Specifically, fixed-point stabilization removed little motion or created sharp motions, and image registration introduced a pulsing effect. The best results occurred when the object being observed had contrast with the background, was the largest feature in the video frame, and the video was captured from a tripod at an appropriate angle. The final program can amplify the motion in user-selected frequency bands and can be used as an aid in structural analysis testing.
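As a rough intuition for what a magnification filter does per frequency band, here is a heavily simplified, intensity-based sketch (the thesis modifies MIT's phase-based pipeline, which operates on local phase rather than raw intensity as done here): band-pass each pixel's time series, scale the result, and add it back. Running it once per user-selected band and summing the amplified signals mimics the multi-band behaviour described above.

```python
import numpy as np

def magnify_band(video, fps, f_lo, f_hi, alpha=10.0):
    """video: (T, H, W) float array. Amplify temporal variation whose
    frequency lies in [f_lo, f_hi] Hz by the factor alpha."""
    T = video.shape[0]
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)
    spectrum = np.fft.rfft(video, axis=0)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0   # keep only the chosen band
    band = np.fft.irfft(spectrum, n=T, axis=0)
    return video + alpha * band
```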
15

Baradel, Fabien. "Structured deep learning for video analysis." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI045.

Full text
Abstract:
With the massive increase of video content on the Internet and beyond, the automatic understanding of visual content could impact many different application fields such as robotics, health care, content search or filtering. The goal of this thesis is to provide methodological contributions in Computer Vision and Machine Learning for automatic content understanding from videos. We focus on two problems: fine-grained human action recognition and visual reasoning from object-level interactions. In the first part of this manuscript, we tackle the problem of fine-grained human action recognition. We introduce two different attention mechanisms trained on the visual content from articulated human pose. The first method is able to automatically draw attention to important pre-selected points of the video, conditioned on learned features extracted from the articulated human pose. We show that such a mechanism improves performance on the final task and provides a good way to visualize the most discriminative parts of the visual content. The second method goes beyond pose-based human action recognition. We develop a method able to automatically identify unstructured feature clouds of interest in the video using contextual information. Furthermore, we introduce a learned distributed system for aggregating the features in a recurrent manner and taking decisions in a distributed way. We demonstrate that we can achieve better performance than obtained previously, without using articulated pose information at test time. In the second part of this thesis, we investigate video representations from an object-level perspective. Given a set of detected persons and objects in the scene, we develop a method which learns to infer the important object interactions through space and time using the video-level annotation only. This allows us to identify important objects and object interactions for a given action, as well as potential dataset bias. Finally, in a third part, we go beyond the task of classification and supervised learning from visual content by tackling causality in interactions, in particular the problem of counterfactual learning. We introduce a new benchmark, namely CoPhy, where, after watching a video, the task is to predict the outcome after modifying the initial stage of the video. We develop a method based on object-level interactions able to infer object properties without supervision as well as future object locations after the intervention.
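The first attention mechanism can be caricatured in a few lines: pose features form a query, and the video's local descriptors are softly weighted by their compatibility with it. A minimal sketch with illustrative shapes and names, not the architecture of the thesis:

```python
import numpy as np

def pose_conditioned_attention(features, pose_embedding, W_query):
    """features: (N, D) descriptors at N video locations;
    pose_embedding: (P,) learned pose features; W_query: (P, D) projection.
    Returns an attention-pooled (D,) descriptor."""
    query = pose_embedding @ W_query            # project pose into feature space
    scores = features @ query                   # (N,) compatibility scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over video locations
    return weights @ features                   # weighted sum of descriptors
```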
16

Fraz, Muhammad. "Video content analysis for intelligent forensics." Thesis, Loughborough University, 2014. https://dspace.lboro.ac.uk/2134/18065.

Full text
Abstract:
The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: (1) moving object detection and recognition; (2) correction of colours in the video frames and recognition of colours of moving objects; (3) make and model recognition of vehicles and identification of their type; (4) detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex background. The object detection part of the framework relies on a background modelling technique and a novel post-processing step where the contours of the foreground regions (i.e. moving objects) are refined by the classification of edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented, with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames. The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfection and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As a part of this work, a novel feature representation technique for distinctive representation of vehicle images has emerged. The feature representation technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms.
The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used within various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.
17

Al, Hajj Hassan. "Video analysis for augmented cataract surgery." Thesis, Brest, 2018. http://www.theses.fr/2018BRES0041/document.

Full text
Abstract:
The digital era is increasingly changing the world due to the sheer volume of data produced every day. The medical domain is highly affected by this revolution, because analysing this data can be a source of education and support for clinicians. In this thesis, we propose to reuse the surgery videos recorded in operating rooms for a computer-assisted surgery system. We are chiefly interested in recognizing the surgical gesture being performed at each instant in order to provide relevant information. To achieve this goal, this thesis addresses the surgical tool recognition problem, with applications in cataract surgery videos. In the surgical field, these tools are partially visible in videos and highly similar to one another. To address the visual challenges in the cataract surgical field, we propose to add an additional camera filming the surgical tray. Our goal is to detect tool presence in the two complementary types of videos: tool-tissue interaction videos and surgical tray videos. The former record the patient's eye and the latter record the surgical tray activities. Two tasks are defined on the surgical tray videos: tool change detection and tool presence detection. First, we establish a similar pipeline for both tasks, based on standard classification methods on top of visual learning features. It yields satisfactory results for the tool change task; however, it performs poorly on the tool presence task on the tray. Second, we design deep learning architectures for surgical tool detection on both video types in order to avoid the difficulties of manually designing the visual features. To alleviate the inherent challenges of the surgical tray videos, we propose to generate simulated surgical tray scenes and to use a patch-based convolutional neural network (CNN). Ultimately, we study the temporal information using a recurrent neural network (RNN) that processes the CNN results. Contrary to our primary hypothesis, the experimental results show deficient results for surgical tool presence on the tray but very good results on the tool-tissue interaction videos. We achieve even better results in the surgical field after fusing the tool change information from the tray with the tool presence signals on the tool-tissue interaction videos.
18

Stobaugh, John David. "Novel use of video and image analysis in a video compression system." Thesis, University of Iowa, 2015. https://ir.uiowa.edu/etd/1766.

Full text
Abstract:
As consumer demand for higher quality video at lower bit-rate increases, so does the need for more sophisticated methods of compressing videos into manageable file sizes. This research attempts to address these concerns while still maintaining reasonable encoding times. Modern segmentation and grouping analysis are used with code vectorization techniques and other optimization paradigms to improve quality and performance within the next generation coding standard, High Efficiency Video Coding. This research saw on average a 50% decrease in run-time by the encoder with marginal decreases in perceived quality.
19

Dye, Brigham R. "Reliability of Pre-Service Teachers Coding of Teaching Videos Using Video-Annotation Tools." BYU ScholarsArchive, 2007. https://scholarsarchive.byu.edu/etd/990.

Full text
Abstract:
Teacher education programs that aspire to help pre-service teachers develop expertise must help students engage in deliberate practice along dimensions of teaching expertise. However, field teaching experiences often lack the quantity and quality of feedback that is needed to help students engage in meaningful teaching practice. The limited availability of supervising teachers makes it difficult to personally observe and evaluate each student teacher's field teaching performances. Furthermore, when a supervising teacher debriefs such an observation, the supervising teacher and student may struggle to communicate meaningfully about the teaching performance. This is because the student teacher and supervisor often have very different perceptions of the same teaching performance. Video analysis tools show promise for improving the quality of feedback student teachers receive on their teaching performance by providing a common reference for evaluative debriefing and allowing students to generate their own feedback by coding videos of their own teaching. This study investigates the reliability of pre-service teacher coding using a video analysis tool. The study found that students were moderately reliable coders when coding video of an expert teacher (49%-68%). However, when the reliability of student coding of their own teaching videos was audited, students showed a high degree of accuracy (91%). These contrasting findings suggest that coding reliability scores may not be simple indicators of student understanding of the teaching competencies represented by a coding scheme. Instead, reliability scores may also be subject to the influence of extraneous factors. For example, reliability scores in this study were influenced by differences in the technical aspects of how students implemented the coding system. Furthermore, reliability scores were influenced by how coding proficiency was measured. Because this study also suggests that students can be taught to improve their coding reliability, further research may improve reliability scores, and make them a more valid reflection of student understanding of teaching competency, by training students in the technical aspects of implementing a coding system.
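The reliability percentages quoted above are agreement rates between coders. The study's exact formula is not reproduced here, but the simplest version, percent agreement over matched segments, is a one-liner; the segment codes below are purely illustrative:

```python
def percent_agreement(codes_a, codes_b):
    """Fraction of segments on which two coders assigned the same code."""
    assert len(codes_a) == len(codes_b)
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)

# illustrative codes for ten segments, e.g. Q = question, F = feedback, P = praise
student = ["Q", "Q", "F", "P", "Q", "F", "F", "P", "Q", "Q"]
expert  = ["Q", "F", "F", "P", "Q", "F", "Q", "P", "Q", "Q"]
print(percent_agreement(student, expert))  # 0.8
```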
20

Monger, Eloise. "'Video-View-Point' : video analysis to reveal tacit indicators of student nurse competence." Thesis, University of Southampton, 2014. https://eprints.soton.ac.uk/366452/.

Full text
Abstract:
For over 30 years, the assessment of the clinical competence of student nurses has been the subject of much theoretical debate, yet the definition of criteria based on observable indicators of competence remains problematic. In practice, however, different assessors will judge and agree, relatively quickly, whether a student is competent or not; whether they have got ‘it’. Articulating what ‘it’ is, is difficult; although ‘it’ appears to be collectively, yet tacitly, understood. These judgements provide the key to the definition of competence. This research solves the dilemma of revealing and investigating these tacit understandings through the video analysis of students in simulated practice. The findings of four initial exploratory studies confirmed that competence is an example of tacitly understood behaviour and identified the limitations of traditional research methods in this context. The practical challenges of analysing video were highlighted, leading to the development of Video-View-Point to solve these problems and to reveal the tacitly understood behaviours. This innovative hybrid research method combines analysis of multiple ‘Think Aloud’ commentaries with the ability to ‘point’ at the subject of interest. The analysis is presented as a time-stamped multimedia dialectic, a visually simple yet sophisticated collage of data which reveals relevant behaviours, including those which are tacitly understood. A bespoke software tool (BigSister) was designed to facilitate the data collection, and was tested against the most similar commercially available technology, an eye tracker. The test of Video-View-Point successfully revealed four tacitly understood indicators of competence: communication, processing clinical information, being in the right place, and being proactive. Video-View-Point offers huge potential for behavioural analysis in other domains.
21

Tripp, Tonya R. "The Influence of Video Analysis on Teaching." BYU ScholarsArchive, 2010. https://scholarsarchive.byu.edu/etd/2562.

Full text
Abstract:
As video has become more accessible, there has been an increase in the use of video for teacher reflection. Although past studies have investigated the use of video for teacher reflection, there has been no review of practices and processes for effective use of video analysis. The first article in this dissertation reviews 52 studies where teachers used video to reflect on their teaching. Most studies included in the review reported that video was a beneficial feedback method for teachers. However, few studies discussed how video encourages teachers to change their practices. The second article in this dissertation investigates how video influences the teacher change process. The study found that teachers did change their practices as a result of using video analysis. Teachers reported that video analysis encouraged them to change because they were able to: (a) focus their analysis, (b) see their teaching from a new perspective, (c) feel accountable to change their practice, (d) remember to implement changes, and (e) see their progress.
22

Yoon, Kyongil. "Key-frame appearance analysis for video surveillance." College Park, Md. : University of Maryland, 2005. http://hdl.handle.net/1903/2818.

Full text
Abstract:
Thesis (Ph.D.), University of Maryland, College Park, 2005. Thesis research directed by: Computer Science. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
23

Wang, Ying. "Analysis Application for H.264 Video Encoding." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-133633.

Full text
Abstract:
A video analysis application, ERANA264 (Ericsson Research h.264 videoANalysis Application), is developed in this project. ERANA264 is a tool that analyzes H.264-encoded video bit streams, extracts the encoding information and parameters, analyzes them at different stages and displays the results in a user-friendly way. The intention is that such an application would be used during development and testing of video codecs. The work is implemented on top of existing H.264 encoder/decoder source code (C/C++) developed at Ericsson Research. ERANA264 consists of three layers. The first layer is the H.264 decoder previously developed at Ericsson Research. By using the decoder APIs, information is extracted from the bit stream and sent to the higher layers. The second layer visualizes the different decoding stages, uses overlays to display macroblock- and picture-level information, and provides a set of playback functions. The third layer analyzes and presents the statistics of prominent parameters in the video compression process, such as video quality measurements, motion vector distribution and picture bit distribution.
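The first layer's job, walking an H.264 bit stream, starts from the Annex-B start codes defined by the standard. The sketch below is generic (it is not ERANA264 or Ericsson decoder code) and ignores the distinction between 3- and 4-byte start codes:

```python
def nal_units(stream: bytes):
    """Yield (nal_unit_type, payload) for each NAL unit in an Annex-B
    H.264 byte stream; the type is the low 5 bits of the header byte
    (e.g. 5 = IDR slice, 7 = SPS, 8 = PPS)."""
    starts, i = [], 0
    while (i := stream.find(b"\x00\x00\x01", i)) != -1:
        starts.append(i + 3)        # payload begins right after the start code
        i += 3
    for s, e in zip(starts, starts[1:] + [len(stream)]):
        yield stream[s] & 0x1F, stream[s:e]
```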
24

Weir, Lindsay Brian. "Digital video for time based analysis systems." Thesis, University of Canterbury. Computer Science, 1994. http://hdl.handle.net/10092/9406.

Full text
Abstract:
Research students within the Psychology Department at the University of Canterbury are involved in exploring emotional responses in human behaviour. Experiments with subjects are videotaped and the tapes are subsequently analysed using pen and paper. This approach is time-consuming and provides a relatively crude interface for analysis. In this thesis, techniques to assist in the analysis of time-dependent information are examined in general, although the emphasis is on human behaviour experiments. Digital video analysis methods are examined to evaluate their strengths and weaknesses in comparison to video tape methods. A working prototype system, Video Transcriptor, has been developed on a Macintosh computer in order to evaluate how digital video can assist in analysing human behaviour. This prototype system uses the facilities of QuickTime, Apple's solution for handling time-based digital video information. Because there is a lack of standards for controlling digital video information, various human-computer interface metaphors have been explored. For transcription purposes, an adaptive note-taking facility has been implemented to assist in the analysis of human behaviour. This thesis shows the benefits that digital video provides for the analysis and note-taking of human behaviours compared to video tape methods. The random access capabilities of digital video offer increased control of the video information, which provides faster note-taking and more accurate results compared to video-tape-based methods of analysis.
25

Steinmetz, Nadine. "Context-aware semantic analysis of video metadata." Phd thesis, Universität Potsdam, 2013. http://opus.kobv.de/ubp/volltexte/2014/7055/.

Full text
Abstract:
The Semantic Web provides information contained in the World Wide Web as machine-readable facts. In comparison to a keyword-based inquiry, semantic search enables a more sophisticated exploration of web documents. By clarifying the meaning behind entities, search results are more precise and the semantics simultaneously enable an exploration of semantic relationships. However, unlike keyword searches, a semantic entity-focused search requires that web documents are annotated with semantic representations of common words and named entities. Manual semantic annotation of (web) documents is time-consuming; in response, automatic annotation services have emerged in recent years. These annotation services take continuous text as input, detect important key terms and named entities and annotate them with semantic entities contained in widely used semantic knowledge bases, such as Freebase or DBpedia. Metadata of video documents require special attention. Semantic analysis approaches for continuous text cannot be applied, because information of a context in video documents originates from multiple sources possessing different reliabilities and characteristics. This thesis presents a semantic analysis approach consisting of a context model and a disambiguation algorithm for video metadata. The context model takes into account the characteristics of video metadata and derives a confidence value for each metadata item. The confidence value represents the level of correctness and ambiguity of the textual information of the metadata item. The lower the ambiguity and the higher the prospective correctness, the higher the confidence value. The metadata items derived from the video metadata are analyzed in a specific order from high to low confidence level. Previously analyzed metadata are used as reference points in the context for subsequent disambiguation. The contextually most relevant entity is identified by means of descriptive texts and semantic relationships to the context. The context is created dynamically for each metadata item, taking into account the confidence value and other characteristics. The proposed semantic analysis follows two hypotheses: metadata items of a context should be processed in descendent order of their confidence value, and the metadata that pertains to a context should be limited by content-based segmentation boundaries. The evaluation results support the proposed hypotheses and show increased recall and precision for annotated entities, especially for metadata that originates from sources with low reliability. The algorithms have been evaluated against several state-of-the-art annotation approaches. The presented semantic analysis process is integrated into a video analysis framework and has been successfully applied in several projects for the purpose of semantic video exploration of videos.
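The control flow of the proposed analysis can be sketched compactly. Everything below is illustrative scaffolding (candidates_for and score stand in for the thesis's candidate retrieval and disambiguation algorithms): items are processed from high to low confidence, and each resolved entity joins the context used to score later, more ambiguous items.

```python
def analyse_context(metadata_items, candidates_for, score):
    """metadata_items: dicts with 'text' and 'confidence' keys.
    Returns {text: entity} after confidence-ordered disambiguation."""
    context, resolved = [], {}
    ordered = sorted(metadata_items, key=lambda m: m["confidence"], reverse=True)
    for item in ordered:
        best = max(candidates_for(item["text"]),
                   key=lambda entity: score(entity, context),
                   default=None)
        if best is not None:
            resolved[item["text"]] = best
            context.append(best)     # reference point for later, weaker items
    return resolved
```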
26

Mackiewicz, Michał. "Computer-assisted wireless capsule endoscopy video analysis." Thesis, University of East Anglia, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445207.

Full text
27

Xu, Xun. "Semantic spaces for video analysis of behaviour." Thesis, Queen Mary, University of London, 2016. http://qmro.qmul.ac.uk/xmlui/handle/123456789/23885.

Full text
Abstract:
There are ever growing interests from the computer vision community into human behaviour analysis based on visual sensors. These interests generally include: (1) behaviour recognition - given a video clip or specific spatio-temporal volume of interest discriminate it into one or more of a set of pre-defined categories; (2) behaviour retrieval - given a video or textual description as query, search for video clips with related behaviour; (3) behaviour summarisation - given a number of video clips, summarise out representative and distinct behaviours. Although countless efforts have been dedicated into problems mentioned above, few works have attempted to analyse human behaviours in a semantic space. In this thesis, we define semantic spaces as a collection of high-dimensional Euclidean space in which semantic meaningful events, e.g. individual word, phrase and visual event, can be represented as vectors or distributions which are referred to as semantic representations. With the semantic space, semantic texts, visual events can be quantitatively compared by inner product, distance and divergence. The introduction of semantic spaces can bring lots of benefits for visual analysis. For example, discovering semantic representations for visual data can facilitate semantic meaningful video summarisation, retrieval and anomaly detection. Semantic space can also seamlessly bridge categories and datasets which are conventionally treated independent. This has encouraged the sharing of data and knowledge across categories and even datasets to improve recognition performance and reduce labelling effort. Moreover, semantic space has the ability to generalise learned model beyond known classes which is usually referred to as zero-shot learning. Nevertheless, discovering such a semantic space is non-trivial due to (1) semantic space is hard to define manually. Humans always have a good sense of specifying the semantic relatedness between visual and textual instances. But a measurable and finite semantic space can be difficult to construct with limited manual supervision. As a result, constructing semantic space from data is adopted to learn in an unsupervised manner; (2) It is hard to build a universal semantic space, i.e. this space is always contextual dependent. So it is important to build semantic space upon selected data such that it is always meaningful within the context. Even with a well constructed semantic space, challenges are still present including; (3) how to represent visual instances in the semantic space; and (4) how to mitigate the misalignment of visual feature and semantic spaces across categories and even datasets when knowledge/data are generalised. This thesis tackles the above challenges by exploiting data from different sources and building contextual semantic space with which data and knowledge can be transferred and shared to facilitate the general video behaviour analysis. To demonstrate the efficacy of semantic space for behaviour analysis, we focus on studying real world problems including surveillance behaviour analysis, zero-shot human action recognition and zero-shot crowd behaviour recognition with techniques specifically tailored for the nature of each problem. Firstly, for video surveillances scenes, we propose to discover semantic representations from the visual data in an unsupervised manner. This is due to the largely availability of unlabelled visual data in surveillance systems. 
By representing visual instances in the semantic space, data and annotations can be generalised to new events and even new surveillance scenes. Specifically, to detect abnormal events this thesis studies a geometrical alignment between semantic representations of events across scenes, so that semantic actions can be transferred to new scenes and abnormal events detected in an unsupervised way. To model multiple surveillance scenes simultaneously, we show how to learn a shared semantic representation across a group of semantically related scenes through a multi-layer clustering of scenes. With multi-scene modelling we show how to improve surveillance tasks including scene activity profiling/understanding, cross-scene query-by-example, behaviour classification, and video summarisation. Secondly, to avoid extremely costly and ambiguous video annotation, we investigate how to generalise recognition models learned from known categories to novel ones, which is often termed zero-shot learning. To exploit the limited human supervision available, e.g. category names, we construct the semantic space via a word-vector representation trained on a large textual corpus in an unsupervised manner. The representation of a visual instance in the semantic space is obtained by learning a visual-to-semantic mapping. We observe that blindly applying a mapping learned from known categories to novel categories introduces bias and deteriorates performance, which is termed domain shift. To solve this problem we employ techniques including semi-supervised learning, self-training, hubness correction, multi-task learning and domain adaptation. In combination, these methods achieve state-of-the-art performance on the zero-shot human action task. Lastly, we study the possibility of re-using known and manually labelled semantic crowd attributes to recognise rare and unknown crowd behaviours, a task termed zero-shot crowd behaviour recognition. Crucially, we point out that, given the multi-labelled nature of semantic crowd attributes, zero-shot recognition can be improved by exploiting the co-occurrence between attributes. To summarise, this thesis studies methods for analysing video behaviours and demonstrates that exploring semantic spaces for video analysis is advantageous and, more importantly, enables multi-scene analysis and zero-shot learning beyond conventional learning strategies.
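The visual-to-semantic mapping this abstract describes can be illustrated with a small sketch. The sketch below is not the thesis' implementation: it assumes precomputed visual features and corpus-trained word vectors as inputs (names and shapes are hypothetical), uses ridge regression for the mapping, and labels a novel-class clip by the nearest class-name vector; the thesis' domain-shift corrections (self-training, hubness correction, etc.) are omitted.

```python
import numpy as np

def fit_visual_to_semantic(X_train, S_train, lam=1.0):
    """Ridge regression W mapping visual features (n x d) to the
    semantic space of word vectors (n x k). The word vectors are
    assumed to come from an embedding trained on a large corpus."""
    d = X_train.shape[1]
    W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d),
                        X_train.T @ S_train)
    return W  # shape d x k

def zero_shot_predict(X_test, W, class_vectors, class_names):
    """Assign each test clip to the nearest unseen-class name vector
    by cosine similarity in the semantic space."""
    S_pred = X_test @ W
    S_pred /= np.linalg.norm(S_pred, axis=1, keepdims=True) + 1e-9
    C = class_vectors / (np.linalg.norm(class_vectors, axis=1,
                                        keepdims=True) + 1e-9)
    sims = S_pred @ C.T
    return [class_names[i] for i in sims.argmax(axis=1)]
```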
APA, Harvard, Vancouver, ISO, and other styles
28

Ilisescu, Corneliu. "Analysis and synthesis of interactive video sprites." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10045947/.

Full text
Abstract:
In this thesis, we explore how video, an extremely compelling medium that is traditionally consumed passively, can be transformed into interactive experiences and what is preventing content creators from using it for this purpose. Film captures extremely rich and dynamic information but, due to the sheer amount of data and the drastic change in content appearance over time, it is problematic to work with. Content creators are willing to invest time and effort to design and capture video, so why not to manipulate and interact with it? We hypothesize that people can help and be helped by automatic video processing and synthesis algorithms when they are given the right tools. Computer games are a very popular interactive medium where players engage with dynamic content in compelling and intuitive ways. The first contribution of this thesis is an in-depth exploration of the modes of interaction that enable game-like video experiences. Through active discussions with game developers, we identify both how to assist content creators and how their creations can be dynamically interacted with by players. We present concepts, explore algorithms and design tools that together enable interactive video experiences. Our findings concerning processing videos and interacting with filmed content come together in this thesis' second major contribution. We present a new medium of expression where video elements can be looped, merged and triggered interactively. Static-camera videos are converted into loopable sequences that can be controlled in real time in response to simple end-user requests. We present novel algorithms and interactive tools that enable our new medium of expression. Our human-in-the-loop system gives the user progressively more creative control over the video content as they invest more effort, and artists help us evaluate it. Monocular, static-camera videos are a good fit for looping algorithms, but they have been limited to two-dimensional applications as pixels are reshuffled in space and time on the image plane. The final contribution of this thesis breaks through this barrier by allowing users to interact with filmed objects in a three-dimensional manner. Our novel object tracking algorithm extends existing 2D bounding box trackers with 3D information, such as a well-fitting bounding volume, which in turn enables a new breed of interactive video experiences. The filmed content becomes a three-dimensional playground as users are free to move the virtual camera or the tracked objects and see them from novel viewpoints.
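The loopable sequences mentioned in this abstract are reminiscent of classic video textures, where a loop is cut between two visually similar frames. The sketch below illustrates only that generic idea; it is not the thesis' algorithm, and the frame data and minimum loop length are hypothetical.

```python
import numpy as np

def best_loop(frames, min_len=30):
    """Find the frame pair (i, j) whose appearance matches best, so
    that playing frames[i:j] repeatedly gives a near-seamless loop.
    frames: list of equally sized numpy arrays. This is the classic
    video-textures idea, used here only as an illustration."""
    n = len(frames)
    flat = np.stack([f.reshape(-1).astype(np.float32) for f in frames])
    best = (0, n, np.inf)
    for i in range(n - min_len):
        # Distance from frame i to every candidate loop end j.
        d = np.linalg.norm(flat[i + min_len:] - flat[i], axis=1)
        j = int(d.argmin()) + i + min_len
        if d.min() < best[2]:
            best = (i, j, float(d.min()))
    return best[:2]  # loop plays frames[i:j]
```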
APA, Harvard, Vancouver, ISO, and other styles
29

Li, Dong. "Thermal image analysis using calibrated video imaging." Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/4455.

Full text
Abstract:
Thesis (Ph.D.)--University of Missouri-Columbia, 2006.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on April 23, 2009). Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
30

Savadatti-Kamath, Sanmati S. "Video analysis and compression for surveillance applications." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26602.

Full text
Abstract:
Thesis (Ph.D.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Dr. J. R. Jackson; Committee Member: Dr. D. Scott; Committee Member: Dr. D. V. Anderson; Committee Member: Dr. P. Vela; Committee Member: Dr. R. Mersereau. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
31

Kim, Changick. "A framework for object-based video analysis /." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/5823.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Eastwood, Brian S. Taylor Russell M. "Multiple layer image analysis for video microscopy." Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2009. http://dc.lib.unc.edu/u?/etd,2813.

Full text
Abstract:
Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2009.
Title from electronic title page (viewed Mar. 10, 2010). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science." Discipline: Computer Science; Department/School: Computer Science.
APA, Harvard, Vancouver, ISO, and other styles
33

Dye, Brigham R. "Reliability of pre-service teachers' coding of teaching videos using a video-analysis tool /." Diss., CLICK HERE for online access, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2020.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Zheng, Hao. "Analysis of H.264-based Vclan implementation /." free to MU campus, to others for purchase, 2004. http://wwwlib.umi.com/cr/mo/fullcit?p1422980.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Deshpande, Milind Umesh. "Optimal video sensing strategy and performance analysis for wireless video sensors under delay constraints." Diss., Columbia, Mo. : University of Missouri-Columbia, 2005. http://hdl.handle.net/10355/5836.

Full text
Abstract:
Thesis (M.S.)--University of Missouri-Columbia, 2005.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on July 17, 2006). Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
36

Chengegowda, Venkatesh. "Analysis of Queues for Interactive Voice and Video Response Systems : Two Party Video Calls." Thesis, KTH, Kommunikationssystem, CoS, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-102451.

Full text
Abstract:
Video conversation on mobile devices is becoming popular with the advent of 3G. The enhanced network capacity thus available enables transmission of video data over the internet. Several VOIP service organizations have forecast that present IVR systems will evolve into Interactive Voice and Video Response (IVVR) systems. However, this evolution faces many technical challenges; architectures for implementing queuing systems for video data and standards for interconversion of video data between the formats supported by the calling parties are two of them. This thesis is an analysis of queues and media transcoding for IVVRs. A major effort in this work involves constructing a prototype IVVR queuing system. The system is built using an open-source server named Asterisk and a MySQL database. Asterisk is a SIP-based Private Branch Exchange (PBX) server and also a development environment for VOIP-based IVRs. Functional scenarios for SIP session establishment and the corresponding session setup times for this queuing model are measured. The results indicate that the prototype serves as a sufficient model for a queue, although a significant delay is introduced during session establishment. The work also includes analysis of integrating DiaStar™, a SIP-based media transcoding engine, into this queue. However, the system is not yet complete enough to function with DiaStar for media transcoding. The study concludes by mentioning areas of future work on this particular system and the general state of IVVR queuing systems in the industry.
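As background for the queue analysis described above, a standard first-cut sizing tool for call queues is the Erlang-C (M/M/c) model. The sketch below is illustrative only and is not taken from the thesis; the arrival rate, service rate, and server count are hypothetical.

```python
from math import factorial

def erlang_c(arrival_rate, service_rate, servers):
    """Probability an arriving call must wait in an M/M/c queue, and
    the mean waiting time. arrival_rate: calls/s; service_rate:
    calls/s per server; servers: parallel agents or media ports."""
    a = arrival_rate / service_rate  # offered load in Erlangs
    assert a < servers, "queue is unstable: offered load >= servers"
    top = (a ** servers / factorial(servers)) * (servers / (servers - a))
    bottom = sum(a ** k / factorial(k) for k in range(servers)) + top
    p_wait = top / bottom
    mean_wait = p_wait / (servers * service_rate - arrival_rate)
    return p_wait, mean_wait

# Hypothetical example: 0.5 calls/s, 30 s mean handling time, 20 ports.
print(erlang_c(0.5, 1 / 30.0, 20))
```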
APA, Harvard, Vancouver, ISO, and other styles
37

Lee, Sangkeun. "Video analysis and abstraction in the compressed domain." Diss., Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-04072004-180041/unrestricted/lee%5fsangkeun%5f200312%5fphd.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Emmot, Sebastian. "Characterizing Video Compression Using Convolutional Neural Networks." Thesis, Luleå tekniska universitet, Datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-79430.

Full text
Abstract:
Can compression parameters used in video encoding be estimated given only the visual information of the resulting compressed video? If so, these parameters could potentially improve existing parametric video quality estimation models. Today, parametric models use information like bitrate to estimate the quality of a given video. This method is inaccurate, since it does not consider the coding complexity of a video. The constant rate factor (CRF) parameter for H.264 encoding aims to keep the quality constant while varying the bitrate; if the CRF for a video is known together with the bitrate, a better quality estimate could potentially be achieved. In recent years, artificial neural networks, and specifically convolutional neural networks, have shown great promise in the field of image processing. In this thesis, convolutional neural networks are investigated as a way of estimating the constant rate factor parameter for a degraded video by identifying the compression artifacts and their relation to the CRF used. With the use of ResNet, a model for estimating the CRF for each frame of a video is derived; these per-frame predictions are further used in a video classification model which performs a total CRF prediction for a given video. The results show that it is possible to find a relation between the visual encoding artifacts and the CRF used. The top-5 accuracy achieved by the model is 61.9% with the use of limited training data. Given that today's parametric, bitrate-based quality models have no information about coding complexity, even a rough estimate of the CRF could improve their precision.
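The two-stage design in this abstract (per-frame ResNet predictions pooled into a video-level CRF estimate) can be outlined as follows. This is a hedged sketch, not the thesis' code: it assumes integer CRF values 0-51 treated as 52 classes, substitutes torchvision's ResNet-18 for whatever variant was actually used, and stands in mean pooling for the thesis' separate video classification model.

```python
import torch
from torchvision.models import resnet18

NUM_CRF_CLASSES = 52  # assumption: integer CRF values 0..51 as classes

# Per-frame model: a ResNet whose final layer predicts a CRF class.
frame_model = resnet18(num_classes=NUM_CRF_CLASSES)

def predict_video_crf(frames):
    """frames: float tensor (n_frames, 3, 224, 224), normalized.
    Averages per-frame class probabilities into one video-level CRF
    estimate (a simplification of the thesis' video-level model)."""
    frame_model.eval()
    with torch.no_grad():
        logits = frame_model(frames)      # (n_frames, 52)
        probs = torch.softmax(logits, dim=1)
        video_probs = probs.mean(dim=0)   # aggregate over frames
    return int(video_probs.argmax())

# Usage with dummy data (an untrained model, for shape checking only):
dummy = torch.randn(8, 3, 224, 224)
print(predict_video_crf(dummy))
```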
APA, Harvard, Vancouver, ISO, and other styles
39

Florez, Omar Ulises. "Knowledge Extraction in Video Through the Interaction Analysis of Activities." DigitalCommons@USU, 2013. https://digitalcommons.usu.edu/etd/1720.

Full text
Abstract:
Video is a massive amount of data that contains complex interactions between moving objects. The extraction of knowledge from this type of information creates a demand for video analytics systems that uncover statistical relationships between activities and learn the correspondence between content and labels. However, these are open research problems of high complexity when multiple actors simultaneously perform activities, videos contain noise, and streaming scenarios are considered. The techniques introduced in this dissertation provide a basis for analyzing video. The primary contributions of this research consist of new algorithms for the efficient search of activities in video, scene understanding based on interactions between activities, and the prediction of labels for new scenes.
APA, Harvard, Vancouver, ISO, and other styles
40

Wright, Geoffrey Albert. "How Does Video Analysis Impact Teacher Reflection-for-Action?" BYU ScholarsArchive, 2008. https://scholarsarchive.byu.edu/etd/1362.

Full text
Abstract:
Reflective practice is an integral component of a teacher's classroom success (Zeichner, 1996; Valli, 1997). Reflective practice requires a teacher to step back and consider the implications and effects of teaching practices. Research has shown that formal reflection on teaching can lead to improved understanding and practice of pedagogy, classroom management, and professionalism (Grossman, 2003). Several methods have been used over the years to stimulate reflective practice; many of these methods required teachers to use awkward and time-consuming tools with a minimal impact on teaching performance (Rodgers, 2002). The current study analyzes an innovative video-enhanced reflection process focused on improving teacher reflection. Video-enhanced reflection is a process that uses video analysis to stimulate reflective thought. The primary question of this study is "How does video analysis used in the context of an improved reflection technique impact teacher reflection-for-action?" The subjects of the study included five untenured teachers and one principal from an elementary school in a middle-class residential area. A comparative case study approach was used to study the influence the video-enhanced reflection model has on teacher reflection practices. The research method involved comparing typical teacher reflective practices with their experience using the video-enhanced reflective process. A series of vignettes and thematic analysis discussions were used to disaggregate, discuss, and present the data and findings. The findings from this study suggest the video-enhanced reflection process provides solutions to the barriers (i.e., time, tool, support) that have traditionally prevented reflection from being meaningful and long lasting. The qualitative analysis of teacher responses to the exit survey, interview findings, and comparison of the baseline and intervention methods suggests that the video-enhanced reflection process had a positive impact on teacher reflective abilities because it helped them more vividly describe, analyze, and critique their teaching.
APA, Harvard, Vancouver, ISO, and other styles
41

Nordeng, Eirik Tørud. "Video metric measurements in an FPGA for use in objective no-reference video quality analysis." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for elektronikk og telekommunikasjon, 2013. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-22706.

Full text
Abstract:
This thesis presents a way of performing objective video quality analysis in order to point out faults in the hardware of a video system that uses analogue video transmission technologies. The approach focuses on performing simple digital processing and analysis of the video data coherently using an FPGA. Several metrics that correlate with specific distortions are developed. These metrics give good indications of the state of the video system components. The algorithms are tested using MATLAB and mapped to an FPGA. The key components are implemented and verified in VHDL, and synthesized for an Altera Cyclone II FPGA. The thesis concludes that the proposed system has the ability to discover board-level faults in a video system that utilizes an FPGA and analogue video transmission. The system also has the ability to supplement external quality assessment systems in most cases, and to function as a good alternative in cases where a quick and simple assessment of a video system is desired.
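The thesis implements its metrics in VHDL on an FPGA; as a software-side illustration of a per-frame metric that correlates with a specific distortion, the sketch below computes a common no-reference sharpness score (variance of the Laplacian). The specific metric and the file path are assumptions, not taken from the thesis.

```python
import cv2

def blur_metric(frame_bgr):
    """No-reference sharpness score: variance of the Laplacian.
    Low values suggest blur - one example of a simple per-frame
    metric that correlates with a specific distortion."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

cap = cv2.VideoCapture("input.avi")  # hypothetical test clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    print(blur_metric(frame))
cap.release()
```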
APA, Harvard, Vancouver, ISO, and other styles
42

Kong, Lingchao. "Modeling of Video Quality for Automatic Video Analysis and Its Applications in Wireless Camera Networks." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563295836742645.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Guler, Puren. "Automated Crowd Behavior Analysis For Video Surveillance Applications." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614659/index.pdf.

Full text
Abstract:
Automated analysis of crowd behavior using surveillance videos is an important issue for public security, as it allows detection of dangerous crowds and where they are headed. Computer vision based crowd analysis algorithms can be divided into three groups: people counting, people tracking, and crowd behavior analysis. In this thesis, behavior understanding is used for crowd behavior analysis. In the literature, there are two types of approaches to the behavior understanding problem: analyzing the behaviors of individuals in a crowd and using this knowledge to make deductions regarding the crowd behavior (object-based), and analyzing the crowd as a whole (holistic). In this work, a holistic approach is used to develop real-time abnormality detection in crowds using scale-invariant feature transform (SIFT) based features and unsupervised machine learning techniques.
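A minimal sketch of such a holistic pipeline might look as follows, assuming SIFT descriptors per frame and k-means as the unsupervised learner; the cluster count and abnormality threshold are hypothetical tuning parameters, and the thesis' actual real-time method is not reproduced here.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

sift = cv2.SIFT_create()

def frame_descriptors(gray):
    """SIFT descriptors for one grayscale frame (may be empty)."""
    _, desc = sift.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, 128))

def fit_normal_model(normal_gray_frames, k=32):
    """Learn clusters of descriptors from 'normal' crowd footage."""
    pool = np.vstack([frame_descriptors(f) for f in normal_gray_frames])
    return KMeans(n_clusters=k, n_init=10).fit(pool)

def is_abnormal(gray, model, threshold=350.0):
    """Flag a frame whose descriptors sit far from every learned
    cluster centre; threshold is an assumed tuning parameter."""
    desc = frame_descriptors(gray)
    if len(desc) == 0:
        return False
    dists = np.min(model.transform(desc), axis=1)
    return float(np.mean(dists)) > threshold
```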
APA, Harvard, Vancouver, ISO, and other styles
44

Eriksson, Martin. "Video based analysis and visualization of human action." Doctoral thesis, KTH, Numerisk Analys och Datalogi, NADA, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-106.

Full text
Abstract:
Analyzing human motion is important in a number of ways. An athlete constantly needs to evaluate minute details about his or her motion pattern. In physical rehabilitation, the doctor needs to evaluate how well a patient is rehabilitating from injuries. Some systems are being developed in order to identify people based only on their gait. Automatic interpretation of sign language is another area that has received much attention. While all these applications can be considered useful in some sense, the analysis of human motion can also be used for pure entertainment. For example, by filming a sport activity from one view, it is possible to create a 3D reconstruction of this motion that can be rendered from a view where no camera was originally placed. Such a reconstruction system can be enjoyable for the TV audience. It can also be useful for the computer-game industry. This thesis presents ideas and new methods on how such reconstructions can be obtained. One of the main purposes of this thesis is to identify a number of qualitative constraints that strongly characterize a certain class of motion. These qualitative constraints provide enough information about the class so that every motion satisfying the constraints will "look nice" and appear, according to a human observer, to belong to the class. Further, the constraints must not be too restrictive; a large variation within the class is necessary. It is shown how such qualitative constraints can be learned automatically from a small set of examples. Another topic that will be addressed concerns analysis of motion in terms of quality assessment as well as classification. It is shown that in many cases, 2D projections of a motion carry almost as much information about the motion as the original 3D representation. It is also shown that single-view reconstruction of 2D data for the purpose of analysis is generally not useful. Using these facts, a prototype of a "virtual coach" that is able to track and analyze image data of human action is developed. Potentials and limitations of such a system are discussed in the thesis.
APA, Harvard, Vancouver, ISO, and other styles
45

Eriksson, Martin. "Video based analysis and visualization of human action /." Stockholm, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-106.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Nikfetrat, Nima. "Video-based Fire Analysis and Animation Using Eigenfires." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23471.

Full text
Abstract:
We introduce new approaches for modeling and synthesizing realistic-looking 2D fire animations using video-based techniques and statistical analysis. Our approaches are based on real footage of various small-scale fire samples with customized motions that we captured for this research, and the final results can be utilized as a sequence of images in video games, motion graphics and cinematic visual effects. Instead of conventional physically-based simulation, we utilize example-based principal component analysis (PCA) and take it to a new level by introducing "Eigenfires" as a new way to represent the main features of various real fire samples. The visualization of Eigenfires helps animators design the fire interactively in a more meaningful and convenient way than known procedural approaches or other video-based synthesis models. Our system enables artists to control real-life fire videos through motion transitions and loops: the artist selects any desired ranges of any video clips, and the system takes care of the remaining parts to best represent a smooth transition. Instead of tricking the eyes with basic blending only between similar shapes, our flexible fire transitions are capable of connecting various fire styles. Our techniques are also effective for data compression; they deliver real-time interactive recognition for high-resolution images, are very easy to implement, and require little parameter tuning.
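Since Eigenfires are built on example-based PCA, their construction can be sketched in the style of eigenfaces: flatten the fire frames, subtract the mean, and keep the top principal components. The sketch below assumes grayscale frames and a hypothetical component count; the thesis' transition and looping machinery is not shown.

```python
import numpy as np

def compute_eigenfires(frames, n_components=16):
    """frames: array (n, h, w) of grayscale fire frames. Returns the
    mean frame and the top principal components ('Eigenfires'),
    eigenfaces-style. Assumes n_components <= number of frames."""
    n, h, w = frames.shape
    X = frames.reshape(n, h * w).astype(np.float64)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]  # each row is one Eigenfire

def project_and_reconstruct(frame, mean, eigenfires):
    """Represent a frame by its Eigenfire coefficients, then rebuild
    it from those coefficients alone (a lossy, compressed form)."""
    x = frame.reshape(-1).astype(np.float64) - mean
    coeffs = eigenfires @ x
    return coeffs, (mean + eigenfires.T @ coeffs).reshape(frame.shape)
```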
APA, Harvard, Vancouver, ISO, and other styles
47

Forsthoefel, Dana. "Leap segmentation in mobile image and video analysis." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50285.

Full text
Abstract:
As demand for real-time image processing increases, the need to improve the efficiency of image processing systems is growing. The process of image segmentation is often used in preprocessing stages of computer vision systems to reduce image data and increase processing efficiency. This dissertation introduces a novel image segmentation approach known as leap segmentation, which applies a flexible definition of adjacency to allow groupings of pixels into segments which need not be spatially contiguous and thus can more accurately correspond to large surfaces in the scene. Experiments show that leap segmentation correctly preserves an average of 20% more original scene pixels than traditional approaches, while using the same number of segments, and significantly improves execution performance (executing 10x - 15x faster than leading approaches). Further, leap segmentation is shown to improve the efficiency of a high-level vision application for scene layout analysis within 3D scene reconstruction. The benefits of applying image segmentation in preprocessing are not limited to single-frame image processing. Segmentation is also often applied in the preprocessing stages of video analysis applications. In the second contribution of this dissertation, the fast, single-frame leap segmentation approach is extended into the temporal domain to develop a highly-efficient method for multiple-frame segmentation, called video leap segmentation. This approach is evaluated for use on mobile platforms where processing speed is critical using moving-camera traffic sequences captured on busy, multi-lane highways. Video leap segmentation accurately tracks segments across temporal bounds, maintaining temporal coherence between the input sequence frames. It is shown that video leap segmentation can be applied with high accuracy to the task of salient segment transformation detection for alerting drivers to important scene changes that may affect future steering decisions. Finally, while research efforts in the field of image segmentation have often recognized the need for efficient implementations for real-time processing, many of today's leading image segmentation approaches exhibit processing times which exceed their camera frame periods, making them infeasible for use in real-time applications. The third research contribution of this dissertation focuses on developing fast implementations of the single-frame leap segmentation approach for use on both single-core and multi-core platforms as well as on both high-performance and resource-constrained systems. While the design of leap segmentation lends itself to efficient implementations, the efficiency achieved by this algorithm, as in any algorithm, can be improved with careful implementation optimizations. The leap segmentation approach is analyzed in detail and highly optimized implementations of the approach are presented with in-depth studies, ranging from storage considerations to realizing parallel processing potential. The final implementations of leap segmentation for both serial and parallel platforms are shown to achieve real-time frame rates even when processing very high resolution input images. Leap segmentation's accuracy and speed make it a highly competitive alternative to today's leading segmentation approaches for modern, real-time computer vision systems.
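The published leap segmentation algorithm is not reproduced here, but its central idea, grouping pixels under a relaxed notion of adjacency so that segments need not be spatially contiguous, can be contrasted with contiguous superpixels using a deliberately simple stand-in: clustering pixels by appearance alone. The cluster count is an assumption, and the real algorithm uses its own flexible adjacency rules for speed.

```python
import numpy as np
from sklearn.cluster import KMeans

def noncontiguous_segments(image_rgb, n_segments=16):
    """Group pixels by appearance only, so one segment may cover
    several disconnected regions of the same surface. This is an
    illustration of relaxed adjacency, not the published algorithm."""
    h, w, _ = image_rgb.shape
    pixels = image_rgb.reshape(-1, 3).astype(np.float32)
    labels = KMeans(n_clusters=n_segments, n_init=4).fit_predict(pixels)
    return labels.reshape(h, w)  # label map; segments may be split
```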
APA, Harvard, Vancouver, ISO, and other styles
48

Faircloth, Ryan. "AUDIO AND VIDEO TEMPO ANALYSIS FOR DANCE DETECTION." Master's thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2633.

Full text
Abstract:
The amount of multimedia in existence has become so extensive that the organization of this data cannot be performed manually. Systems designed to maintain such quantities need superior methods of understanding the information contained in the data. Aspects of Computer Vision deal with such problems for the understanding of image and video content. Additionally, large ontologies such as LSCOM are collections of feasible high-level concepts that are of interest to identify within multimedia content. While ontologies often include the activity of dance, it has had virtually no coverage in Computer Vision literature in terms of actual detection. We demonstrate that training-based approaches are challenged by dance because the activity is defined by an unlimited set of movements, and therefore unreasonable amounts of training data would be required to recognize even a small portion of the immense possibilities for dance. In this thesis we present a non-training, tempo-based approach to dance detection which yields very good results when compared to another method with state-of-the-art performance on other common activities; the testing dataset contains videos acquired mostly through YouTube. The algorithm is based on one-dimensional analysis in which we perform visual beat detection through the computation of optical flow. Next we obtain a set of tempo hypotheses, and the final stage of our method tracks visual beats through a video sequence in order to determine the most likely tempo for the object motion. In this thesis we not only demonstrate the utility of visual beats for visual tempo detection but also demonstrate their existence in most of the common activities considered by state-of-the-art methods.
M.S.E.E.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering MSEE
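The pipeline sketched in the abstract above (an optical-flow motion signal, visual beats as peaks, tempo from inter-beat intervals) can be illustrated as follows. This is a hedged approximation, not the thesis' algorithm: Farneback flow, the peak-picking parameters, the median inter-beat tempo estimate, and the file path are all assumptions.

```python
import cv2
import numpy as np
from scipy.signal import find_peaks

def motion_signal(video_path):
    """Mean dense optical-flow magnitude per frame (Farneback)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if unknown
    ok, prev = cap.read()
    if not ok:
        raise IOError("could not read video")
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    signal = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        signal.append(np.linalg.norm(flow, axis=2).mean())
        prev = gray
    cap.release()
    return np.array(signal), fps

def estimate_tempo_bpm(signal, fps):
    """Visual beats as motion peaks; tempo from inter-beat intervals."""
    peaks, _ = find_peaks(signal, distance=max(1, int(fps / 4)),
                          prominence=signal.std())
    if len(peaks) < 2:
        return None
    return 60.0 * fps / float(np.median(np.diff(peaks)))

sig, fps = motion_signal("dance_clip.mp4")  # hypothetical clip
print(estimate_tempo_bpm(sig, fps))
```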
APA, Harvard, Vancouver, ISO, and other styles
49

Isgro, Francesco. "Geometric methods for video sequence analysis and applications." Thesis, Heriot-Watt University, 2001. http://hdl.handle.net/10399/495.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Fletcher, M. J. "A modular system for video based motion analysis." Thesis, University of Reading, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.293144.

Full text
APA, Harvard, Vancouver, ISO, and other styles