Dissertations / Theses on the topic 'Video retrieval'

To see the other types of publications on this topic, follow the link: Video retrieval.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Video retrieval.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Pickering, Marcus Jerome. "Video retrieval and summarisation." Thesis, Imperial College London, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.411790.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Chen, Juan. "Content-based Digital Video Processing. Digital Videos Segmentation, Retrieval and Interpretation." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4256.

Abstract:
Recent research approaches in semantics-based video content analysis require shot boundary detection as the first step to divide video sequences into sections. Furthermore, with the advances in networking and computing capability, efficient retrieval of multimedia data has become an important issue. Content-based retrieval technologies have been widely implemented to protect intellectual property rights (IPR). In addition, automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this thesis, a paradigm is proposed to segment, retrieve and interpret digital videos. Five algorithms are presented to solve the video segmentation task. Firstly, a simple shot cut detection algorithm is designed for real-time implementation. Secondly, a systematic method is proposed for shot detection using content-based rules and a finite state machine (FSM). Thirdly, shot detection is implemented using local and global indicators. Fourthly, a context-awareness approach is proposed to detect shot boundaries. Fifthly, a fuzzy logic method is implemented for shot detection. Furthermore, a novel analysis approach is presented for the detection of video copies. It is robust to complicated distortions and capable of locating copied segments inside the original videos. Then, objects and events are extracted from MPEG sequences for video highlights indexing and retrieval. Finally, a human fighting detection algorithm is proposed for movie annotation.
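The abstract does not spell out the simple real-time shot cut detector; a common baseline of that kind flags a hard cut wherever the intensity histograms of consecutive frames differ sharply. A minimal sketch under that assumption (the 8-bin histogram and the 0.5 threshold are illustrative choices, not the thesis's actual parameters):

```python
from collections import Counter

def histogram(frame, bins=8, max_val=256):
    """8-bin intensity histogram, normalised by pixel count."""
    counts = Counter(min(p * bins // max_val, bins - 1) for p in frame)
    n = len(frame)
    return [counts[b] / n for b in range(bins)]

def detect_cuts(frames, threshold=0.5):
    """Report frame indices where the L1 distance between the current
    and previous frame histograms exceeds the threshold (a hard cut)."""
    cuts = []
    prev = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = histogram(frame)
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

In practice the threshold would be tuned on labelled footage, and gradual transitions (fades, dissolves) need the more elaborate detectors the thesis goes on to describe.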
3

Faichney, Jolon. "Content-Based Retrieval of Digital Video." Thesis, Griffith University, 2005. http://hdl.handle.net/10072/365697.

Abstract:
In the next few years consumers will have access to large amounts of video and image data, either created by themselves with digital video and still cameras or by having access to other image and video content electronically. Existing personal computer hardware and software have not been designed to manage large quantities of multimedia content. As a result, research in the area of content-based video retrieval (CBVR) has been underway for the last fifteen years. This research aims to improve CBVR by providing an accurate and reliable shape-colour representation and by providing a new 3D user interface, called DomeWorld, for the efficient browsing of large video databases. Existing feature extraction techniques designed for use in large databases are typically simple techniques, as they must conform to the limited processing and storage constraints exhibited by large-scale databases. Conversely, more complex feature extraction techniques provide higher-level descriptions of the underlying data but are time-consuming and require large amounts of storage, making them less useful for large databases. In this thesis a technique for medium- to high-level shape representation is presented that exhibits efficient storage and query performance. The technique uses a very accurate contour detection system that incorporates a new asymmetry edge detector, which is shown to perform better than other contour detection techniques, combined with a new summarisation technique to store contours efficiently. In addition, contours are represented by histograms, further reducing space requirements and increasing query performance. A new type of histogram, called the fuzzy histogram, is introduced and applied to content-based retrieval systems for the first time. Fuzzy histograms improve the ranking of query results over non-fuzzy techniques, especially in low bin-count histogram configurations.
The fuzzy contour histogram approach is compared with an exhaustive contour comparison technique and is found to provide equivalent or better results. A number of colour distribution representation techniques were investigated for integration with the contour histogram, and the fuzzy HSV histogram was found to provide the best performance. When the colour and contour histograms were integrated, fewer bins were required overall, as each histogram compensates for the other's weaknesses. The result is that only a quarter as many bins were required as for either the colour or contour histogram alone, further reducing query times and storage requirements. This research also improves the user experience with a new user interface called DomeWorld that uses three-dimensional translucent domes. Existing user interfaces are designed either for image databases, for browsing videos, or for browsing large non-multimedia data sets. DomeWorld is designed to browse both image and video databases through a number of innovative techniques, including hierarchical clustering, radial space-filling layout of nodes, three-dimensional presentation, and translucent domes that allow the hierarchical nature of the data to be viewed whilst also showing the relationships between child nodes several levels deep. A taxonomy of existing image, video, and large-data-set user interfaces is presented and the proposed user interface is evaluated within this framework. It is found that video database user interfaces have four requirements: context and detail, gisting, clustering, and integration of video and images. None of the 27 evaluated user interfaces satisfies all four requirements. The DomeWorld user interface is designed to satisfy all of the requirements and presents a step forward in CBVR user interaction. This thesis investigates two important areas of CBVR, structural indexing and user interaction, and presents techniques which advance the field. These two areas will become very important in the future, when users must access and manage large collections of image and video content.
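The fuzzy histogram idea above spreads each sample's vote over its two nearest bins instead of assigning it to exactly one, which softens quantisation boundaries in low bin-count configurations. A minimal sketch assuming triangular (linear) membership between adjacent bin centres, since the abstract does not give the exact membership function:

```python
import math

def fuzzy_histogram(values, bins=8, lo=0.0, hi=1.0):
    """Each value contributes to its two nearest bin centres with
    linear (triangular) membership weights, instead of all-or-nothing."""
    hist = [0.0] * bins
    width = (hi - lo) / bins
    for v in values:
        # position of v in units of bin width, relative to bin centres
        x = (v - lo) / width - 0.5
        left = math.floor(x)
        frac = x - left
        if 0 <= left < bins:
            hist[left] += 1 - frac
        if 0 <= left + 1 < bins:
            hist[left + 1] += frac
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

A value falling exactly on a bin boundary contributes half its weight to each neighbouring bin, so two nearly identical colours can no longer land in disjoint bins, which is what improves ranking at low bin counts.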
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information Technology
Science, Environment, Engineering and Technology
4

Banda, Nagamani. "Adaptive video segmentation." Morgantown, W. Va. : [West Virginia University Libraries], 2004. https://etd.wvu.edu/etd/controller.jsp?moduleName=documentdata&jsp%5FetdId=3520.

Abstract:
Thesis (M.S.)--West Virginia University, 2004.
Title from document title page. Document formatted into pages; contains vi, 52 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 50-52).
5

Vrochidis, Stefanos. "Interactive video retrieval using implicit user feedback." Thesis, Queen Mary, University of London, 2013. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8729.

Abstract:
In recent years, the rapid development of digital technologies and the low cost of recording media have led to a great increase in the availability of multimedia content worldwide. This availability creates demand for the development of advanced search engines. Traditionally, manual annotation of video was one of the usual practices to support retrieval. However, the vast amounts of multimedia content make such practices very expensive in terms of human effort. At the same time, the availability of low-cost wearable sensors delivers a plethora of user-machine interaction data. Therefore, an important challenge is to exploit implicit user feedback (such as user navigation patterns and eye movements) during interactive multimedia retrieval sessions, with a view to improving video search engines. In this thesis, we focus on automatically annotating video content by exploiting aggregated implicit feedback of past users, expressed as click-through data and gaze movements. Towards this goal, we have conducted interactive video retrieval experiments in order to collect click-through and eye movement data in environments that were not strictly controlled. First, we generate semantic relations between the multimedia items by proposing a graph representation of aggregated past interaction data, and exploit them to generate recommendations as well as to improve content-based search. Then, we investigate the role of user gaze movements in interactive video retrieval and propose a methodology for inferring user interest by employing support vector machines and gaze-movement-based features. Finally, we propose an automatic video annotation framework which combines query clustering into topics (by constructing gaze-movement-driven random forests and temporally enhanced dominant sets) with video shot classification for predicting the relevance of viewed items with respect to a topic.
The results show that exploiting heterogeneous implicit feedback from past users is of added value for future users of interactive video retrieval systems.
6

Zhang, Lelin. "Scalable Content-Based Image and Video Retrieval." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/15439.

Abstract:
The popularity of the Internet and of portable image-capturing devices brings an unprecedented amount of images and videos. Content-based visual search provides an important tool for users to consume the ever-growing digital media repositories, and is becoming an increasingly demanding task. In this thesis, we focus on improving the scalability, efficiency and usability of content-based image and video retrieval systems, particularly in dynamic and open environments. Towards our goal, we make four contributions to the research community. First, we propose a scalable approach to adapting bag-of-visual-words (BoVW) to content-based image retrieval (CBIR) in peer-to-peer (P2P) networks. To cope with the dynamic P2P environment, we propose a distributed codebook updating algorithm based on splitting/merging of individual codewords, which maintains workload balance under network churn. Our approach offers a scalable framework for content-based visual search in P2P environments. Second, we improve the retrieval performance of CBIR with relevance feedback (RF). We formulate the RF process as an energy minimization problem, and utilize the graph cuts algorithm to solve it and obtain relevant/irrelevant labels for the images. Our method enables flexible partitioning of the feature space and is capable of handling challenging scenarios. Third, we improve the retrieval performance of trajectory-based action video retrieval with spatial-temporal context. We exploit the spatial-temporal correlations among trajectories for descriptor coding, and tackle the trajectory segment misalignment issue with an offset-aware distance for trajectory matching. Finally, we develop a toolset to improve the efficiency of, and provide better insight into, the BoVW pipeline. Our toolset provides robust integration of different methods, automatic parallel execution and result reusing, and visualization of the retrieval process.
7

Aytar, Yusuf. "SEMANTIC VIDEO RETRIEVAL USING HIGH LEVEL CONTEXT." Master's thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3455.

Abstract:
Video retrieval – searching and retrieving videos relevant to a user-defined query – is one of the most popular topics in both real-life applications and multimedia research. This thesis employs concepts from Natural Language Understanding in solving the video retrieval problem. Our main contribution is the utilization of semantic word similarity measures for video retrieval, through trained concept detectors and the visual co-occurrence relations between such concepts. We propose two methods for content-based retrieval of videos: (1) a method for retrieving a new concept (a concept which is not known to the system and for which no annotation is available) using semantic word similarity and visual co-occurrence, which is an unsupervised method; (2) a method for retrieval of videos based on their relevance to a user-defined text query, using semantic word similarity and the visual content of videos. For evaluation purposes, we mainly used the automatic search and high-level feature extraction test sets of the TRECVID'06 and TRECVID'07 benchmarks. These two data sets consist of 250 hours of multilingual news video captured from American, Arabic, German and Chinese TV channels. Although our method for retrieving a new concept is unsupervised, it outperforms the trained concept detectors (which are supervised) on 7 out of 20 test concepts, and overall it performs very close to the trained detectors. On the other hand, our visual-content-based semantic retrieval method performs more than 100% better than the text-based retrieval method. This shows that good retrieval results can be achieved using visual content alone.
M.S.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science MS
8

Volkmer, Timo (timovolkmer@gmx.net). "Semantics of Video Shots for Content-based Retrieval." RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20090220.122213.

Abstract:
Content-based video retrieval research combines expertise from many different areas, such as signal processing, machine learning, pattern recognition, and computer vision. As video extends into both the spatial and the temporal domain, we require techniques for the temporal decomposition of footage so that specific content can be accessed. This content may then be semantically classified - ideally in an automated process - to enable filtering, browsing, and searching. An important aspect that must be considered is that pictorial representation of information may be interpreted differently by individual users because it is less specific than its textual representation. In this thesis, we address several fundamental issues of content-based video retrieval for effective handling of digital footage. Temporal segmentation, the common first step in handling digital video, is the decomposition of video streams into smaller, semantically coherent entities. This is usually performed by detecting the transitions that separate single camera takes. While abrupt transitions - cuts - can be detected relatively well with existing techniques, effective detection of gradual transitions remains difficult. We present our approach to temporal video segmentation, proposing a novel algorithm that evaluates sets of frames using a relatively simple histogram feature. Our technique has been shown to rank among the best existing shot segmentation algorithms in large-scale evaluations. The next step is semantic classification of each video segment to generate an index for content-based retrieval in video databases. Machine learning techniques can be applied effectively to classify video content. However, these techniques require manually classified examples for training before automatic classification of unseen content can be carried out. Manually classifying training examples is not trivial because of the implied ambiguity of visual content.
We propose an unsupervised learning approach based on latent class modelling in which we obtain multiple judgements per video shot and model the users' response behaviour over a large collection of shots. This technique yields a more generic classification of the visual content. Moreover, it enables the quality assessment of the classification, and maximises the number of training examples by resolving disagreement. We apply this approach to data from a large-scale, collaborative annotation effort and present ways to improve the effectiveness of manual annotation of visual content through better design and specification of the process. Automatic speech recognition techniques along with semantic classification of video content can be used to implement video search using textual queries. This requires the application of text search techniques to video and the combination of different information sources. We explore several text-based query expansion techniques for speech-based video retrieval, and propose a fusion method to improve overall effectiveness. To combine both text and visual search approaches, we explore a fusion technique that combines spoken information and visual information using semantic keywords automatically assigned to the footage based on the visual content. The techniques that we propose help to facilitate effective content-based video retrieval and highlight the importance of considering different user interpretations of visual content. This allows better understanding of video content and a more holistic approach to multimedia retrieval in the future.
9

Fernández, Beltrán Rubén. "Characterisation and adaptive learning in interactive video retrieval." Doctoral thesis, Universitat Jaume I, 2016. http://hdl.handle.net/10803/387220.

Abstract:
The main objective of this thesis is to use latent topic models effectively to address the problem of automatic video retrieval. Specifically, it aims to improve the current state of the art in automatic video retrieval systems in terms of both efficiency and accuracy. In general, latent topic models are a set of statistical tools for extracting the generative patterns of a data collection. Traditionally, such techniques have not been considered particularly useful for automatic video retrieval systems because of their high computational cost and the inherent complexity of the topic space for visual information.
In this work, we are interested in the use of latent topics to overcome the current limitations in CBVR. Despite the potential of topic models to uncover the hidden structure of a collection, they have traditionally been unable to provide a competitive advantage in CBVR because of the high computational cost of their algorithms and the complexity of the latent space in the visual domain. Throughout this thesis we focus on designing new models and tools based on topic models to take advantage of the latent space in CBVR. Specifically, we have worked in four different areas within the retrieval process: vocabulary reduction, encoding, modelling and ranking, with our most important contributions relating to modelling and ranking.
10

Demirdizen, Goncagul. "An Ontology-driven Video Annotation And Retrieval System." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12612592/index.pdf.

Abstract:
In this thesis, a system called Ontology-Driven Video Annotation and Retrieval System (OntoVARS) is developed in order to provide a video management system for ontology-driven semantic content annotation and querying. The proposed system is based on the MPEG-7 ontology, which provides interoperability and a common communication platform with other MPEG-7 ontology compatible systems. The Rhizomik MPEG-7 ontology is used as the core ontology, and domain-specific ontologies are integrated into the core ontology in order to provide ontology-based video content annotation and querying capabilities to the user. The proposed system supports content-based annotation and spatio-temporal data modeling in video databases by using the domain ontology concepts. Moreover, the system enables ontology-driven query formulation and processing according to the domain ontology instances and concepts. In the developed system, ontology-driven concept querying, spatio-temporal querying, and region-based and time-based querying capabilities are provided as simple query types. Besides these simple query types, compound queries can also be generated by combining simple queries with the "(", ")", "AND" and "OR" operators. For all these query types, the system supports both general and video-specific query processing. By this means, the user is able to pose queries on all videos in the video databases as well as on the details of a specific video of interest.
11

Mohanna, Farahnaz. "Content based video database retrieval using shape features." Thesis, University of Surrey, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.250764.

12

Thomas, Naveen Moham. "Motion based video object detection for event retrieval." Thesis, University of Bristol, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.441380.

13

Markatopoulou, Foteini. "Machine learning architectures for video annotation and retrieval." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/44693.

Abstract:
In this thesis we are designing machine learning methodologies for solving the problem of video annotation and retrieval using either pre-defined semantic concepts or ad-hoc queries. Concept-based video annotation refers to the annotation of video fragments with one or more semantic concepts (e.g. hand, sky, running), chosen from a predefined concept list. Ad-hoc queries refer to textual descriptions that may contain objects, activities, locations etc., and combinations of the former. Our contributions are: i) A thorough analysis on extending and using different local descriptors towards improved concept-based video annotation and a stacking architecture that uses in the first layer, concept classifiers trained on local descriptors and improves their prediction accuracy by implicitly capturing concept relations, in the last layer of the stack. ii) A cascade architecture that orders and combines many classifiers, trained on different visual descriptors, for the same concept. iii) A deep learning architecture that exploits concept relations at two different levels. At the first level, we build on ideas from multi-task learning, and propose an approach to learn concept-specific representations that are sparse, linear combinations of representations of latent concepts. At a second level, we build on ideas from structured output learning, and propose the introduction, at training time, of a new cost term that explicitly models the correlations between the concepts. By doing so, we explicitly model the structure in the output space (i.e., the concept labels). iv) A fully-automatic ad-hoc video search architecture that combines concept-based video annotation and textual query analysis, and transforms concept-based keyframe and query representations into a common semantic embedding space. 
Our architectures have been extensively evaluated on the TRECVID SIN 2013 and TRECVID AVS 2016 benchmarks, and on other large-scale datasets, demonstrating their effectiveness compared to similar approaches.
14

Davis, Marc Eliot. "Media streams--representing video for retrieval and repurposing." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/29088.

15

Kolonias, I. "Cognitive vision systems for video understanding and retrieval." Thesis, University of Surrey, 2007. http://epubs.surrey.ac.uk/843661/.

Abstract:
This thesis addresses the problem of creating computer vision systems that will facilitate high-level, user-friendly interpretation of an observed scene, and which will be readily adaptable to a wide range of computer vision tasks. Hence, the notion of injecting cognitive capabilities into traditional computer vision systems is central to this work. Initially, the requirements of creating a cognitive vision system will be examined. This will lead us to the conclusion that the two main enabling components for such systems are the following: a unified framework for reasoning in the context of the observed scene; and a multi-layered memory architecture that will aid the reasoning framework in recalling and storing all relevant information about the observed scene. Regarding the apparatus used for reasoning in video sequences, it will be argued that it must be characterised by its ability to be applied at all levels of information processing (from raw input data to high-level abstractions concerning the evolution of the observed scene), support and exploit any combination of spatial and temporal dependencies (i.e. context) present among the input data, and deliver good reasoning performance when applied to any categorical domain. In turn, the requirements set by the reasoning engine will be used as a guideline for the design of a memory architecture conducive to it. Therefore, the latter must be able to handle arbitrary input data types, depending on the scope of the current cognition task. It must also allow for both forward and feedback interaction with the reasoning framework, as contextual information extracted from the observed scene at a later stage may assist the reasoning engine in altering a decision made in previous stages - just as humans do when presented with contradictory evidence. To further emulate the mechanisms that enable human cognition, forgetting processes were also embedded in the memory infrastructure.
For this particular feature, different layers of memory storage facilitate forgetting at different speeds; the system forgets raw input and low-level feature data very quickly, whereas high-level concepts about the evolution of the observed scene are retained over the relatively long term. Finally, the overall proposed system has been implemented and tested on a real-world application - the annotation of broadcast tennis video sequences. In this sample application, the goal was to create a cognitive vision system that would keep track of the score for the duration of the broadcast match, based on the main components described above. The results obtained from processing a set of sequences captured off-air indicate that the overall approach achieves far superior results to simply segmenting the video sequence into shots and analysing each one separately, taken out of the context of the match. This demonstrates that the ability to adapt by discovering and exploiting context is paramount to the efficiency of any future computer vision system, and is, in no small part, a feature that sets biological cognitive vision systems apart from their machine-based counterparts.
16

Lin, Ming. "Automated Lecture Video Segmentation: Facilitate Content Browsing and Retrieval." Diss., The University of Arizona, 2006. http://hdl.handle.net/10150/193843.

Abstract:
People often have difficulty finding specific information in video because of its linear and unstructured nature. Segmenting long videos into small clips by topic and providing browsing and search functionality is beneficial for information searching. However, manual segmentation is labor-intensive, and existing automated segmentation methods are not effective for the many amateur-made and unedited lecture videos. The objectives of this dissertation are to develop 1) automated segmentation algorithms to extract the topic structure of a lecture video, and 2) retrieval algorithms to identify the relevant video segments for user queries. Based on an extensive literature review, existing segmentation features and approaches are summarized, and research challenges and questions are presented. Manual segmentation studies are conducted to understand the content structure of a lecture video, and a set of potential segmentation features and methods is extracted to facilitate the design of automated segmentation approaches. Two static algorithms are developed to segment a lecture video into a list of topics. Features from multiple modalities and various knowledge sources (e.g. electronic slides) are used in the segmentation algorithms. A dynamic segmentation method is also developed to retrieve relevant video segments of appropriate sizes based on the questions asked by users. A series of evaluation studies are conducted and results are presented to demonstrate the effectiveness and usefulness of the automated segmentation approaches.
17

Park, Dong-Jun. "Video event detection framework on large-scale video data." Diss., University of Iowa, 2011. https://ir.uiowa.edu/etd/2754.

Abstract:
Detection of events and actions in video entails substantial processing of very large, even open-ended, video streams. Video data presents a unique challenge for the information retrieval community because properly representing video events is difficult. We propose a novel approach to analysing the temporal aspects of video data. We consider video data as a sequence of images that form a 3-dimensional spatiotemporal structure, and perform multiview orthographic projection to transform the video data into 2-dimensional representations. The projected views allow a unique way to represent video events and capture the temporal aspect of video data. We extract local salient points from the 2D projection views and apply a detection-via-similarity approach to a wide range of events against real-world surveillance data. We demonstrate that our example-based detection framework is competitive and robust. We also investigate synthetic-example-driven retrieval as a basis for query-by-example.
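The multiview orthographic projection described above, treating a clip as a T x H x W intensity volume and collapsing it along each axis, can be sketched as follows (max-projection is one plausible choice of operator; the dissertation's actual projection may differ):

```python
def orthographic_projections(video):
    """video: T x H x W nested lists of pixel intensities.
    Returns max-projections that collapse the time, row and column axes:
    the 'front' (H x W), 'top' (T x W) and 'side' (T x H) views."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    # front view: for each spatial position, brightest value over time
    front = [[max(video[t][y][x] for t in range(T)) for x in range(W)]
             for y in range(H)]
    # top view: for each frame and column, brightest value over rows
    top = [[max(video[t][y][x] for y in range(H)) for x in range(W)]
           for t in range(T)]
    # side view: for each frame and row, brightest value over columns
    side = [[max(video[t][y][x] for x in range(W)) for y in range(H)]
            for t in range(T)]
    return front, top, side
```

A bright object moving through the scene leaves a motion trail in the top and side views, which is what lets 2D salient-point features capture the temporal aspect of the clip.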
18

Akpinar, Samet. "Ontology Based Semantic Retrieval Of Video Contents Using Metadata." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12608772/index.pdf.

Abstract:
The aim of this thesis is the development of an infrastructure for the semantic retrieval of multimedia content. Motivated by the needs of semantic search and retrieval of multimedia content, operating directly on MPEG-7 based annotations can be regarded as a reasonable way of meeting these needs, as MPEG-7 is a common standard providing a broad multimedia content description schema. However, the MPEG-7 formalism is deficient in semantics and reasoning support. From this perspective, we additionally need to represent MPEG-7 descriptions in a new formalism in order to fill this gap in semantics and reasoning. The semantic web and multimedia technologies thus intersect at this point of multimedia semantics. In this thesis, OWL Web Ontology Language, which is based on description logic, has been utilized to model a connection between ontology semantics and video metadata. Modelling the domain of the videos using ontologies and the MPEG-7 descriptions, and reasoning about the videos with the help of the logical formalism of these ontologies, are the main objectives of the thesis.
APA, Harvard, Vancouver, ISO, and other styles
19

Wang, Lei. "Content based video retrieval via spatial-temporal information discovery." Thesis, Robert Gordon University, 2013. http://hdl.handle.net/10059/1119.

Full text
Abstract:
Content based video retrieval (CBVR) has been strongly motivated by a variety of real-world applications. Most state-of-the-art CBVR systems are built on the Bag-of-Visual-Words (BovW) framework for visual resource representation and access. The framework, however, ignores the spatial and temporal information contained in videos, which plays a fundamental role in unveiling semantic meanings. The information includes not only the spatial layout of visual content on a still frame (image), but also temporal changes across the sequential frames. Specially, spatially and temporally co-occurring visual words, which are extracted under the BovW framework, often tend to collaboratively represent objects, scenes, or events in the videos. The spatial and temporal information discovery would be useful to advance the CBVR technology. In this thesis, we propose to explore and analyse the spatial and temporal information from a new perspective: i) co-occurrence of the visual words is formulated as a correlation matrix, ii) spatial proximity and temporal coherence are analytically and empirically studied to refine this correlation. Following this, a quantitative spatial and temporal correlation (STC) model is defined. The STC discovered from either the query example (denoted by QC) or the data collection (denoted by DC) is assumed to determine the specificity of the visual words in the retrieval model, i.e. selected Words-Of-Interest are found more important for certain topics. Based on this hypothesis, we utilize the STC matrix to establish a novel visual content similarity measurement method and a query reformulation scheme for the retrieval model. Additionally, the STC also characterizes the context of the visual words, and accordingly an STC-based context similarity measurement is proposed to detect synonymous visual words. The method partially solves an inherent error of the visual vocabulary under the BovW framework.
Systematic experimental evaluations on the public TRECVID and CC_WEB_VIDEO video collections demonstrate that the proposed methods based on the STC can substantially improve the retrieval effectiveness of the BovW framework. The retrieval model based on STC outperforms state-of-the-art CBVR methods on the data collections without additional storage and computational expense. Furthermore, the rebuilt visual vocabulary in this thesis is more compact and effective. The above methods can be incorporated together for effective and efficient CBVR system implementation. Based on the experimental results, it is concluded that the spatial-temporal correlation effectively approximates the semantic correlation. This discovered correlation approximation can be utilized for both visual content representation and similarity measurement, which are key issues for CBVR technology development.
APA, Harvard, Vancouver, ISO, and other styles
20

Durak, Nurcan. "Semantic Video Modeling And Retrieval With Visual, Auditory, Textual Sources." Master's thesis, METU, 2004. http://etd.lib.metu.edu.tr/upload/12605438/index.pdf.

Full text
Abstract:
Studies on content-based video indexing and retrieval aim at accessing video content from different aspects more efficiently and effectively. Most studies have concentrated on the visual component of video content in modeling and retrieving the video content. Besides the visual component, much valuable information is also carried in other media components, such as superimposed text, closed captions, audio, and speech that accompany the pictorial component. In this study, the semantic content of video is modeled using visual, auditory, and textual components. In the visual domain, visual events, visual objects, and spatial characteristics of visual objects are extracted. In the auditory domain, auditory events and auditory objects are extracted. In the textual domain, speech transcripts and visible texts are considered. With our proposed model, users can access video content from different aspects and get desired information more quickly. Besides multimodality, our model is built on semantic hierarchies that enable querying the video content at different semantic levels. There are sequence-scene hierarchies in the visual domain, background-foreground hierarchies in the auditory domain, and subject hierarchies in the speech domain. The presented model has been implemented, and multimodal content queries, hierarchical queries, fuzzy spatial queries, fuzzy regional queries, fuzzy spatio-temporal queries, and temporal queries have been applied to video content successfully.
APA, Harvard, Vancouver, ISO, and other styles
21

Ren, Jinchang. "Semantic content analysis for effective video segmentation, summarisation and retrieval." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4251.

Full text
Abstract:
This thesis focuses on four main research themes namely shot boundary detection, fast frame alignment, activity-driven video summarisation, and highlights based video annotation and retrieval. A number of novel algorithms have been proposed to address these issues, which can be highlighted as follows. Firstly, accurate and robust shot boundary detection is achieved through modelling of cuts into sub-categories and appearance based modelling of several gradual transitions, along with some novel features extracted from compressed video. Secondly, fast and robust frame alignment is achieved via the proposed subspace phase correlation (SPC) and an improved sub-pixel strategy. The SPC is proved to be insensitive to zero-mean noise, and its gradient-based extension is even robust to non-zero-mean noise and can be used to deal with non-overlapped regions for robust image registration. Thirdly, hierarchical modelling of rush videos using formal language techniques is proposed, which can guide the modelling and removal of several kinds of junk frames as well as adaptive clustering of retakes. With an extracted activity level measurement, shots and sub-shots are detected for content-adaptive video summarisation. Fourthly, highlights based video annotation and retrieval is achieved, in which statistical modelling of skin pixel colours, knowledge-based shot detection, and improved determination of camera motion patterns are employed. Within these proposed techniques, one important principle is to integrate various kinds of feature evidence and to incorporate prior knowledge in modelling the given problems. High-level hierarchical representation is extracted from the original linear structure for effective management and content-based retrieval of video data. As most of the work is implemented in the compressed domain, one additional benefit is the achieved high efficiency, which will be useful for many online applications.
APA, Harvard, Vancouver, ISO, and other styles
22

Ren, Cathy Wei. "A novel approach to video retrieval using spatio-temporal information." Thesis, University of Exeter, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.425494.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Anjulan, Arasanathan. "Intelligent content based video retrieval based on local region tracks." Thesis, University of Bristol, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445827.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Roux, Matthew John. "ATOM : a distributed system for video retrieval via ATM networks." Master's thesis, University of Cape Town, 1999. http://hdl.handle.net/11427/21327.

Full text
Abstract:
The convergence of high speed networks, powerful personal computer processors and improved storage technology has led to the development of video-on-demand services to the desktop that provide interactive controls and deliver Client-selected video information on a Client-specified schedule. This dissertation presents the design of a video-on-demand system for Asynchronous Transfer Mode (ATM) networks, incorporating an optimised topology for the nodes in the system and an architecture for Quality of Service (QoS). The system is called ATOM which stands for Asynchronous Transfer Mode Objects. Real-time video playback over a network consumes large bandwidth and requires strict bounds on delay and error in order to satisfy the visual and auditory needs of the user. Streamed video is a fundamentally different type of traffic to conventional IP (Internet Protocol) data since files are viewed in real-time, not downloaded and then viewed. This streaming data must arrive at the Client decoder when needed or it loses its interactive value. Characteristics of multimedia data are investigated including the use of compression to reduce the excessive bit rates and storage requirements of digital video. The suitability of MPEG-1 for video-on-demand is presented. Having considered the bandwidth, delay and error requirements of real-time video, the next step in designing the system is to evaluate current models of video-on-demand. The distributed nature of four such models is considered, focusing on how Clients discover Servers and locate videos. This evaluation eliminates a centralized approach in which Servers have no logical or physical connection to any other Servers in the network and also introduces the concept of a selection strategy to find alternative Servers when Servers are fully loaded. During this investigation, it becomes clear that another entity (called a Broker) could provide a central repository for Server information. 
Clients have logical access to all videos on every Server simply by connecting to a Broker. The ATOM Model for distributed video-on-demand is then presented by way of a diagram of the topology showing the interconnection of Servers, Brokers and Clients; a description of each node in the system; a list of the connectivity rules; a description of the protocol; a description of the Server selection strategy and the protocol if a Broker fails. A sample network is provided with an example of video selection, and design issues are raised and solved including how nodes discover each other, a justification for using a mesh topology for the Broker connections, how Connection Admission Control (CAC) is achieved, how customer billing is achieved and how information security is maintained. A calculation of the number of Servers and Brokers required to service a particular number of Clients is presented. The advantages of ATOM are described. The underlying distributed connectivity is abstracted away from the Client. Redundant Server/Broker connections are eliminated and the total number of connections in the system is minimized by the rule stating that Clients and Servers may only connect to one Broker at a time. This reduces the total number of Switched Virtual Circuits (SVCs), which are a performance hindrance in ATM. ATOM can be easily scaled by adding more Servers, which increases the total system capacity in terms of storage and bandwidth. In order to transport video satisfactorily, a guaranteed end-to-end Quality of Service architecture must be in place. The design methodology for such an architecture is investigated starting with a review of current QoS architectures in the literature which highlights important definitions including a flow, a service contract and flow management. A flow is a single media source which traverses resource modules between Server and Client.
The concept of a flow is important because it enables the identification of the areas requiring consideration when designing a QoS architecture. It is shown that ATOM adheres to the principles motivating the design of a QoS architecture, namely the Integration, Separation and Transparency principles. The issue of mapping human requirements to network QoS parameters is investigated and the action of a QoS framework is introduced, including several possible causes of QoS degradation. The design of the ATOM Quality of Service Architecture (AQOSA) is then presented. AQOSA consists of 11 modules which interact to provide end-to-end QoS guarantees for each stream. Several important results arise from the design. It is shown that intelligent choice of stored videos in respect of peak bandwidth can improve overall system capacity. The concept of disk striping over a disk array is introduced and a Data Placement Strategy is designed which eliminates disk hot spots (i.e. overuse of some disks whilst others lie idle). A novel parameter (the B-P Ratio) is presented which can be used by the Server to predict future bursts from each video stream. The use of Traffic Shaping to decrease the load on the network from each stream is presented. Having investigated four algorithms for rewind and fast-forward in the literature, a rewind and fast-forward algorithm is presented. The method produces a significant decrease in bandwidth, and the resultant stream is very constant, reducing the chance that the stream will add to network congestion. The C++ classes of the Server, Broker and Client are described emphasizing the interaction between classes. The use of ATOM in the Virtual Private Network and the multimedia teaching laboratory is considered. Conclusions and recommendations for future work are presented.
It is concluded that digital video applications require high bandwidth, low error, low delay networks; a video-on-demand system to support large Client volumes must be distributed, not centralized; control and operation (transport) must be separated; the number of ATM Switched Virtual Circuits (SVCs) must be minimized; the increased connections caused by the Broker mesh is justified by the distributed information gain; a Quality of Service solution must address end-to-end issues. It is recommended that a web front-end for Brokers be developed; the system be tested in a wide area ATM network; the Broker protocol be tested by forcing failure of a Broker and that a proprietary file format for disk striping be implemented.
APA, Harvard, Vancouver, ISO, and other styles
25

Guan, Genliang. "Novel perspectives and approaches to video summarization." Thesis, The University of Sydney, 2014. http://hdl.handle.net/2123/13550.

Full text
Abstract:
The increasing volume of videos requires efficient and effective techniques to index and structure videos. Video summarization is such a technique that extracts the essential information from a video, so that tasks such as comprehension by users and video content analysis can be conducted more effectively and efficiently. The research presented in this thesis investigates three novel perspectives of the video summarization problem and provides approaches to such perspectives. Our first perspective is to employ local keypoints to perform keyframe selection. Two criteria, namely Coverage and Redundancy, are introduced to guide the keyframe selection process in order to identify those representing maximum video content and sharing minimum redundancy. To efficiently deal with long videos, a top-down strategy is proposed, which splits the summarization problem into two sub-problems: scene identification and scene summarization. Our second perspective is to formulate the task of video summarization as the problem of sparse dictionary reconstruction. Our method utilizes the true sparse constraint L0 norm, instead of the relaxed constraint L2,1 norm, such that keyframes are directly selected as a sparse dictionary that can reconstruct the video frames. In addition, a Percentage Of Reconstruction (POR) criterion is proposed to intuitively guide users in selecting an appropriate length of the summary. Moreover, an L2,0 constrained sparse dictionary selection model is also proposed to further verify the effectiveness of sparse dictionary reconstruction for video summarization. Lastly, we further investigate the multi-modal perspective of multimedia content summarization and enrichment. There are abundant images and videos on the Web, so it is highly desirable to effectively organize such resources for textual content enrichment. With the support of web scale images, our proposed system, namely StoryImaging, is capable of enriching arbitrary textual stories with visual content.
APA, Harvard, Vancouver, ISO, and other styles
26

Yusoff, Yusseri. "Automatic detection of shot boundaries in digital video." Thesis, University of Surrey, 2002. http://epubs.surrey.ac.uk/843079/.

Full text
Abstract:
This thesis describes the implementation of automatic shot boundary detection algorithms for the detection of cuts and gradual transitions in digital video sequences. The objective was to develop a fully automatic video segmentation system as a pre-processing step for video database retrieval management systems as well as other applications which have large video sequences as part of their systems. For the detection of cuts, we begin by looking at a set of baseline algorithms that measure specific features of video images and calculate the dissimilarity of the measures between frames in the video sequence. We then propose two different approaches and compare them against the set of baseline algorithms. These approaches are themselves built upon the base set of algorithms. Observing that the baseline algorithms initially use hard thresholds to determine shot boundaries, we build Receiver Operating Characteristic (ROC) curves to plot the characteristics of the algorithms when varying the thresholds. In the first approach, we look into combining the multiple algorithms in such a way that, as a collective, the detection of cuts is improved. The results of the fusion are then compared against the baseline algorithms on the ROC curve. For the second approach, we look into having adaptive thresholds for the baseline algorithms. A selection of adaptive thresholding methods was applied to the data set and compared with the baseline algorithms that use hard thresholds. In the case of gradual transition detection, a filtering technique used to detect ramp edges in images is adapted for use in video sequences. The approach starts with the observation that shot boundaries represent edges in time, with cuts being sharp edges and gradual transitions closely approximating ramp edges. The methods that we propose reflect our concentration on producing a reliable and efficient shot boundary detection mechanism.
In each instance, be it for cuts or gradual transitions, we tested our algorithms on a comprehensive set of video sequences, containing a variety of content and obtained highly competitive results.
APA, Harvard, Vancouver, ISO, and other styles
27

Hopfgartner, Frank. "Personalised video retrieval : application of implicit feedback and semantic user profiles." Thesis, University of Glasgow, 2010. http://theses.gla.ac.uk/2132/.

Full text
Abstract:
A challenging problem in the user profiling domain is to create profiles of users of retrieval systems. This problem is exacerbated even further in the multimedia domain. Due to the Semantic Gap, the difference between the low-level data representation of videos and the higher concepts users associate with videos, it is not trivial to understand the content of multimedia documents and to find other documents that users might be interested in. A promising approach to ease this problem is to set multimedia documents into their semantic contexts. The semantic context can lead to a better understanding of personal interests. Knowing the context of a video is useful for recommending videos that match users' information needs. By exploiting these contexts, videos can also be linked to other, contextually related videos. From a user profiling point of view, these links can be of high value for recommending semantically related videos, hence creating a semantic-based user profile. This thesis introduces a semantic user profiling approach for news video retrieval, which exploits a generic ontology to put news stories into context. Major challenges that inhibit the creation of such semantic user profiles are the identification of users' long-term interests and the adaptation of retrieval results based on these personal interests. Most personalisation services rely on users explicitly specifying preferences, a common approach in the text retrieval domain. By giving explicit feedback, users are forced to update their need, which can be problematic when their information need is vague. Furthermore, users tend not to provide enough feedback on which to base an adaptive retrieval algorithm. Deviating from the method of explicitly asking the user to rate the relevance of retrieval results, the use of implicit feedback techniques helps by learning user interests unobtrusively. The main advantage is that users are relieved from providing feedback.
A disadvantage is that information gathered using implicit techniques is less accurate than information based on explicit feedback. In this thesis, we focus on three main research questions. First of all, we study whether implicit relevance feedback, which is provided while interacting with a video retrieval system, can be employed to bridge the Semantic Gap. We therefore first identify implicit indicators of relevance by analysing representative video retrieval interfaces. Studying whether these indicators can be exploited as implicit feedback within short retrieval sessions, we recommend video documents based on implicit actions performed by a community of users. Secondly, implicit relevance feedback is studied as a potential source for building user profiles and hence for identifying users' long-term interests in specific topics. This includes studying the identification of different aspects of interests and storing these interests in dynamic user profiles. Finally, we study how this feedback can be exploited to adapt retrieval results or to recommend related videos that match the users' interests. We analyse our research questions by performing both simulation-based and user-centred evaluation studies. The results suggest that implicit relevance feedback can be employed in the video domain and that semantic-based user profiles have the potential to improve video exploration.
APA, Harvard, Vancouver, ISO, and other styles
28

Wang, Shih-Han, and 王詩涵. "Video-based Clothing Retrieval." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/93002261881198212631.

Full text
Abstract:
Master
國立臺灣大學
資訊網路與多媒體研究所
102
Nowadays, clothing retrieval has become a thriving demand for online clothing shopping websites. Beyond keyword-based clothing search, image-based clothing retrieval has generated interest in recent research papers. It enables more interesting clothing recommendation systems and offers the possibility of improving identity or occupation recognition. In this paper, we present a brand-new video-based clothing retrieval system. We believe the system provides another intuitive clothing recommendation interface in a smart home, with the following application scenario: one can use a TV remote control to select an impressive shot in which a character is wearing fascinating clothing, and learn the clothing style from the character. However, there are still major challenges in this topic, such as human pose estimation and the complex backgrounds of online shopping datasets, which often cause inaccurate retrieval results. Our research focuses on two issues. First, we propose a human pose estimation mechanism that uses a video clip of frames to refine inaccurate human poses. Second, we explore an automatic foreground segmentation method based on the "GrabCut" algorithm to tackle the complex background problem. In our experiments, we collect several video clips and different kinds of online shopping datasets. The experimental results demonstrate that our mechanism improves inaccurate pose estimation and can tackle the complex background problem.
APA, Harvard, Vancouver, ISO, and other styles
29

Lin, Chia-Hsuan, and 林家玄. "Video Database Retrieval System." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/39317009085440223435.

Full text
Abstract:
Master
國立中山大學
資訊工程學系研究所
94
In the digital era, more and more people use digital video. As the number of users and the amount of video data grow, the management of video data becomes a significant issue. Therefore, there is increasing research into video database systems that allow users to search for and obtain videos. In this paper, a novel method for video scene change detection and video database retrieval is proposed. Fractal orthonormal bases, which guarantee that similar indices correspond to similar images, are combined with support vector clustering to split a video into a sequence of shots, from each of which a few representative frames (key-frames) are extracted to serve as the video database index. During image search and comparison, features are extracted according to MIL, images with similar features are retrieved from the video database, their similarity is computed, and the results are output accordingly.
APA, Harvard, Vancouver, ISO, and other styles
30

LU, KAI-HUI, and 陸凱暉. "color-based video retrieval." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/29683920492533261928.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Bai, Yannan. "Video analytics system for surveillance videos." Thesis, 2018. https://hdl.handle.net/2144/30739.

Full text
Abstract:
Developing an intelligent inspection system that can enhance public safety is challenging. An efficient video analytics system can help monitor unusual events and mitigate possible damage or loss. This thesis aims to analyze surveillance video data, report abnormal activities and retrieve corresponding video clips. The surveillance video dataset used in this thesis is derived from the ALERT Dataset, a collection of surveillance videos at airport security checkpoints. The video analytics system in this thesis can be thought of as a pipelined process. The system takes the surveillance video as input and passes it through a series of processing stages such as object detection, multi-object tracking, person-bin association and re-identification. In the end, we can obtain trajectories of passengers and baggage in the surveillance videos. Abnormal events, like taking away others' belongings, will be detected and trigger the alarm automatically. The system can also retrieve the corresponding video clips based on a user-defined query.
APA, Harvard, Vancouver, ISO, and other styles
32

Su, Chih-Wen, and 蘇志文. "Content-based Video Retrieval Techniques for MPEG Video." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/9davy5.

Full text
Abstract:
Doctor
國立中央大學
資訊工程研究所
94
Gradual shot change detection is one of the most important research issues in the field of video indexing/retrieval. Among the numerous types of gradual transitions, the dissolve-type gradual transition is considered the most common one, but it is also the most difficult one to detect. In most of the existing dissolve detection algorithms, the false/miss detection problem caused by motion is very serious. In this thesis, we present a novel dissolve-type transition detection algorithm that can correctly distinguish dissolves from disturbance caused by motion. We carefully model a dissolve based on its nature and then use the model to filter out possible confusion caused by the effect of motion. Furthermore, we propose the use of motion vectors embedded in MPEG bitstreams to generate so-called “motion flows”, which are applied to perform quick video retrieval. By using the motion vectors directly, we do not need to consider the shape of a moving object and its corresponding trajectory. Instead, we simply “link” the local motion vectors across consecutive video frames to form motion flows, which are then annotated and stored in a video database. In the video retrieval phase, we propose a new matching strategy to execute the video retrieval task. Motions that do not belong to the mainstream motion flows are filtered out by our proposed algorithm. The retrieval process can be triggered by query-by-sketch (QBS) or query-by-example (QBE). The experimental results show that our method is indeed efficient and accurate in the video retrieval process.
APA, Harvard, Vancouver, ISO, and other styles
33

Chin, Kuo-Hao, and 秦國豪. "An integrated approach to video retrieval." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/98338076773433550159.

Full text
Abstract:
Master
輔仁大學
資訊工程學系
93
In this paper, a novel approach is proposed for matching and retrieving video clips (containing more than one shot). In contrast to the traditional key-frame based shot matching approach and the frame sequence matching approach, our method analyzes all frames within a shot to fully exploit the spatio-temporal information, while generating only one representation for each visual feature used in each shot. The visual features utilized in our scheme are color, fused with spatial distribution information, and motion. Video clip matching is performed at two levels, first at the shot level, then at the sequence level. A set of similarity measures is defined to evaluate the similarity between the query and the database video. Experimental results show that the proposed approach is effective and feasible in retrieving and ranking similar video clips with a variety of video contents. Our approach also achieves superior performance in comparison to a key-frame based retrieval system using only the color histogram for feature representation and matching. Furthermore, to refine the search results, a technique called relevance feedback is implemented that takes into account the user's own opinion on whether two clips are similar during the retrieval process. In some cases, improvement in the retrieval performance is observed.
APA, Harvard, Vancouver, ISO, and other styles
34

Yi-Chen, Chen, and 陳怡真. "Video Retrieval Based on Temporal Texture." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/68432539684728164592.

Full text
Abstract:
Master
義守大學
資訊工程學系
92
It is well known that textural characteristics play a very important role in human visual perception. Over the last few decades, hundreds of papers have been published on the analysis of static textures, but only a few of them have studied the dynamical nature of textures. Textures evolving over time are called temporal textures and are very common in everyday life. The smoke flowing or the wavy water of a river are good examples of temporal textures. A temporal texture can be defined in terms of temporal-spatial pixel value interactions within digital video signals. In general, a video can be viewed as a combination of different temporal texture segments. Therefore, the property of temporal textures plays a fundamental role in content-based video retrieval. In this thesis, we focus on temporal texture extraction, including the directional Markov random field model of temporal texture and temporal texture features. The overall performance of video retrieval using temporal texture features is also evaluated.
APA, Harvard, Vancouver, ISO, and other styles
35

Chen, Chi-Horng, and 陳志弘. "Data Retrieval in Video on Demand." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/37440304698553521333.

Full text
Abstract:
Master
國立中興大學
資訊科學學系
84
While the need for applications that access video data stored in digital form is growing rapidly, video systems that retrieve hundreds of videos from disks are becoming increasingly important. A challenge for such systems is to devise schemes that distribute the workload evenly across disks and provide good response time to client requests for video data. Different from previous research that concentrates on specific functions, what we consider in this thesis for interactive video browsing is a full-function system composed of various playout methods, buffer management, disk striping and disk scheduling. In buffer management, we not only consider various policies such as allocation, replacement, prefetching and load control, but also consider their interactions with the quality of service at both the client and server sites. In disk striping, we propose the frame object striping method to distribute retrieval requests evenly across disks. In disk scheduling, we address the issue of disk throughput. Taking all these factors into consideration, a mathematical model is set up and analyzed. Equations are derived to compute the minimum buffer size required for smooth playback and to compute the extra buffer needed to support VCR-like functions. According to various qualities of service, several equations are also derived to tune the buffer size. We observe that good throughput is not based solely on the balance of disk workload at any browsing rate. Cooperating with disk scheduling, appropriate buffer sizes associated with different data access patterns can also achieve good disk throughput. In this case, the number of disks required for an access pattern is inversely proportional to the greatest common divisor of the total number of disks and the browsing rate. Based on these analytical results, we may design an efficient data retrieval mechanism for video-on-demand systems.
APA, Harvard, Vancouver, ISO, and other styles
36

Weng, Pei-Chih, and 温培志. "Stroke-based Broadcast Basketball Video Retrieval." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/88379461752427588908.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Computer Science and Engineering
102
We present a stroke-based system that allows users to retrieve basketball video clips easily and intuitively. In contrast to current retrieval systems, which mainly rely on keywords, users can draw player trajectories on our defined basketball court coordinate system and specify the corresponding events, such as shot made or shot missed, to provide a more specific search condition and prevent unwanted results during retrieval. Because players are perspectively projected onto each video frame and the cameras in broadcast videos are dynamic, the specified strokes and the extracted player trajectories are not directly comparable; we therefore map player positions in each frame to the defined basketball court coordinate system using camera calibration. To achieve a robust mapping, our system considers the whole video clip and reconstructs a panoramic basketball court, then rectifies the panoramic court to our defined court using a homography. Since this reconstruction can map pixels from a video frame to our defined court coordinate system, it can also map player trajectories between the two coordinate systems. To obtain the event of a video clip, we extract the game time using optical character recognition and map it to the event logs in a play-by-play text that is available online. Thanks to these two types of semantic information, our system is very helpful to coaches and to a tremendous number of spectators. The retrieved videos with the corresponding search conditions, together with our accompanying video, verify the feasibility of our technique.
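The mapping from frame pixels to a court coordinate system that the abstract describes is a standard planar homography; a minimal sketch follows (function names and the example matrix are ours — a real system would estimate H from court-line correspondences during calibration):

```python
import numpy as np

def apply_homography(H, points):
    # Map N x 2 image points through a 3x3 homography H into court
    # coordinates, dividing by the homogeneous coordinate at the end.
    pts = np.hstack([points, np.ones((len(points), 1))])  # to homogeneous
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# A pure-translation homography, used here only to make the example checkable.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0,  5.0],
              [0.0, 0.0,  1.0]])
print(apply_homography(H, np.array([[100.0, 200.0]])))  # [[110. 205.]]
```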
APA, Harvard, Vancouver, ISO, and other styles
37

Lin, Fang-Ju, and 林芳如. "H.264 Based Entertainment Video Retrieval." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/96780571006938018308.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Multimedia Engineering
95
In this thesis, we provide a video retrieval system based on the H.264 video compression format that retrieves complete desired videos from a short video clip. Without completely decompressing a video stream, we parse the H.264 stream once to extract two useful features: lumas and motions. During luma feature extraction, a luma calibration is proposed and similar luma features are removed. During motion feature extraction, a statistical scheme is proposed to extract and combine dominant object motions and camera motions. During retrieval, the most relevant results are returned to users based on these two features. Our test videos are all entertainment TV programs.
APA, Harvard, Vancouver, ISO, and other styles
38

Yu, Chung-Yang, and 游重陽. "Video retrieval using 2D M-string." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/57175422747260556334.

Full text
Abstract:
Master's thesis
Tamkang University
Master's Program, Department of Information Management
98
In image database systems, Content-Based Image Retrieval (CBIR) is an important approach to image retrieval. How to use strings to express the spatial relationships between objects, and how to perform inference and similarity retrieval on them, have been widely discussed. The concept of the 2D B-string notation is used to indicate the changing spatial relationships between objects moving in a video. Each object is represented as a focal point, and the point is marked at the "initial position" and the "end position" of each object in the video. The segment from the point at the "initial position" to the point at the "end position" is defined as the "displacement" of the object in the video. The 2D M-string is created based on the displacements of the objects in the video. Using the 2D M-string for spatial reasoning and searching with an index structure, many dissimilar clips can be filtered out in advance, which reduces the number of string matchings and achieves efficient retrieval of similar videos.
APA, Harvard, Vancouver, ISO, and other styles
39

Huang, Wei-Hsun, and 黃煒勛. "Video Retrieval based on spatial events." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/53847328751206122468.

Full text
Abstract:
Master's thesis
Tamkang University
Master's Program, Department of Information Management
100
A video is composed of a sequence of frames, and a frame may contain multiple objects. A single frame captures the spatial relations between the objects in that frame, but it carries no information about how those spatial relations change over the course of the video. In this paper, we define an occurrence of a change in the spatial relations between objects across two adjacent frames as a spatial event and propose the spatial event string to represent it. The changes in the spatial relations between objects can then be derived from this string, and video queries based on spatial events can be processed efficiently. We propose an algorithm for generating the spatial event string and a method for processing video queries based on spatial events. Generally speaking, the number of objects involved in spatial events is relatively small, and the spatial event string contains only those objects. Compared with other strings, its length can therefore be reduced dramatically, and consequently both the storage requirement and the processing time of the string are reduced.
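The idea of recording only *changes* in spatial relations between adjacent frames can be sketched as follows; the dict-based relation encoding is our illustrative assumption, not the thesis's actual string format:

```python
def spatial_events(prev_relations, curr_relations):
    """Return the relations that changed between two adjacent frames.

    Relations map an object pair to a relation label, e.g.
    {("car", "tree"): "left-of"}. Only changed pairs are emitted, which is
    why the resulting event representation stays short.
    """
    events = []
    for pair, rel in curr_relations.items():
        if prev_relations.get(pair) != rel:
            events.append((pair, prev_relations.get(pair), rel))
    return events

prev = {("car", "tree"): "left-of", ("dog", "car"): "behind"}
curr = {("car", "tree"): "right-of", ("dog", "car"): "behind"}
print(spatial_events(prev, curr))  # [(('car', 'tree'), 'left-of', 'right-of')]
```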
APA, Harvard, Vancouver, ISO, and other styles
40

Chen, Duan-Yu, and 陳敦裕. "Towards High-Level Content-Based Video Retrieval and Video Structuring." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/22704705540427885349.

Full text
Abstract:
Doctoral dissertation
National Chiao Tung University
Department of Computer Science and Information Engineering
93
With the increasing amount of digital video in education, entertainment and other multimedia applications, there is an urgent demand for tools that give users an efficient way to acquire desired video data. Content-based searching, browsing and retrieval is more natural, friendly and semantically meaningful to users. As video compression techniques mature, many videos are stored in compressed form, and accordingly more and more research focuses on feature extraction from compressed videos, especially in MPEG format. This thesis investigates high-level semantic video features in the compressed domain for efficient video retrieval and browsing. We propose an approach to video abstraction that generates semantically meaningful video clips and associated metadata. Based on the concept of long-term consistency of the spatial-temporal relationships between objects in consecutive P-frames, a multi-object tracking algorithm is designed to locate the objects and generate the trajectory of each object without size constraints. Utilizing the object trajectories coupled with domain knowledge, the event inference module detects and identifies events in tennis videos. Consequently, the event information and metadata of the associated video clips are extracted, and the abstraction of the video streams is accomplished. A novel mechanism is proposed to automatically parse sports videos in the compressed domain and then construct a concise table of video content employing the superimposed closed captions and the semantic classes of video shots. An efficient approach to closed-caption localization is proposed that first detects caption frames in meaningful shots; caption frames, instead of every frame, are then selected as targets for detecting closed captions based on long-term consistency without size constraints.
Besides, to discriminate captions of interest automatically, a novel tool, a font-size detector, is proposed to recognize the font size of closed captions using compressed data in MPEG videos. For effective video retrieval, we propose a high-level motion activity descriptor, the object-based transformed 2D-histogram (T2D-Histogram), which exploits both spatial and temporal features to characterize video sequences in a semantics-based manner. The Discrete Cosine Transform (DCT) is applied to convert the object-based 2D-histogram sequences from the time domain to the frequency domain. Using this transform, the high-dimensional time-domain features used to represent successive frames are significantly reduced to a set of low-dimensional features in the frequency domain. The energy-compaction property of the DCT allows us to use only a few DCT coefficients to effectively capture the variations of moving objects. With this efficient scheme for video representation, one can perform video retrieval accurately and efficiently. Furthermore, we propose a high-level compact motion-pattern descriptor, the temporal motion intensity of moving blobs (MIMB) moments, which exploits both spatial invariants and temporal features to characterize video sequences. Compared to the motion activity descriptors RLD and SAH of MPEG-7, the proposed descriptor yields average performance gains of 40% and 21%, respectively. Comprehensive experiments have been conducted to assess the performance of the proposed methods, and the empirical results show that they outperform state-of-the-art methods on various datasets with different characteristics.
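The DCT-based compaction of time-domain feature sequences described in this abstract can be sketched generically (the direct type-II DCT and all names below are ours; the thesis applies this idea to object-based 2D-histogram sequences):

```python
import numpy as np

def dct2_time(x):
    # Type-II DCT along the time axis, written out from its definition so
    # the sketch needs only numpy: C[k] = sum_n x[n] * cos(pi*k*(n+0.5)/T).
    t = x.shape[0]
    n = np.arange(t)
    basis = np.cos(np.pi * np.outer(np.arange(t), n + 0.5) / t)  # (T, T)
    return basis @ x

def compact_descriptor(feature_seq, n_coeffs=4):
    # feature_seq: (T frames, D feature dims). Energy compaction lets a few
    # low-frequency coefficients summarize the temporal variation.
    coeffs = dct2_time(np.asarray(feature_seq, dtype=float))
    return coeffs[:n_coeffs]  # (n_coeffs, D) instead of (T, D)

seq = np.outer(np.linspace(0, 1, 30), np.ones(8))  # slowly varying 30x8 data
print(compact_descriptor(seq).shape)  # (4, 8)
```

A static (constant) feature sequence produces zero in every coefficient except the first, which is exactly the compaction behavior the descriptor relies on.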
APA, Harvard, Vancouver, ISO, and other styles
41

邱敬昌. "Compressed-domain video object extraction for content-based video retrieval." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/89261594499871864116.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Ho, Yu-Hsuan, and 何宥萱. "Key-Frame Extraction for Video Summarization and Shot-Based Video Retrieval." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/73943136238044640470.

Full text
Abstract:
Master's thesis
National Chung Cheng University
Institute of Computer Science and Information Engineering
92
In this paper, we present an adaptive rate-constrained key-frame selection scheme for channel-aware real-time video streaming and shot-based video retrieval. First, the streaming server dynamically determines the target number of key-frames by estimating the channel conditions according to feedback information. Under the constraint of the target key-frame number, a two-step sequential key-frame selection scheme is adopted: the optimal allocation among the video shots in a clip is found first, and the most representative key-frames in each shot are then selected according to that allocation to guide the temporal-downscaling transcoding. After extracting the key-frames, we propose a multi-pass video retrieval method using spatio-temporal statistics. In the first pass, the probability distributions of object motion for each shot of the query video clip are extracted and compared with those of the shots in the database using the Bhattacharyya distance. In the second pass, two consecutive shots are employed to introduce a "causality" effect. Finally, in the refinement pass, we extract one key-frame from each shot using our key-frame selection method and calculate the color histogram of each key-frame. We then use the Bhattacharyya distance to compare the similarity of the two key-frame color histograms and accumulate the second-stage distance to obtain the similarity of two video shots. For both the two-step key-frame selection and the multi-pass video retrieval, our experimental results show that the proposed methods are efficient and satisfactory.
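The Bhattacharyya distance used in the first and refinement passes has a standard closed form for discrete histograms; a minimal sketch (function names are ours):

```python
import numpy as np

def bhattacharyya_distance(p, q):
    # Distance between two histograms treated as discrete distributions;
    # 0 means identical, larger means more dissimilar.
    p = np.asarray(p, dtype=float); q = np.asarray(q, dtype=float)
    p = p / p.sum(); q = q / q.sum()
    bc = np.sum(np.sqrt(p * q))          # Bhattacharyya coefficient in [0, 1]
    return -np.log(np.clip(bc, 1e-12, 1.0))

h = [4, 2, 2]
print(round(bhattacharyya_distance(h, h), 6))     # 0.0 for identical histograms
print(bhattacharyya_distance([1, 0], [0, 1]) > 10)  # disjoint -> large distance
```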
APA, Harvard, Vancouver, ISO, and other styles
43

Teng, Shang-Ju, and 鄧尚儒. "Motion Trajectory Based Video Indexing and Retrieval." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/07158040655742557732.

Full text
Abstract:
Master's thesis
National Tsing Hua University
Department of Computer Science
90
This thesis presents a technique to efficiently index and retrieve video clips in terms of motion-trajectory-based similarity. We describe a motion trajectory with three representations: the horizontal and vertical movements of the trajectory, and the motion trail that indicates the shape of the trajectory. Each representation is approximated by a polynomial function, and the polynomial coefficients are then indexed for retrieval. To measure the matching distance, we combine different spatio-temporal characteristics to provide flexible retrieval processes. In addition, we propose a multiscale mode to improve retrieval efficiency. A unified framework is developed to deal with various query types: query-by-example, query-by-sketch, and query-by-specification. We have performed many experiments that confirm the effectiveness and efficiency of our method, and the results indicate satisfactory performance.
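Approximating each trajectory representation by a polynomial and indexing the coefficients, as described above, can be sketched with an ordinary least-squares fit (the names and the fixed degree are our assumptions):

```python
import numpy as np

def trajectory_index(xs, ys, degree=2):
    # Fit the horizontal and vertical movements with low-order polynomials
    # over normalized time; the coefficient vectors form a compact index
    # that can be compared for trajectory similarity.
    t = np.linspace(0.0, 1.0, len(xs))
    return np.polyfit(t, xs, degree), np.polyfit(t, ys, degree)

# A parabolic vertical motion y = 5t(1-t) is recovered exactly by a
# degree-2 fit: coefficients (highest order first) are [-5, 5, 0].
t = np.linspace(0.0, 1.0, 20)
cx, cy = trajectory_index(10 * t, 5 * t * (1 - t))
print(np.round(cy, 3))
```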
APA, Harvard, Vancouver, ISO, and other styles
44

Wang, Jhih-Huang, and 王志煌. "Video Retrieval Based on Color Center Descriptor." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/60820913335568072148.

Full text
Abstract:
Master's thesis
National Taipei Teachers College
Institute of Natural Science Education
93
In a previous paper, a fully digital course was designed that combined TV programs, a video database distributed through the internet, and real activities conducted in each class, and that research found that children's learning was promoted through the course. With the rapid development of internet technology, the amount of video is growing ever larger, and an efficient retrieval tool is necessary to find the videos we need. MPEG-7, formally named the "Multimedia Content Description Interface", provides a comprehensive set of audiovisual description tools to extract features from audiovisual content and describe audiovisual information; among these, the color histogram is used to describe the color features of videos. In this paper, a new descriptor based on the concept of the center of mass in physics is proposed to improve the performance of video retrieval. The experimental results show that the proposed method outperforms the methods proposed in previous papers.
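The descriptor's "mass center" idea, treating histogram bins as masses, reduces to a weighted mean of bin indices; a minimal per-channel sketch (function names are ours):

```python
import numpy as np

def color_center(histogram):
    # Treat each bin count as a mass located at its bin index and compute
    # the center of mass, giving one compact number per color channel.
    h = np.asarray(histogram, dtype=float)
    bins = np.arange(len(h))
    return float((bins * h).sum() / h.sum())

# All mass in bin 2 gives center 2; a symmetric histogram centers on the
# middle bin.
print(color_center([0, 0, 5, 0, 0]))  # 2.0
print(color_center([1, 2, 1]))        # 1.0
```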
APA, Harvard, Vancouver, ISO, and other styles
45

Lin, Chih-long, and 林志隆. "Content-based Video Retrieval with Multi Features." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/ed84t6.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electrical Engineering
102
With the advance of multimedia codec and communication technology, multimedia has become one of the major information media, aided by the prevalence of the internet. Under this circumstance, image and video data over the Internet contribute to a sea of media, and searching for user-desired media content in that sea becomes important. Content-Based Video Retrieval (CBVR) methods have been proposed to search for video clips of interest precisely and quickly. Among these studies, extracting image features for similarity measurement is widely adopted. However, adopting only one kind of feature to describe video content cannot provide satisfactory retrieval results; in general, more than one kind of image/video feature is extracted for efficient video retrieval. How to efficiently integrate different kinds of image/video features is critical and challenging for improving video retrieval performance. In this thesis, we propose to integrate color, texture and SIFT-BoW (Bag of Words) image features to describe a video clip. These three features describe not only global image characteristics but also local ones. In our experiments, the color histogram difference is used to measure similarity for video scene cuts, and these scene cuts, i.e., video clips, are used as the basic media unit for description and retrieval. The average of the image features within one media unit is used as the representative feature of the video clip. To perform retrieval, the feature of a query image/video is extracted, and its similarity to the representative feature of each video clip in a database is calculated for similarity ranking. For comparison, video retrieval using only one feature is also implemented, as is the method proposed by Y. Deng [10], which adopts more than one feature for video retrieval.
Experiments showed that the proposed CBVR method outperforms the previous method by 38.7% in the PR rate. Performing CBVR with multiple features also improves the PR performance compared with retrieval by a single feature.
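The shot-boundary step mentioned in this abstract, using color histogram differences for scene cuts, can be sketched with a simple threshold rule (the L1 difference and the threshold value are our assumptions):

```python
import numpy as np

def shot_cuts(histograms, threshold=0.5):
    # Declare a shot boundary wherever the L1 difference between
    # consecutive normalized frame color histograms exceeds a threshold.
    h = np.asarray(histograms, dtype=float)
    h = h / h.sum(axis=1, keepdims=True)
    diffs = np.abs(np.diff(h, axis=0)).sum(axis=1)
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

# Frames 0-1 share one color profile, frames 2-3 another: one cut at frame 2.
hists = [[8, 1, 1], [8, 1, 1], [1, 1, 8], [1, 1, 8]]
print(shot_cuts(hists))  # [2]
```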
APA, Harvard, Vancouver, ISO, and other styles
46

Lai, Yuan-hao, and 賴沅壕. "Video Object Retrieval by Trajectory and Appearance." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/26725122356557592907.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Information Management
101
The prevalence of video recording capability, on both surveillance and mobile devices, has contributed to the popularity of video data. As a result, video management has become relatively more important than before, and video retrieval in particular has been one of the main issues in this regard. Traditional video retrieval systems take text as input to look for similar information in the title, annotation or embedded textual data of a video, in a way very similar to the keyword search adopted by a common search engine. However, the lack of visual information specification during a search often makes the result inaccurate or even useless. For this reason, video retrieval systems whose inputs are images or videos have also been proposed; nevertheless, the associated ambiguity and complexity have made the implementation of such systems relatively difficult, and thus not as successful as desired. To address this, in this thesis we propose to retrieve a desired object through the inputs of its trajectory and/or appearance; together with a 3D graphical user interface for more intuitive interaction, more satisfactory results can be achieved. We firmly believe that such a framework could serve as the foundation for behavior analysis in many surveillance systems.
APA, Harvard, Vancouver, ISO, and other styles
47

Chang, Hao-Wei, and 張皓崴. "A Study on Content-Based Video Retrieval." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/01089217808521692359.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Computer and Information Science
90
In this paper, a content-based video retrieval method that does not use key-frames is proposed. Unlike key-frame-based approaches, the proposed method uses the whole information of a shot instead of selecting several key-frames to represent it. The method extracts features based on the concepts of primitives of color moments and dominant colors. To extract the primitives of color moments, each frame in a shot is first divided into several blocks; the color moments of all blocks are then extracted and clustered into several classes, and the mean moments of each class are considered a primitive of the shot. To extract the dominant colors, the colors of all pixels in a shot are clustered into several classes, and the center of the colors in each class is treated as a dominant color. After extracting the feature vectors for each shot, we propose two measures to compute the similarity between two different shots, using primitives of color moments and dominant colors as features, respectively. Furthermore, since no single feature is suitable for all kinds of shots, a relevance feedback algorithm is also provided to automatically determine the best method according to the user's response.
APA, Harvard, Vancouver, ISO, and other styles
48

Lee, Gin Song, and 李金松. "Semantic Video Model for Content-based Retrieval." Thesis, 1998. http://ndltd.ncl.edu.tw/handle/76910006565384416570.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Liu, Chih-Chin, and 劉志俊. "Content-Based Video and Music Data Retrieval." Thesis, 1998. http://ndltd.ncl.edu.tw/handle/29623888052889787839.

Full text
Abstract:
Doctoral dissertation
National Tsing Hua University
Department of Computer Science
86
In this dissertation, we discuss important issues in content-based video and music data retrieval. First, we describe the features used to model the content of image, video and music data. Based on the object model, we propose a multimedia framework and an object-level spatial/temporal model to represent the spatial/temporal relationships between media objects. Three new types of aggregation relationships, composed of the composition, temporal, and spatial relationships, are considered in the framework. To support content-based data retrieval, we propose a multimedia query language and two kinds of query interfaces for users to specify content-based queries. Second, since many content-based multimedia data retrieval problems can be transformed into the near-neighbor searching problem in a multidimensional feature space, an efficient near-neighbor searching algorithm is needed when developing a multimedia database system. We propose an approach to efficiently solve the near-neighbor searching problem. In this approach, an index is constructed along each dimension according to the values of the feature points of the multimedia objects. A user can pose a content-based query by specifying a multimedia query example and a similarity threshold. The specified query example is transformed into a query point in the multidimensional feature space. The possible result points in each dimension are then retrieved by searching for the value of the query point in the corresponding dimension. The sets of possible result points are merged one by one, removing the points that are not within the query radius. The result points and their distances from the query point form the answer to the query. Third, we propose a video query model based on the content of video and iconic indexing.
The notion of two-dimensional strings is extended to three-dimensional strings (3D-Strings) for representing the spatial and temporal relationships among the symbols in both a video and a video query. The problem of video query processing is then transformed into a problem of three-dimensional pattern matching. To efficiently match the 3D-Strings, a data structure called the 3D-List and its related algorithms are proposed. In this approach, the symbols of a video in the video database are retrieved from the video index and organized as a 3D-List according to the 3D-String of the video query. The related algorithms are then applied to the 3D-List to determine whether the video is an answer to the video query. Fourth, we propose an approach for content-based music data retrieval. In this approach, thematic feature strings, such as melody strings, rhythm strings, and chord strings, are extracted from the original music objects and treated as metadata to represent their contents. The problem of content-based music data retrieval is then transformed into the string matching problem, and a new approximate string matching algorithm is proposed for content-based music data retrieval.
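Approximate string matching for melody strings, as in the fourth part of this dissertation, is conventionally built on edit distance; a compact dynamic-programming sketch (the note encoding is our illustrative assumption, not the dissertation's algorithm):

```python
def edit_distance(a: str, b: str) -> int:
    # Classic one-row dynamic-programming edit distance. Approximate
    # matching of melody strings can rank pieces by how few insertions,
    # deletions, or substitutions separate the query from each target.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i          # prev holds the diagonal cell
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

print(edit_distance("CDEFG", "CDEFG"))  # 0: identical melodies
print(edit_distance("CDEFG", "CDEEG"))  # 1: one substituted note
```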
APA, Harvard, Vancouver, ISO, and other styles
50

Yu, Shang-Li, and 于尚立. "Motion-based Video Retrieval by Trajectory Matching." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/22209157147694698298.

Full text
Abstract:
Master's thesis
Yuan Ze University
Department of Electrical Engineering
91
This thesis proposes a motion-based video retrieval system that retrieves desired videos from a video database through trajectory matching. First, in order to extract the trajectories of moving objects, a camera compensation method is proposed for estimating possible camera motions from time-varying backgrounds. The thesis uses an affine transform to model all possible global camera motions; through feature extraction, feature matching, and a voting technique, camera motions can be accurately estimated from pairs of image frames, and the trajectories of moving objects can then be easily found by image differencing and trajectory tracking. Once the trajectories of moving objects have been extracted, a set of control points is sampled and recorded for each video subsequence for further indexing. Before retrieval, each set of control points is transformed into a Bezier curve; similarity comparisons of different video contents can then be performed by comparing sampled points extracted along the Bezier curves. Based on the Bezier representation, a novel indexing framework is proposed to retrieve desired video sequences from video databases regardless of the scaling and translation differences among these sequences. In addition, the proposed system handles partial matching of curves well: even when an incomplete query trajectory is given, all desired video sequences can be accurately retrieved and returned to users. A great variety of experiments was conducted to verify the efficiency, effectiveness, and robustness of the proposed system, and the results have proved the superiority of our method.
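Sampling points along a Bezier curve defined by recorded control points, as in the comparison step above, can be sketched with de Casteljau's algorithm (function names are ours):

```python
import numpy as np

def bezier_points(control_points, n_samples=50):
    # Sample a Bezier curve of any degree from its control points via
    # de Casteljau's repeated linear interpolation, producing comparable
    # point sets for trajectory matching.
    pts = np.asarray(control_points, dtype=float)
    out = []
    for t in np.linspace(0.0, 1.0, n_samples):
        p = pts.copy()
        while len(p) > 1:
            p = (1 - t) * p[:-1] + t * p[1:]
        out.append(p[0])
    return np.array(out)

# The sampled curve starts and ends at the first and last control points;
# the midpoint of this quadratic curve is (1, 1).
curve = bezier_points([(0, 0), (1, 2), (2, 0)], n_samples=3)
print(curve)
```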
APA, Harvard, Vancouver, ISO, and other styles
