Academic literature on the topic 'Video retrieval'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Video retrieval.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Video retrieval"

1

Lin, Lin, and Mei-Ling Shyu. "Correlation-Based Ranking for Large-Scale Video Concept Retrieval." International Journal of Multimedia Data Engineering and Management 1, no. 4 (October 2010): 60–74. http://dx.doi.org/10.4018/jmdem.2010100105.

Full text
Abstract:
Motivated by the growing use of multimedia services and the explosion of multimedia collections, efficient retrieval from large-scale multimedia data has become very important in multimedia content analysis and management. In this paper, a novel ranking algorithm is proposed for video retrieval. First, video content is represented by the global and local features and second, multiple correspondence analysis (MCA) is applied to capture the correlation between video content and semantic concepts. Next, video segments are scored by considering the features with high correlations and the transaction weights converted from correlations. Finally, a user interface is implemented in a video retrieval system that allows the user to enter his/her interested concept, searches videos based on the target concept, ranks the retrieved video segments using the proposed ranking algorithm, and then displays the top-ranked video segments to the user. Experimental results on 30 concepts from the TRECVID high-level feature extraction task have demonstrated that the presented video retrieval system assisted by the proposed ranking algorithm is able to retrieve more video segments belonging to the target concepts and to display more relevant results to the users.
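For readers who want to experiment with the ranking idea summarised above, the sketch below scores video segments by the features that correlate strongly with a target concept. It is only a minimal stand-in, not the authors' MCA-based method: plain Pearson correlations replace the MCA-derived transaction weights, and all data and names are illustrative.

```python
import numpy as np

def concept_weights(train_feats, train_labels, min_corr=0.2):
    """Correlation of each feature with a binary concept label (a simple
    stand-in for the MCA-derived correlations used in the paper)."""
    x = (train_feats - train_feats.mean(0)) / (train_feats.std(0) + 1e-8)
    y = (train_labels - train_labels.mean()) / (train_labels.std() + 1e-8)
    corr = (x * y[:, None]).mean(0)
    # Keep only features with a high (absolute) correlation to the concept.
    return np.where(np.abs(corr) >= min_corr, corr, 0.0)

def rank_segments(segment_feats, weights):
    """Score unseen segments by a correlation-weighted feature sum and
    return indices from most to least relevant."""
    scores = segment_feats @ weights
    return np.argsort(-scores), scores

rng = np.random.default_rng(0)
train_feats = rng.random((500, 32))
# Toy concept: mostly driven by feature 5.
train_labels = (train_feats[:, 5] > 0.7).astype(float)

w = concept_weights(train_feats, train_labels)
order, scores = rank_segments(rng.random((10, 32)), w)
print("top-3 segments:", order[:3])
```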
APA, Harvard, Vancouver, ISO, and other styles
2

Song, Yaguang, Junyu Gao, Xiaoshan Yang, and Changsheng Xu. "Learning Hierarchical Video Graph Networks for One-Stop Video Delivery." ACM Transactions on Multimedia Computing, Communications, and Applications 18, no. 1 (January 31, 2022): 1–23. http://dx.doi.org/10.1145/3466886.

Full text
Abstract:
The explosive growth of video data has brought great challenges to video retrieval, which aims to find out related videos from a video collection. Most users are usually not interested in all the content of retrieved videos but have a more fine-grained need. In the meantime, most existing methods can only return a ranked list of retrieved videos lacking a proper way to present the video content. In this paper, we introduce a distinctively new task, namely One-Stop Video Delivery (OSVD) aiming to realize a comprehensive retrieval system with the following merits: it not only retrieves the relevant videos but also filters out irrelevant information and presents compact video content to users, given a natural language query and video collection. To solve this task, we propose an end-to-end Hierarchical Video Graph Reasoning framework (HVGR) , which considers relations of different video levels and jointly accomplishes the one-stop delivery task. Specifically, we decompose the video into three levels, namely the video-level, moment-level, and the clip-level in a coarse-to-fine manner, and apply Graph Neural Networks (GNNs) on the hierarchical graph to model the relations. Furthermore, a pairwise ranking loss named Progressively Refined Loss is proposed based on prior knowledge that there is a relative order of the similarity of query-video, query-moment, and query-clip due to the different granularity of matched information. Extensive experimental results on benchmark datasets demonstrate that the proposed method achieves superior performance compared with baseline methods.
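The Progressively Refined Loss summarised above imposes a relative order on query-video, query-moment, and query-clip similarities. The following is a minimal, framework-free sketch of such an ordering constraint with hinge penalties on cosine similarities; it is not the authors' HVGR implementation, the margin value is arbitrary, and the assumed direction of the ordering (finer granularity scoring at least as high) is an illustrative choice rather than the paper's definition.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def progressively_refined_loss(q, video, moment, clip, margin=0.1):
    """Hinge penalties encouraging sim(q, clip) >= sim(q, moment) + margin and
    sim(q, moment) >= sim(q, video) + margin, i.e. a relative order over the
    three granularities for a relevant query."""
    s_video, s_moment, s_clip = cosine(q, video), cosine(q, moment), cosine(q, clip)
    loss = max(0.0, margin + s_video - s_moment) + max(0.0, margin + s_moment - s_clip)
    return loss, (s_video, s_moment, s_clip)

rng = np.random.default_rng(1)
q = rng.normal(size=64)
# Toy embeddings: the clip is built to be most similar to the query.
video = rng.normal(size=64)
moment = q + rng.normal(scale=1.0, size=64)
clip = q + rng.normal(scale=0.3, size=64)
loss, sims = progressively_refined_loss(q, video, moment, clip)
print(f"loss={loss:.3f}, sims={sims}")
```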
APA, Harvard, Vancouver, ISO, and other styles
3

Liu, Xiaoxi, Ju Liu, Lingchen Gu, and Yannan Ren. "Keyframe-Based Vehicle Surveillance Video Retrieval." International Journal of Digital Crime and Forensics 10, no. 4 (October 2018): 52–61. http://dx.doi.org/10.4018/ijdcf.2018100104.

Full text
Abstract:
This article describes how, with the diversification of electronic equipment in public security forensics, vehicle surveillance video has attracted growing attention. Vehicle surveillance videos contain useful evidence, and video retrieval can help locate the evidence contained in them. To retrieve evidence videos accurately and effectively, convolutional neural networks (CNNs) are widely applied to improve performance in surveillance video retrieval. This article proposes a vehicle surveillance video retrieval method that combines deep features derived from a CNN with iterative quantization (ITQ) encoding: given any frame of a video, it can generate a short video applicable to public security forensics. Experiments show that the retrieved video describes the video content before and after the keyframe directly and efficiently, and that the final short video of an accident scene in the surveillance footage can be regarded as forensic evidence.
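Iterative quantization (ITQ), the encoding step mentioned in this abstract, alternates between binarizing rotated, PCA-reduced features and refitting the rotation to reduce quantization error. Below is a generic ITQ sketch on random data rather than the authors' pipeline; the CNN features and keyframe handling are omitted, and the dimensions and iteration count are illustrative.

```python
import numpy as np

def itq_codes(features, n_bits=32, n_iters=50, seed=0):
    """Learn binary codes with iterative quantization (ITQ):
    1) zero-centre and PCA-project features to n_bits dimensions,
    2) alternately binarize and refit an orthogonal rotation R that
       reduces the quantization error ||B - V R||_F."""
    rng = np.random.default_rng(seed)
    X = features - features.mean(axis=0)

    # PCA projection to n_bits dimensions.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = X @ Vt[:n_bits].T                      # (n_samples, n_bits)

    # Random orthogonal initialisation of the rotation.
    R, _ = np.linalg.qr(rng.normal(size=(n_bits, n_bits)))

    for _ in range(n_iters):
        B = np.sign(V @ R)                     # fix R, update binary codes
        U, _, Wt = np.linalg.svd(B.T @ V)      # fix codes, update rotation (Procrustes)
        R = (U @ Wt).T

    return (np.sign(V @ R) > 0).astype(np.uint8), Vt[:n_bits], R

codes, pca, rot = itq_codes(np.random.default_rng(1).normal(size=(1000, 128)))
print(codes.shape)  # (1000, 32) binary codes ready for Hamming-distance search
```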
APA, Harvard, Vancouver, ISO, and other styles
4

Liu, Liu, Jiangtong Li, Li Niu, Ruicong Xu, and Liqing Zhang. "Activity Image-to-Video Retrieval by Disentangling Appearance and Motion." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (May 18, 2021): 2145–53. http://dx.doi.org/10.1609/aaai.v35i3.16312.

Full text
Abstract:
With the rapid emergence of video data, image-to-video retrieval has attracted much attention. There are two types of image-to-video retrieval: instance-based and activity-based. The former task aims to retrieve videos containing the same main objects as the query image, while the latter focuses on finding the similar activity. Since dynamic information plays a significant role in the video, we pay attention to the latter task to explore the motion relation between images and videos. In this paper, we propose a Motion-assisted Activity Proposal-based Image-to-Video Retrieval (MAP-IVR) approach to disentangle the video features into motion features and appearance features and obtain appearance features from the images. Then, we perform image-to-video translation to improve the disentanglement quality. The retrieval is performed in both appearance and video feature spaces. Extensive experiments demonstrate that our MAP-IVR approach remarkably outperforms the state-of-the-art approaches on two benchmark activity-based video datasets.
APA, Harvard, Vancouver, ISO, and other styles
5

Waykar, Sanjay B., and C. R. Bharathi. "Multimodal Features and Probability Extended Nearest Neighbor Classification for Content-Based Lecture Video Retrieval." Journal of Intelligent Systems 26, no. 3 (July 26, 2017): 585–99. http://dx.doi.org/10.1515/jisys-2016-0041.

Full text
Abstract:
Due to the ever-increasing number of digital lecture libraries and lecture video portals, the challenge of retrieving lecture videos has become a very significant and demanding task in recent years. Accordingly, the literature presents different techniques for video retrieval that consider video contents as well as signal data. Here, we propose a lecture video retrieval system using multimodal features and probability extended nearest neighbor (PENN) classification. Two modalities are utilized for feature extraction. One is textual information, which is determined from the lecture video using optical character recognition. The second modality, utilized to preserve video content, is the local vector pattern. These two modal features are extracted, and the retrieval of videos is performed using the proposed PENN classifier, which extends the extended nearest neighbor classifier by considering different weights for the first-level and second-level neighbors. The performance of the proposed video retrieval is evaluated using precision, recall, and F-measure, which are computed by matching the retrieved videos against the manually classified videos. The experiments show that the average precision of the proposed PENN+VQ is 78.3%, which is higher than that of the existing methods.
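The two-level neighbour weighting described above can be illustrated with a plain weighted vote over neighbours and neighbours-of-neighbours. The sketch below runs on toy data and is not the published PENN formulation; the level weights, features, and class names are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def two_level_knn(query, feats, labels, k=5, w1=1.0, w2=0.4):
    """Classify by voting over the query's k nearest neighbours (first level)
    and each neighbour's own k nearest neighbours (second level), giving the
    second level a smaller weight."""
    def knn(vec, exclude=None):
        d = np.linalg.norm(feats - vec, axis=1)
        if exclude is not None:
            d[exclude] = np.inf          # do not return a point as its own neighbour
        return np.argsort(d)[:k]

    votes = Counter()
    for i in knn(query):
        votes[labels[i]] += w1
        for j in knn(feats[i], exclude=i):
            votes[labels[j]] += w2
    return votes.most_common(1)[0][0]

rng = np.random.default_rng(2)
feats = np.vstack([rng.normal(0, 1, (50, 20)), rng.normal(3, 1, (50, 20))])
labels = np.array(["maths lecture"] * 50 + ["physics lecture"] * 50)
print(two_level_knn(rng.normal(3, 1, 20), feats, labels))
```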
APA, Harvard, Vancouver, ISO, and other styles
6

Ke, Wanli. "Detection of Shot Transition in Sports Video Based on Associative Memory Neural Network." Wireless Communications and Mobile Computing 2022 (February 28, 2022): 1–8. http://dx.doi.org/10.1155/2022/7862343.

Full text
Abstract:
Users must quickly and effectively classify, browse, and retrieve videos due to the explosive growth of video data. A variety of shots make up the video data stream. The most important technology in video retrieval is shot detection, which can fundamentally solve many problems, resulting in improved detection effects and even directly affecting video retrieval performance. This paper investigates the shot transition detection algorithm in digital video live broadcasts based on sporting events. To solve the problem of shot transition detection using a single training sample, an AMNN (Associative Memory Neural Network) model with online learning ability is proposed. Experiments on a large football video data set show that this algorithm detects shear and gradual change better than existing algorithms and meets the application requirements of sports video retrieval in most cases.
APA, Harvard, Vancouver, ISO, and other styles
7

Xu, Ruicong, Li Niu, Jianfu Zhang, and Liqing Zhang. "A Proposal-Based Approach for Activity Image-to-Video Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12524–31. http://dx.doi.org/10.1609/aaai.v34i07.6941.

Full text
Abstract:
Activity image-to-video retrieval task aims to retrieve videos containing the similar activity as the query image, which is a challenging task because videos generally have many background segments irrelevant to the activity. In this paper, we utilize R-C3D model to represent a video by a bag of activity proposals, which can filter out background segments to some extent. However, there are still noisy proposals in each bag. Thus, we propose an Activity Proposal-based Image-to-Video Retrieval (APIVR) approach, which incorporates multi-instance learning into cross-modal retrieval framework to address the proposal noise issue. Specifically, we propose a Graph Multi-Instance Learning (GMIL) module with graph convolutional layer, and integrate this module with classification loss, adversarial loss, and triplet loss in our cross-modal retrieval framework. Moreover, we propose geometry-aware triplet loss based on point-to-subspace distance to preserve the structural information of activity proposals. Extensive experiments on three widely-used datasets verify the effectiveness of our approach.
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Hanqing, Chunyan Hu, Feifei Lee, Chaowei Lin, Wei Yao, Lu Chen, and Qiu Chen. "A Supervised Video Hashing Method Based on a Deep 3D Convolutional Neural Network for Large-Scale Video Retrieval." Sensors 21, no. 9 (April 29, 2021): 3094. http://dx.doi.org/10.3390/s21093094.

Full text
Abstract:
Recently, with the popularization of camera tools such as mobile phones and the rise of various short video platforms, large numbers of videos are being uploaded to the Internet at all times, so a video retrieval system with fast retrieval speed and high precision is very necessary. Therefore, content-based video retrieval (CBVR) has aroused the interest of many researchers. A typical CBVR system mainly contains the following two essential parts: video feature extraction and similarity comparison. Feature extraction from video is very challenging: previous video retrieval methods are mostly based on extracting features from single video frames, resulting in the loss of temporal information in the videos. Hashing methods are extensively used in multimedia information retrieval due to their retrieval efficiency, but most of them are currently only applied to image retrieval. In order to solve these problems in video retrieval, we build an end-to-end framework called deep supervised video hashing (DSVH), which employs a 3D convolutional neural network (CNN) to obtain spatial-temporal features of videos, and then trains a set of hash functions by supervised hashing to transfer the video features into binary space and obtain compact binary codes for the videos. Finally, we use triplet loss for network training. We conduct extensive experiments on three public video datasets, UCF-101, JHMDB and HMDB-51, and the results show that the proposed method has advantages over many state-of-the-art video retrieval methods. Compared with the DVH method, the mAP value on the UCF-101 dataset is improved by 9.3%, and the minimum improvement, on the JHMDB dataset, is 0.3%. At the same time, we also demonstrate the stability of the algorithm on the HMDB-51 dataset.
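Three ingredients of the DSVH pipeline sketched in this abstract, triplet-based training signals, binarization of hash-layer outputs, and Hamming-distance ranking, can be illustrated without the 3D CNN itself. The snippet below uses random vectors as stand-ins for network outputs; the margin and code length are illustrative, and this is not the authors' implementation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss on real-valued hash-layer outputs: pull same-class
    videos together and push different-class videos apart."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

def to_binary(codes):
    """Binarize real-valued outputs into compact codes (the retrieval-time step)."""
    return (codes > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database videos by Hamming distance to the query's binary code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists), dists

rng = np.random.default_rng(3)
emb = rng.normal(size=(6, 48))            # stand-in for 3D-CNN hash-layer outputs
print("triplet loss:", round(triplet_loss(emb[:2], emb[2:4], emb[4:6]), 3))

db = to_binary(rng.normal(size=(100, 48)))
order, dists = hamming_rank(to_binary(rng.normal(size=48)), db)
print("closest videos:", order[:5])
```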
APA, Harvard, Vancouver, ISO, and other styles
9

Gu, Lingchen, Ju Liu, and Aixi Qu. "Performance Evaluation and Scheme Selection of Shot Boundary Detection and Keyframe Extraction in Content-Based Video Retrieval." International Journal of Digital Crime and Forensics 9, no. 4 (October 2017): 15–29. http://dx.doi.org/10.4018/ijdcf.2017100102.

Full text
Abstract:
The advancement of multimedia technology has contributed to a large number of videos, so it is important to know how to retrieve information from video, especially for crime prevention and forensics. For the convenience of retrieving video data, content-based video retrieval (CBVR) has got great publicity. Aiming at improving the retrieval performance, we focus on the two key technologies: shot boundary detection and keyframe extraction. After being compared with pixel analysis and chi-square histogram, histogram-based method is chosen in this paper. Then we combine it with adaptive threshold method and use HSV color space to get the histogram. For keyframe extraction, four methods are analyzed and four evaluation criteria are summarized, both objective and subjective, so the opinion is finally given that different types of keyframe extraction methods can be used for varied types of videos. Then the retrieval can be based on keyframes, simplifying the process of video investigation, and helping criminal investigation personnel to improve work efficiency.
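A minimal sketch of the combination favoured above, HSV colour histograms with a threshold adapted to the video's own difference statistics, is given below on a synthetic two-shot clip. It simplifies the paper's adaptive-threshold scheme to a single mean-plus-deviation rule over all frame differences, and the bin counts and alpha are illustrative assumptions.

```python
import numpy as np

def hsv_histogram(frame_hsv, bins=(8, 4, 4)):
    """Normalised 3-D colour histogram of an HSV frame (channels scaled to [0, 1])."""
    hist, _ = np.histogramdd(frame_hsv.reshape(-1, 3), bins=bins,
                             range=((0, 1), (0, 1), (0, 1)))
    return hist.ravel() / hist.sum()

def detect_shot_cuts(frames_hsv, alpha=3.0):
    """Declare a cut wherever the histogram difference between consecutive
    frames exceeds a threshold adapted to this clip's own statistics
    (mean + alpha * std of all consecutive differences)."""
    hists = [hsv_histogram(f) for f in frames_hsv]
    diffs = np.array([np.abs(h1 - h2).sum() for h1, h2 in zip(hists[:-1], hists[1:])])
    threshold = diffs.mean() + alpha * diffs.std()
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

# Toy video: 40 dark noisy frames followed by 40 bright noisy frames -> one cut at frame 40.
rng = np.random.default_rng(4)
shot_a = rng.uniform(0.0, 0.3, size=(40, 48, 64, 3))
shot_b = rng.uniform(0.6, 1.0, size=(40, 48, 64, 3))
print(detect_shot_cuts(np.concatenate([shot_a, shot_b])))   # expected for this toy clip: [40]
```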
APA, Harvard, Vancouver, ISO, and other styles
10

Patil, Sheetal Deepak. "Content Based Image and Video Retrieval A Compressive Review." International Journal of Engineering and Advanced Technology 10, no. 5 (June 30, 2021): 243–47. http://dx.doi.org/10.35940/ijeat.e2783.0610521.

Full text
Abstract:
Content-based image retrieval is quickly becoming the most common method of searching vast databases for images, giving researchers a lot of room to develop new techniques and systems. Likewise, another common application in the field of computer vision is content-based visual information retrieval. For image and video retrieval, text-based search and Web-based image reranking have been the most common methods. Though Content Based Video Systems have improved in accuracy over time, they still fall short in interactive search. The use of these approaches has exposed shortcomings such as noisy data and inaccuracy, which often result in the showing of irrelevant images or videos. The authors of the proposed study integrate image and visual data to improve the precision of the retrieved results for both photographs and videos. In response to a user's query, this study investigates alternative ways for fetching high-quality photos and related videos.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Video retrieval"

1

Pickering, Marcus Jerome. "Video retrieval and summarisation." Thesis, Imperial College London, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.411790.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Chen, Juan. "Content-based Digital Video Processing. Digital Videos Segmentation, Retrieval and Interpretation." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4256.

Full text
Abstract:
Recent research approaches in semantics based video content analysis require shot boundary detection as the first step to divide video sequences into sections. Furthermore, with the advances in networking and computing capability, efficient retrieval of multimedia data has become an important issue. Content-based retrieval technologies have been widely implemented to protect intellectual property rights (IPR). In addition, automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this thesis, a paradigm is proposed to segment, retrieve and interpret digital videos. Five algorithms are presented to solve the video segmentation task. Firstly, a simple shot cut detection algorithm is designed for real-time implementation. Secondly, a systematic method is proposed for shot detection using content-based rules and FSM (finite state machine). Thirdly, the shot detection is implemented using local and global indicators. Fourthly, a context awareness approach is proposed to detect shot boundaries. Fifthly, a fuzzy logic method is implemented for shot detection. Furthermore, a novel analysis approach is presented for the detection of video copies. It is robust to complicated distortions and capable of locating the copy of segments inside original videos. Then, objects and events are extracted from MPEG Sequences for Video Highlights Indexing and Retrieval. Finally, a human fighting detection algorithm is proposed for movie annotation.
APA, Harvard, Vancouver, ISO, and other styles
3

Faichney, Jolon. "Content-Based Retrieval of Digital Video." Thesis, Griffith University, 2005. http://hdl.handle.net/10072/365697.

Full text
Abstract:
In the next few years consumers will have access to large amounts of video and image data either created by themselves with digital video and still cameras or by having access to other image and video content electronically. Existing personal computer hardware and software has not been designed to manage large quantities of multimedia content. As a result, research in the area of content-based video retrieval (CBVR) has been underway for the last fifteen years. This research aims to improve CBVR by providing an accurate and reliable shape-colour representation and by providing a new 3D user interface called DomeWorld for the efficient browsing of large video databases. Existing feature extraction techniques designed for use in large databases are typically simple techniques as they must conform to the limited processing and storage constraints that are exhibited by large scale databases. Conversely, more complex feature extraction techniques provide higher level descriptions of the underlying data but are time consuming and require large amounts of storage making them less useful for large databases. In this thesis a technique for medium to high level shape representation is presented that exhibits efficient storage and query performance. The technique uses a very accurate contour detection system that incorporates a new asymmetry edge detector which is shown to perform better than other contour detection techniques combined with a new summarisation technique to efficiently store contours. In addition, contours are represented by histograms further reducing space requirements and increasing query performance. A new type of histogram is introduced called the fuzzy histogram and is applied to content-based retrieval systems for the first time. Fuzzy histograms improve the ranking of query results over non-fuzzy techniques especially in low bin-count histogram configurations. The fuzzy contour histogram approach is compared with an exhaustive contour comparison technique and is found to provide equivalent or better results. A number of colour distribution representation techniques were investigated for integration with the contour histogram and the fuzzy HSV histogram was found to provide the best performance. When the colour and contour histograms were integrated less overall bins were required as each histogram compensates for the other’s weaknesses. The result is that only a quarter of the bins were required than either colour or contour histogram alone further reducing query times and storage requirements. This research also improves the user experience with a new user interface called DomeWorld that uses three-dimensional translucent domes. Existing user interfaces are either designed for image databases, for browsing videos, or for browsing large non-multimedia data sets. DomeWorld is designed to be able to browse both image and video databases through a number of innovative techniques including hierarchical clustering, radial space-filling layout of nodes, three-dimensional presentation, and translucent domes that allow the hierarchical nature of the data to be viewed whilst also seeing the relationship between child nodes a number of levels deep. A taxonomy of existing image, video, and large data set user interfaces is presented and the proposed user interface is evaluated within the framework. It is found that video database user interfaces have four requirements: context and detail, gisting, clustering, and integration of video and images. 
None of the 27 evaluated user interfaces satisfy all four requirements. The DomeWorld user interface is designed to satisfy all of the requirements and presents a step forward in CBVR user interaction. This thesis investigates two important areas of CBVR, structural indexing and user interaction, and presents techniques which advance the field. These two areas will become very important in the future when users must access and manage large collections of image and video content.
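The fuzzy histogram idea introduced in this thesis can be approximated by soft binning, where each value spreads its mass over neighbouring bins so that small feature shifts change the histogram smoothly. The sketch below is a generic soft-binning illustration, not the thesis's exact membership functions; the bin counts and value range are assumptions.

```python
import numpy as np

def fuzzy_histogram(values, n_bins=8, lo=0.0, hi=1.0):
    """Soft-binned histogram: each value contributes to its nearest bin
    centres with weights that fall off linearly with distance, instead of
    landing entirely in a single hard bin."""
    centres = np.linspace(lo, hi, n_bins)
    width = centres[1] - centres[0]
    hist = np.zeros(n_bins)
    for v in np.clip(values, lo, hi):
        # Membership of v in every bin: 1 at the centre, 0 one bin-width away.
        weights = np.maximum(0.0, 1.0 - np.abs(v - centres) / width)
        hist += weights / weights.sum()
    return hist / len(values)

# Two almost identical values land in different hard bins but get similar fuzzy histograms.
print(np.round(fuzzy_histogram([0.49], n_bins=4), 3))
print(np.round(fuzzy_histogram([0.51], n_bins=4), 3))
```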
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information Technology
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
4

Banda, Nagamani. "Adaptive video segmentation." Morgantown, W. Va. : [West Virginia University Libraries], 2004. https://etd.wvu.edu/etd/controller.jsp?moduleName=documentdata&jsp%5FetdId=3520.

Full text
Abstract:
Thesis (M.S.)--West Virginia University, 2004.
Title from document title page. Document formatted into pages; contains vi, 52 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 50-52).
APA, Harvard, Vancouver, ISO, and other styles
5

Vrochidis, Stefanos. "Interactive video retrieval using implicit user feedback." Thesis, Queen Mary, University of London, 2013. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8729.

Full text
Abstract:
In the recent years, the rapid development of digital technologies and the low cost of recording media have led to a great increase in the availability of multimedia content worldwide. This availability places the demand for the development of advanced search engines. Traditionally, manual annotation of video was one of the usual practices to support retrieval. However, the vast amounts of multimedia content make such practices very expensive in terms of human effort. At the same time, the availability of low cost wearable sensors delivers a plethora of user-machine interaction data. Therefore, there is an important challenge of exploiting implicit user feedback (such as user navigation patterns and eye movements) during interactive multimedia retrieval sessions with a view to improving video search engines. In this thesis, we focus on automatically annotating video content by exploiting aggregated implicit feedback of past users expressed as click-through data and gaze movements. Towards this goal, we have conducted interactive video retrieval experiments, in order to collect click-through and eye movement data in not strictly controlled environments. First, we generate semantic relations between the multimedia items by proposing a graph representation of aggregated past interaction data and exploit them to generate recommendations, as well as to improve content-based search. Then, we investigate the role of user gaze movements in interactive video retrieval and propose a methodology for inferring user interest by employing support vector machines and gaze movement-based features. Finally, we propose an automatic video annotation framework, which combines query clustering into topics by constructing gaze movement-driven random forests and temporally enhanced dominant sets, as well as video shot classification for predicting the relevance of viewed items with respect to a topic. The results show that exploiting heterogeneous implicit feedback from past users is of added value for future users of interactive video retrieval systems.
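One way to picture the aggregated click-through data described above is a weighted co-click graph built over past retrieval sessions, from which related videos can be suggested. The toy sketch below is only illustrative: it is not the thesis's graph construction, gaze-based features, or random-forest machinery, and the session data are invented.

```python
from collections import defaultdict
from itertools import combinations

def build_coclick_graph(sessions):
    """Weighted item-item graph: edge weight = number of past retrieval
    sessions in which both videos were clicked."""
    graph = defaultdict(lambda: defaultdict(int))
    for clicked in sessions:
        for a, b in combinations(set(clicked), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def recommend(graph, video, top_k=3):
    """Suggest the videos most strongly co-clicked with the given one."""
    neighbours = graph.get(video, {})
    return sorted(neighbours, key=neighbours.get, reverse=True)[:top_k]

sessions = [
    ["v1", "v2", "v3"],      # each list = videos clicked in one past session
    ["v2", "v3"],
    ["v1", "v3", "v4"],
    ["v4", "v5"],
]
g = build_coclick_graph(sessions)
print(recommend(g, "v3"))    # videos most related to v3 according to past clicks
```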
APA, Harvard, Vancouver, ISO, and other styles
6

Zhang, Lelin. "Scalable Content-Based Image and Video Retrieval." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/15439.

Full text
Abstract:
The popularity of the Internet and portable image capturing devices brings in unprecedented amount of images and videos. Content-based visual search provides an important tool for users to consume the ever-growing digital media repositories, and is becoming an increasingly demanding task as never before. In this thesis, we focus on improving the scalability, efficiency and usability of content-based image and video retrieval systems, particularly in dynamic and open environments. Towards our goal, we make four contributions to the research community. First, we propose a scalable approach to adopt bag-of-visual-words (BoVW) to content-based image retrieval (CBIR) in peer-to-peer (P2P) networks. To overcome the dynamic P2P environment, we propose a distributed codebook updating algorithm based on splitting/merging of individual codewords, which maintains the workload balance in the network churn. Our approach offers a scalable framework for content-based visual search in P2P environment. Second, we improve the retrieval performance of CBIR with relevance feedback (RF). We formulate the RF process as an energy minimization problem, and utilize graph cuts algorithm to solve the problem and obtain relevant/irrelevant labels for the images. Our method enables flexible partitioning of the feature space and is capable of handling challenging scenarios. Third, we improve the retrieval performance of trajectory based action video retrieval with spatial-temporal context. We exploit the spatial-temporal correlations among trajectories for descriptor coding, and tackle the trajectory segment mis-alignment issue with an offset-aware distance for trajectory matching. Finally, we develop a toolset to improve the efficiency and provide better insight of the BoVW pipeline. Our toolset provides robust integration of different methods, automatic parallel execution and result reusing, and visualization of the retrieval process.
APA, Harvard, Vancouver, ISO, and other styles
7

Aytar, Yusuf. "SEMANTIC VIDEO RETRIEVAL USING HIGH LEVEL CONTEXT." Master's thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3455.

Full text
Abstract:
Video retrieval – searching and retrieving videos relevant to a user defined query – is one of the most popular topics in both real life applications and multimedia research. This thesis employs concepts from Natural Language Understanding in solving the video retrieval problem. Our main contribution is the utilization of the semantic word similarity measures for video retrieval through the trained concept detectors, and the visual co-occurrence relations between such concepts. We propose two methods for content-based retrieval of videos: (1) A method for retrieving a new concept (a concept which is not known to the system and no annotation is available) using semantic word similarity and visual co-occurrence, which is an unsupervised method. (2) A method for retrieval of videos based on their relevance to a user defined text query using the semantic word similarity and visual content of videos. For evaluation purposes, we mainly used the automatic search and the high level feature extraction test set of TRECVID'06 and TRECVID'07 benchmarks. These two data sets consist of 250 hours of multilingual news video captured from American, Arabic, German and Chinese TV channels. Although our method for retrieving a new concept is an unsupervised method, it outperforms the trained concept detectors (which are supervised) on 7 out of 20 test concepts, and overall it performs very close to the trained detectors. On the other hand, our visual content based semantic retrieval method performs more than 100% better than the text-based retrieval method. This shows that using visual content alone we can have significantly good retrieval results.
M.S.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science MS
APA, Harvard, Vancouver, ISO, and other styles
8

Volkmer, Timo. "Semantics of Video Shots for Content-based Retrieval." RMIT University. Computer Science and Information Technology, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20090220.122213.

Full text
Abstract:
Content-based video retrieval research combines expertise from many different areas, such as signal processing, machine learning, pattern recognition, and computer vision. As video extends into both the spatial and the temporal domain, we require techniques for the temporal decomposition of footage so that specific content can be accessed. This content may then be semantically classified - ideally in an automated process - to enable filtering, browsing, and searching. An important aspect that must be considered is that pictorial representation of information may be interpreted differently by individual users because it is less specific than its textual representation. In this thesis, we address several fundamental issues of content-based video retrieval for effective handling of digital footage. Temporal segmentation, the common first step in handling digital video, is the decomposition of video streams into smaller, semantically coherent entities. This is usually performed by detecting the transitions that separate single camera takes. While abrupt transitions - cuts - can be detected relatively well with existing techniques, effective detection of gradual transitions remains difficult. We present our approach to temporal video segmentation, proposing a novel algorithm that evaluates sets of frames using a relatively simple histogram feature. Our technique has been shown to range among the best existing shot segmentation algorithms in large-scale evaluations. The next step is semantic classification of each video segment to generate an index for content-based retrieval in video databases. Machine learning techniques can be applied effectively to classify video content. However, these techniques require manually classified examples for training before automatic classification of unseen content can be carried out. Manually classifying training examples is not trivial because of the implied ambiguity of visual content. We propose an unsupervised learning approach based on latent class modelling in which we obtain multiple judgements per video shot and model the users' response behaviour over a large collection of shots. This technique yields a more generic classification of the visual content. Moreover, it enables the quality assessment of the classification, and maximises the number of training examples by resolving disagreement. We apply this approach to data from a large-scale, collaborative annotation effort and present ways to improve the effectiveness for manual annotation of visual content by better design and specification of the process. Automatic speech recognition techniques along with semantic classification of video content can be used to implement video search using textual queries. This requires the application of text search techniques to video and the combination of different information sources. We explore several text-based query expansion techniques for speech-based video retrieval, and propose a fusion method to improve overall effectiveness. To combine both text and visual search approaches, we explore a fusion technique that combines spoken information and visual information using semantic keywords automatically assigned to the footage based on the visual content. The techniques that we propose help to facilitate effective content-based video retrieval and highlight the importance of considering different user interpretations of visual content. This allows better understanding of video content and a more holistic approach to multimedia retrieval in the future.
APA, Harvard, Vancouver, ISO, and other styles
9

Fernández, Beltrán Rubén. "Characterisation and adaptive learning in interactive video retrieval." Doctoral thesis, Universitat Jaume I, 2016. http://hdl.handle.net/10803/387220.

Full text
Abstract:
The main objective of this thesis is to make effective use of latent topic models to address the problem of automatic video retrieval. Specifically, it aims to improve the current state of the art in automatic video retrieval systems in terms of both efficiency and precision. In general, latent topic models are a set of statistical tools that make it possible to extract the generative patterns of a data collection. Traditionally, these techniques have not been considered very useful for automatic video retrieval systems because of their high computational cost and the inherent complexity of the topic space in the domain of visual information.
In this work, we are interested in the use of latent topics to overcome the current limitations in CBVR. Despite the potential of topic models to uncover the hidden structure of a collection, they have traditionally been unable to provide a competitive advantage in CBVR because of the high computational cost of their algorithms and the complexity of the latent space in the visual domain. Throughout this thesis we focus on designing new models and tools based on topic models to take advantage of the latent space in CBVR. Specifically, we have worked in four different areas within the retrieval process: vocabulary reduction, encoding, modelling and ranking, being our most important contributions related to both modelling and ranking.
APA, Harvard, Vancouver, ISO, and other styles
10

Demirdizen, Goncagul. "An Ontology-driven Video Annotation And Retrieval System." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12612592/index.pdf.

Full text
Abstract:
In this thesis, a system, called Ontology-Driven Video Annotation and Retrieval System (OntoVARS) is developed in order to provide a video management system which is used for ontology-driven semantic content annotation and querying. The proposed system is based on MPEG-7 ontology which provides interoperability and common communication platform with other MPEG-7 ontology compatible systems. The Rhizomik MPEG-7 ontology is used as the core ontology and domain specific ontologies are integrated to the core ontology in order to provide ontology-based video content annotation and querying capabilities to the user. The proposed system supports content-based annotation and spatio-temporal data modeling in video databases by using the domain ontology concepts. Moreover, the system enables ontology-driven query formulation and processing according to the domain ontology instances and concepts. In the developed system, ontology-driven concept querying, spatio-temporal querying, region-based and time-based querying capabilities are performed as simple querying types. Besides these simple query types, compound queries are also generated by combining simple queries with "(", ")", "AND" and "OR" operators. For all these query types, the system supports both general and video specific query processing. By this means, the user is able to pose queries on all videos in the video databases as well as the details of a specific video of interest.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Video retrieval"

1

Scott, Stevens, and Little Thomas, eds. Video information retrieval. New York: Association for Computing Machinery, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Petković, Milan, and Willem Jonker. Content-Based Video Retrieval. Boston, MA: Springer US, 2004. http://dx.doi.org/10.1007/978-1-4757-4865-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Sundaram, Hari, Milind Naphade, John R. Smith, and Yong Rui, eds. Image and Video Retrieval. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11788034.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Lew, Michael S., Nicu Sebe, and John P. Eakins, eds. Image and Video Retrieval. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002. http://dx.doi.org/10.1007/3-540-45479-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Leow, Wee-Kheng, Michael S. Lew, Tat-Seng Chua, Wei-Ying Ma, Lekha Chaisorn, and Erwin M. Bakker, eds. Image and Video Retrieval. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11526346.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Bakker, Erwin M., Michael S. Lew, Thomas S. Huang, Nicu Sebe, and Xiang Sean Zhou, eds. Image and Video Retrieval. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003. http://dx.doi.org/10.1007/3-540-45113-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Enser, Peter, Yiannis Kompatsiaris, Noel E. O’Connor, Alan F. Smeaton, and Arnold W. M. Smeulders, eds. Image and Video Retrieval. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/b98923.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Marques, Oge, and Borko Furht. Content-Based Image and Video Retrieval. Boston, MA: Springer US, 2002. http://dx.doi.org/10.1007/978-1-4615-0987-5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Marques, Oge. Content-based image and video retrieval. Boston: Kluwer Academic Publishers, 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

1962-, Jonker Willem, ed. Content-based video retrieval: A database perspective. Boston: Kluwer Academic Publishers, 2004.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Video retrieval"

1

Snoek, Cees G. M., and Arnold W. M. Smeulders. "Video Retrieval." In Computer Vision, 847–50. Boston, MA: Springer US, 2014. http://dx.doi.org/10.1007/978-0-387-31439-6_74.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Snoek, Cees G. M., and Arnold W. M. Smeulders. "Video Retrieval." In Computer Vision, 1328–31. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-63416-2_74.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Snoek, Cees G. M., Marcel Worring, Jan-Mark Geusebroek, Dennis C. Koelma, Frank J. Seinstra, and Arnold W. M. Smeulders. "Semantic Video Indexing." In Multimedia Retrieval, 225–49. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. http://dx.doi.org/10.1007/978-3-540-72895-5_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Lucas, Laurent, Céline Loscos, and Yannick Remion. "3D Model Retrieval." In 3D Video, 347–68. Hoboken, USA: John Wiley & Sons, Inc., 2013. http://dx.doi.org/10.1002/9781118761915.ch18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Petković, Milan, and Willem Jonker. "Video Modeling." In Content-Based Video Retrieval, 33–53. Boston, MA: Springer US, 2004. http://dx.doi.org/10.1007/978-1-4757-4865-9_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Mihajlović, Vojkan, Milan Petković, Willem Jonker, and Henk Blanken. "Multimodal Content-based Video Retrieval." In Multimedia Retrieval, 271–94. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. http://dx.doi.org/10.1007/978-3-540-72895-5_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Yan, Rong, Alexander G. Hauptmann, and Rong Jin. "Pseudo-Relevance Feedback for Multimedia Retrieval." In Video Mining, 309–38. Boston, MA: Springer US, 2003. http://dx.doi.org/10.1007/978-1-4757-6928-9_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Putheti, Sudhakar, M. N. Sri Harsha, and A. Vishnuvardhan. "Motion Detection in Video Retrieval Using Content-Based Video Retrieval." In Innovations in Computer Science and Engineering, 235–42. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-7082-3_28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Bolle, Ruud M., Boon-Lock Yeo, and Minerva M. Yeung. "Video query and retrieval." In Advanced Topics in Artificial Intelligence, 13–24. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997. http://dx.doi.org/10.1007/3-540-63797-4_54.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Chmelar, Petr, Ivana Rudolfova, and Jaroslav Zendulka. "Clustering for Video Retrieval." In Data Warehousing and Knowledge Discovery, 390–401. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-03730-6_31.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Video retrieval"

1

Zhao, Hongrui, Jin Yu, Yanan Li, Donghui Wang, Jie Liu, Hongxia Yang, and Fei Wu. "Dress like an Internet Celebrity: Fashion Retrieval in Videos." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/147.

Full text
Abstract:
Nowadays, both online shopping and video sharing have grown exponentially. Although internet celebrities in videos are an ideal showcase for fashion corporations to sell their products, audiences do not always know where to buy the fashion products shown in videos, which is a cross-domain problem called video-to-shop. In this paper, we propose a novel deep neural network, called Detect, Pick, and Retrieval Network (DPRNet), to bridge the gap between fashion products in videos and audiences. For the video side, we have modified the traditional object detector, which automatically picks out the best object proposals for every commodity in videos without duplication, to promote the performance of the video-to-shop task. For the fashion retrieval side, a simple but effective multi-task loss network obtains new state-of-the-art results on DeepFashion. Extensive experiments conducted on a new large-scale cross-domain video-to-shop dataset show that DPRNet is efficient and outperforms the state-of-the-art methods on the video-to-shop task.
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Wenzhe, Mengdan Zhang, Runnan Chen, Guanyu Cai, Penghao Zhou, Pai Peng, Xiaowei Guo, Jian Wu, and Xing Sun. "Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/154.

Full text
Abstract:
Multi-modal cues presented in videos are usually beneficial for the challenging video-text retrieval task on internet-scale datasets. Recent video retrieval methods take advantage of multi-modal cues by aggregating them to holistic high-level semantics for matching with text representations in a global view. In contrast to this global alignment, the local alignment of detailed semantics encoded within both multi-modal cues and distinct phrases is still not well conducted. Thus, in this paper, we leverage the hierarchical video-text alignment to fully explore the detailed diverse characteristics in multi-modal cues for fine-grained alignment with local semantics from phrases, as well as to capture a high-level semantic correspondence. Specifically, multi-step attention is learned for progressively comprehensive local alignment and a holistic transformer is utilized to summarize multi-modal cues for global alignment. With hierarchical alignment, our model outperforms state-of-the-art methods on three public video retrieval datasets.
APA, Harvard, Vancouver, ISO, and other styles
3

Hauptmann, Alexander G., Wei-Hao Lin, Rong Yan, Jun Yang, and Ming-Yu Chen. "Extreme video retrieval." In the 14th annual ACM international conference. New York, New York, USA: ACM Press, 2006. http://dx.doi.org/10.1145/1180639.1180721.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Feng, Zerun, Zhimin Zeng, Caili Guo, and Zheng Li. "Exploiting Visual Semantic Reasoning for Video-Text Retrieval." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/140.

Full text
Abstract:
Video retrieval is a challenging research topic bridging the vision and language areas and has attracted broad attention in recent years. Previous works have been devoted to representing videos by directly encoding from frame-level features. In fact, videos consist of various and abundant semantic relations to which existing methods pay less attention. To address this issue, we propose a Visual Semantic Enhanced Reasoning Network (ViSERN) to exploit reasoning between frame regions. Specifically, we consider frame regions as vertices and construct a fully-connected semantic correlation graph. Then, we perform reasoning by novel random walk rule-based graph convolutional networks to generate region features involved with semantic relations. With the benefit of reasoning, semantic interactions between regions are considered, while the impact of redundancy is suppressed. Finally, the region features are aggregated to form frame-level features for further encoding to measure video-text similarity. Extensive experiments on two public benchmark datasets validate the effectiveness of our method by achieving state-of-the-art performance due to the powerful semantic reasoning.
APA, Harvard, Vancouver, ISO, and other styles
5

Mi Hee, Yoon Yong Ik, and Kio Chung Kim. "Unified video retrieval system supporting similarity retrieval." In Proceedings. Tenth International Workshop on Database and Expert Systems Applications. DEXA 99. IEEE, 1999. http://dx.doi.org/10.1109/dexa.1999.795298.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Alshuth, P., Th Hermes, J. Kreyß, and M. Röper. "Video retrieval with IRIS." In the fourth ACM international conference. New York, New York, USA: ACM Press, 1996. http://dx.doi.org/10.1145/244130.244451.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Yang, Linjun, Yang Cai, Alan Hanjalic, Xian-Sheng Hua, and Shipeng Li. "Video-based image retrieval." In the 19th ACM international conference. New York, New York, USA: ACM Press, 2011. http://dx.doi.org/10.1145/2072298.2071923.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Dong, Jianfeng, Xianke Chen, Minsong Zhang, Xun Yang, Shujie Chen, Xirong Li, and Xun Wang. "Partially Relevant Video Retrieval." In MM '22: The 30th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503161.3547976.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Mu, Xiangming. "Content-based video retrieval." In the 29th annual international ACM SIGIR conference. New York, New York, USA: ACM Press, 2006. http://dx.doi.org/10.1145/1148170.1148314.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Gao, Junyu, and Changsheng Xu. "Fast Video Moment Retrieval." In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021. http://dx.doi.org/10.1109/iccv48922.2021.00155.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Video retrieval"

1

Kobla, Vikrant, David Doermann, King-Ip Lin, and Christos Faloutsos. Feature Normalization for Video Indexing and Retrieval. Fort Belvoir, VA: Defense Technical Information Center, November 1996. http://dx.doi.org/10.21236/ada459805.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Liang, Yiqing. Video Retrieval Based on Language and Image Analysis. Fort Belvoir, VA: Defense Technical Information Center, May 1999. http://dx.doi.org/10.21236/ada364129.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Ben-Arie, Jezekiel, A. P. Sistla, and Clement Yu. Retrieval from Video and Pictorial Databases Employing Similarity and Motion. Fort Belvoir, VA: Defense Technical Information Center, November 1999. http://dx.doi.org/10.21236/ada398364.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Decleir, Cyril, Mohand-Saïd Hacid, and Jacques Kouloumdjian. A Database Approach for Modeling and Querying Video Data. Aachen University of Technology, 1999. http://dx.doi.org/10.25368/2022.90.

Full text
Abstract:
Indexing video data is essential for providing content based access. In this paper, we consider how database technology can offer an integrated framework for modeling and querying video data. As many concerns in video (e.g., modeling and querying) are also found in databases, databases provide an interesting angle to attack many of the problems. From a video applications perspective, database systems provide a nice basis for future video systems. More generally, database research will provide solutions to many video issues even if these are partial or fragmented. From a database perspective, video applications provide beautiful challenges. Next generation database systems will need to provide support for multimedia data (e.g., image, video, audio). These data types require new techniques for their management (i.e., storing, modeling, querying, etc.). Hence new solutions are significant. This paper develops a data model and a rule-based query language for video content based indexing and retrieval. The data model is designed around the object and constraint paradigms. A video sequence is split into a set of fragments. Each fragment can be analyzed to extract the information (symbolic descriptions) of interest that can be put into a database. This database can then be searched to find information of interest. Two types of information are considered: (1) the entities (objects) of interest in the domain of a video sequence, (2) video frames which contain these entities. To represent these information, our data model allows facts as well as objects and constraints. We present a declarative, rule-based, constraint query language that can be used to infer relationships about information represented in the model. The language has a clear declarative and operational semantics. This work is a major revision and a consolidation of [12, 13].
APA, Harvard, Vancouver, ISO, and other styles
5

Rigotti, Christophe, and Mohand-Saïd Hacid. Representing and Reasoning on Conceptual Queries Over Image Databases. Aachen University of Technology, 1999. http://dx.doi.org/10.25368/2022.89.

Full text
Abstract:
The problem of content management of multimedia data types (e.g., image, video, graphics) is becoming increasingly important with the development of advanced multimedia applications. Traditional database management systems are inadequate for the handling of such data types. They require new techniques for query formulation, retrieval, evaluation, and navigation. In this paper we develop a knowledge-based framework for modeling and retrieving image data by content. To represent the various aspects of an image object's characteristics, we propose a model which consists of three layers: (1) Feature and Content Layer, intended to contain image visual features such as contours, shapes,etc.; (2) Object Layer, which provides the (conceptual) content dimension of images; and (3) Schema Layer, which contains the structured abstractions of images, i.e., a general schema about the classes of objects represented in the object layer. We propose two abstract languages on the basis of description logics: one for describing knowledge of the object and schema layers, and the other, more expressive, for making queries. Queries can refer to the form dimension (i.e., information of the Feature and Content Layer) or to the content dimension (i.e., information of the Object Layer). These languages employ a variable free notation, and they are well suited for the design, verification and complexity analysis of algorithms. As the amount of information contained in the previous layers may be huge and operations performed at the Feature and Content Layer are time-consuming, resorting to the use of materialized views to process and optimize queries may be extremely useful. For that, we propose a formal framework for testing containment of a query in a view expressed in our query language. The algorithm we propose is sound and complete and relatively efficient.
APA, Harvard, Vancouver, ISO, and other styles
6

Tavakoli, Arash, Vahid Balali, and Arsalan Heydarian. How do Environmental Factors Affect Drivers’ Gaze and Head Movements? Mineta Transportation Institute, August 2021. http://dx.doi.org/10.31979/mti.2021.2044.

Full text
Abstract:
Studies have shown that environmental factors affect driving behaviors. For instance, weather conditions and the presence of a passenger have been shown to significantly affect the speed of the driver. As one of the important measures of driving behavior is the gaze and head movements of the driver, such metrics can be potentially used towards understanding the effects of environmental factors on the driver’s behavior in real-time. In this study, using a naturalistic study platform, videos have been collected from six participants for more than four weeks of a fully naturalistic driving scenario. The videos of both the participants’ faces and roads have been cleaned and manually categorized depending on weather, road type, and passenger conditions. Facial videos have been analyzed using OpenFace to retrieve the gaze direction and head movements of the driver. Results, overall, suggest that the gaze direction and head movements of the driver are affected by a combination of environmental factors and individual differences. Specifically, results depict the distracting effect of the passenger on some individuals. In addition, it shows that highways and city streets are the cause for maximum distraction on the driver’s gaze.
APA, Harvard, Vancouver, ISO, and other styles