Follow this link to see other types of publications on this topic: Video frame.

Journal articles on the topic "Video frame"

Create a correct reference in APA, MLA, Chicago, Harvard, and many other styles

Consult the top 50 journal articles on the topic "Video frame".

An "Add to bibliography" button is available next to every work in the bibliography. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication in ".pdf" format and read its abstract online, whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Liu, Dianting, Mei-Ling Shyu, Chao Chen, and Shu-Ching Chen. "Within and Between Shot Information Utilisation in Video Key Frame Extraction". Journal of Information & Knowledge Management 10, no. 03 (September 2011): 247–59. http://dx.doi.org/10.1142/s0219649211002961.

Full text source
Abstract:
In consequence of the popularity of family video recorders and the surge of Web 2.0, increasing amounts of videos have made the management and integration of the information in videos an urgent and important issue in video retrieval. Key frames, as a high-quality summary of videos, play an important role in the areas of video browsing, searching, categorisation, and indexing. An effective set of key frames should include major objects and events of the video sequence, and should contain minimum content redundancies. In this paper, an innovative key frame extraction method is proposed to select representative key frames for a video. By analysing the differences between frames and utilising the clustering technique, a set of key frame candidates (KFCs) is first selected at the shot level, and then the information within a video shot and between video shots is used to filter the candidate set to generate the final set of key frames. Experimental results on the TRECVID 2007 video dataset have demonstrated the effectiveness of our proposed key frame extraction method in terms of the percentage of the extracted key frames and the retrieval precision.
APA, Harvard, Vancouver, ISO, and other styles
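The candidate selection step that the abstract of entry 1 describes, differences between frames combined with clustering at the shot level, can be illustrated with a short Python sketch (OpenCV, NumPy, scikit-learn). This is a generic illustration, not the authors' algorithm: HSV histograms stand in for the frame features, k-means replaces whatever clustering the paper uses, and the shot boundaries and the within/between-shot filtering step are assumed to be supplied elsewhere.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def frame_histograms(video_path):
    """Return one HSV colour histogram per frame as a simple frame descriptor."""
    cap = cv2.VideoCapture(video_path)
    hists = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
        hists.append(cv2.normalize(h, h).flatten())
    cap.release()
    return np.array(hists)

def key_frame_candidates(hists, shots, clusters_per_shot=3):
    """shots: list of (start, end) frame index pairs; returns candidate frame indices."""
    candidates = []
    for start, end in shots:
        shot_feats = hists[start:end]
        k = min(clusters_per_shot, len(shot_feats))
        km = KMeans(n_clusters=k, n_init=10).fit(shot_feats)
        for centre in km.cluster_centers_:
            # the frame nearest to each cluster centre represents that cluster
            idx = int(np.argmin(np.linalg.norm(shot_feats - centre, axis=1)))
            candidates.append(start + idx)
    return sorted(set(candidates))
```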
2

Gong, Tao, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, and Huamin Feng. "Temporal ROI Align for Video Object Recognition". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1442–50. http://dx.doi.org/10.1609/aaai.v35i2.16234.

Full text source
Abstract:
Video object detection is challenging in the presence of appearance deterioration in certain video frames. Therefore, it is a natural choice to aggregate temporal information from other frames of the same video into the current frame. However, ROI Align, as one of the most core procedures of video detectors, still remains extracting features from a single-frame feature map for proposals, making the extracted ROI features lack temporal information from videos. In this work, considering the features of the same object instance are highly similar among frames in a video, a novel Temporal ROI Align operator is proposed to extract features from other frames feature maps for current frame proposals by utilizing feature similarity. The proposed Temporal ROI Align operator can extract temporal information from the entire video for proposals. We integrate it into single-frame video detectors and other state-of-the-art video detectors, and conduct quantitative experiments to demonstrate that the proposed Temporal ROI Align operator can consistently and significantly boost the performance. Besides, the proposed Temporal ROI Align can also be applied into video instance segmentation.
APA, Harvard, Vancouver, ISO, and other styles
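To make the idea of pooling ROI features from several frames concrete, here is a heavily simplified PyTorch sketch. It is not the paper's Temporal ROI Align operator, which gathers the most similar support features for each proposal; the sketch merely reuses the current frame's proposals on the support feature maps via torchvision's roi_align and fuses the pooled features with cosine-similarity weights. The feature-map scale and the fusion rule are assumptions.

```python
import torch
from torchvision.ops import roi_align

def temporal_roi_features(feature_maps, boxes, out_size=7, spatial_scale=1 / 16):
    """feature_maps: list of [C, H, W] maps, current frame first; boxes: [N, 4] float
    proposals (image coordinates) for the current frame."""
    batch_idx = torch.zeros((boxes.shape[0], 1), dtype=boxes.dtype, device=boxes.device)
    rois = torch.cat([batch_idx, boxes], dim=1)                       # [N, 5] with batch index 0
    feats = [roi_align(fm.unsqueeze(0), rois, out_size, spatial_scale) for fm in feature_maps]
    current = feats[0]                                                # [N, C, s, s]
    fused, weight_sum = current.clone(), torch.ones_like(current[:, :1, :1, :1])
    for support in feats[1:]:
        sim = torch.cosine_similarity(current.flatten(1), support.flatten(1), dim=1)
        w = sim.clamp(min=0).view(-1, 1, 1, 1)                        # similarity-weighted fusion
        fused = fused + w * support
        weight_sum = weight_sum + w
    return fused / weight_sum
```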
3

Alsrehin, Nawaf O., and Ahmad F. Klaib. "VMQ: an algorithm for measuring the Video Motion Quality". Bulletin of Electrical Engineering and Informatics 8, no. 1 (March 1, 2019): 231–38. http://dx.doi.org/10.11591/eei.v8i1.1418.

Full text source
Abstract:
This paper proposes a new full-reference algorithm, called Video Motion Quality (VMQ) that evaluates the relative motion quality of the distorted video generated from the reference video based on all the frames from both videos. VMQ uses any frame-based metric to compare frames from the original and distorted videos. It uses the time stamp for each frame to measure the intersection values. VMQ combines the comparison values with the intersection values in an aggregation function to produce the final result. To explore the efficiency of the VMQ, we used a set of raw, uncompressed videos to generate a new set of encoded videos. These encoded videos are then used to generate a new set of distorted videos which have the same video bit rate and frame size but with reduced frame rate. To evaluate the VMQ, we applied the VMQ by comparing the encoded videos with the distorted videos and recorded the results. The initial evaluation results showed compatible trends with most of subjective evaluation results.
APA, Harvard, Vancouver, ISO, and other styles
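The structure the VMQ abstract describes, pair frames across the two videos by timestamp, score each pair with a frame-based metric, then aggregate, is sketched below in Python/NumPy. The per-frame PSNR, the nearest-timestamp pairing, and the plain mean used for aggregation are stand-ins for illustration, not the intersection and aggregation functions defined in the paper.

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def vmq_like_score(ref_frames, ref_times, dist_frames, dist_times):
    """ref_/dist_frames: lists of frames; ref_/dist_times: per-frame timestamps in seconds."""
    ref_times = np.asarray(ref_times)
    scores = []
    for frame, t in zip(dist_frames, dist_times):
        # pair each distorted frame with the reference frame closest in time
        j = int(np.argmin(np.abs(ref_times - t)))
        scores.append(psnr(ref_frames[j], frame))
    return float(np.mean(scores))            # simple aggregation, for illustration only
```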
4

Park, Sunghyun, Kangyeol Kim, Junsoo Lee, Jaegul Choo, Joonseok Lee, Sookyung Kim, and Edward Choi. "Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (May 18, 2021): 2412–22. http://dx.doi.org/10.1609/aaai.v35i3.16342.

Full text source
Abstract:
Video generation models often operate under the assumption of fixed frame rates, which leads to suboptimal performance when it comes to handling flexible frame rates (e.g., increasing the frame rate of the more dynamic portion of the video as well as handling missing video frames). To resolve the restricted nature of existing video generation models' ability to handle arbitrary timesteps, we propose continuous-time video generation by combining neural ODE (Vid-ODE) with pixel-level video processing techniques. Using ODE-ConvGRU as an encoder, a convolutional version of the recently proposed neural ODE, which enables us to learn continuous-time dynamics, Vid-ODE can learn the spatio-temporal dynamics of input videos of flexible frame rates. The decoder integrates the learned dynamics function to synthesize video frames at any given timesteps, where the pixel-level composition technique is used to maintain the sharpness of individual frames. With extensive experiments on four real-world video datasets, we verify that the proposed Vid-ODE outperforms state-of-the-art approaches under various video generation settings, both within the trained time range (interpolation) and beyond the range (extrapolation). To the best of our knowledge, Vid-ODE is the first work successfully performing continuous-time video generation using real-world videos.
APA, Harvard, Vancouver, ISO, and other styles
5

Chang, Yuchou, and Hong Lin. "Irrelevant frame removal for scene analysis using video hyperclique pattern and spectrum analysis". Journal of Advanced Computer Science & Technology 5, no. 1 (February 6, 2016): 1. http://dx.doi.org/10.14419/jacst.v5i1.4035.

Full text source
Abstract:
Videos often include frames that are irrelevant to the scenes for recording. These are mainly due to imperfect shooting, abrupt movements of camera, or unintended switching of scenes. The irrelevant frames should be removed before the semantic analysis of video scene is performed for video retrieval. An unsupervised approach for automatic removal of irrelevant frames is proposed in this paper. A novel log-spectral representation of color video frames based on Fibonacci lattice-quantization has been developed for better description of the global structures of video contents to measure similarity of video frames. Hyperclique pattern analysis, used to detect redundant data in textual analysis, is extended to extract relevant frame clusters in color videos. A new strategy using the k-nearest neighbor algorithm is developed for generating a video frame support measure and an h-confidence measure on this hyperclique pattern based analysis method. Evaluation of the proposed irrelevant video frame removal algorithm reveals promising results for datasets with irrelevant frames.
APA, Harvard, Vancouver, ISO, and other styles
6

Li, WenLin, DeYu Qi, ChangJian Zhang, Jing Guo, and JiaJun Yao. "Video Summarization Based on Mutual Information and Entropy Sliding Window Method". Entropy 22, no. 11 (November 12, 2020): 1285. http://dx.doi.org/10.3390/e22111285.

Full text source
Abstract:
This paper proposes a video summarization algorithm called the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method, which is specifically for the static summary of gesture videos. Considering that gesture videos usually have uncertain transition postures and unclear movement boundaries or inexplicable frames, we propose a three-step method where the first step involves browsing a video, the second step applies the MIESW method to select candidate key frames, and the third step removes most redundant key frames. In detail, the first step is to convert the video into a sequence of frames and adjust the size of the frames. In the second step, a key frame extraction algorithm named MIESW is executed. The inter-frame mutual information value is used as a metric to adaptively adjust the size of the sliding window to group similar content of the video. Then, based on the entropy value of the frame and the average mutual information value of the frame group, the threshold method is applied to optimize the grouping, and the key frames are extracted. In the third step, speeded up robust features (SURF) analysis is performed to eliminate redundant frames in these candidate key frames. The calculation of Precision, Recall, and F-measure is optimized from the perspective of practicality and feasibility. Experiments demonstrate that key frames extracted using our method provide high-quality video summaries and basically cover the main content of the gesture video.
APA, Harvard, Vancouver, ISO, and other styles
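The two measurements that drive the MIESW grouping, the entropy of a frame and the mutual information between adjacent frames, can both be computed from grayscale histograms, as in the NumPy sketch below. Only these measurements are shown; the adaptive window sizing, the thresholded grouping, and the SURF-based redundancy removal from the abstract are not reproduced.

```python
import numpy as np

def frame_entropy(gray):
    """Shannon entropy of a grayscale frame, from its 256-bin histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(gray_a, gray_b, bins=64):
    """Mutual information between two grayscale frames, from their joint histogram."""
    joint, _, _ = np.histogram2d(gray_a.ravel(), gray_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())
```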
7

Li, Xin, QiLin Li, Dawei Yin, Lijun Zhang, and Dezhong Peng. "Unsupervised Video Summarization Based on An Encoder-Decoder Architecture". Journal of Physics: Conference Series 2258, no. 1 (April 1, 2022): 012067. http://dx.doi.org/10.1088/1742-6596/2258/1/012067.

Full text source
Abstract:
The purpose of video summarization is to facilitate large-scale video browsing. Video summarization is a short and concise synopsis of the original video. It is usually composed of a set of representative video frames from the original video. This paper solves the problem of unsupervised video summarization by developing a Video Summarization Network (VSN) to summarize videos, which is formulated as selecting a sparse subset of video frames that best represents the input video. VSN predicts a probability for each video frame, which indicates the possibility of a frame being selected, and then takes actions to select frames according to the probability distribution to form a video summary. We designed a novel loss function which takes into account the diversity and representativeness of the generated summarization without labels or user interaction.
APA, Harvard, Vancouver, ISO, and other styles
8

Mahum, Rabbia, Aun Irtaza, Saeed Ur Rehman, Talha Meraj, and Hafiz Tayyab Rauf. "A Player-Specific Framework for Cricket Highlights Generation Using Deep Convolutional Neural Networks". Electronics 12, no. 1 (December 24, 2022): 65. http://dx.doi.org/10.3390/electronics12010065.

Full text source
Abstract:
Automatic ways to generate video summarization is a key technique to manage huge video content nowadays. The aim of video summaries is to provide important information in less time to viewers. There exist some techniques for video summarization in the cricket domain, however, to the best of our knowledge our proposed model is the first one to deal with specific player summaries in cricket videos successfully. In this study, we provide a novel framework and a valuable technique for cricket video summarization and classification. For video summary specific to the player, the proposed technique exploits the fact i.e., presence of Score Caption (SC) in frames. In the first stage, optical character recognition (OCR) is applied to extract text summary from SC to find all frames of the specific player such as the Start Frame (SF) to the Last Frame (LF). In the second stage, various frames of cricket videos are used in the supervised AlexNet classifier for training along with class labels such as positive and negative for binary classification. A pre-trained network is trained for binary classification of those frames which are attained from the first phase exhibiting the performance of a specific player along with some additional scenes. In the third phase, the person identification technique is employed to recognize frames containing the specific player. Then, frames are cropped and SIFT features are extracted from identified person to further cluster these frames using the fuzzy c-means clustering method. The reason behind the third phase is to further optimize the video summaries as the frames attained in the second stage included the partner player’s frame as well. The proposed framework successfully utilizes the cricket video dataset. Additionally, the technique is very efficient and useful in broadcasting cricket video highlights of a specific player. The experimental results signify that our proposed method surpasses the previously stated results, improving the overall accuracy of up to 95%.
APA, Harvard, Vancouver, ISO, and other styles
9

Wang, Yifan, Hao Wang, Kaijie Wang, and Wei Zhang. "Cloud Gaming Video Coding Optimization Based on Camera Motion-Guided Reference Frame Enhancement". Applied Sciences 12, no. 17 (August 25, 2022): 8504. http://dx.doi.org/10.3390/app12178504.

Full text source
Abstract:
Recent years have witnessed tremendous advances in cloud gaming. To alleviate the bandwidth pressure due to transmissions of high-quality cloud gaming videos, this paper optimized existing video codecs with deep learning networks to reduce the bitrate consumption of cloud gaming videos. Specifically, a camera motion-guided network, i.e., CMGNet, was proposed for the reference frame enhancement, leveraging the camera motion information of cloud gaming videos and the reconstructed frames in the reference frame list. The obtained high-quality reference frame was then added to the reference frame list to improve the compression efficiency. The decoder side performs the same operation to generate the reconstructed frames using the updated reference frame list. In the CMGNet, camera motions were used as guidance to estimate the frame motion and weight masks to achieve more accurate frame alignment and fusion, respectively. As a result, the quality of the reference frame was significantly enhanced, thus becoming more suitable as a prediction candidate for the target frame. Experimental results demonstrate the effectiveness of the proposed algorithm, which achieves 4.91% BD-rate reduction on average. Moreover, a cloud gaming video dataset with camera motion data was made available to promote research on game video compression.
APA, Harvard, Vancouver, ISO, and other styles
10

Kawin, Bruce. "Video Frame Enlargments". Film Quarterly 61, no. 3 (2008): 52–57. http://dx.doi.org/10.1525/fq.2008.61.3.52.

Full text source
Abstract:
This essay discusses frame-enlargement technology, comparing digital and photographic alternatives and concluding, after the analysis of specific examples, that frames photographed from a 35mm print are much superior in quality.
APA, Harvard, Vancouver, ISO, and other styles
11

Sun, Fan, and Xuedong Tian. "Lecture Video Automatic Summarization System Based on DBNet and Kalman Filtering". Mathematical Problems in Engineering 2022 (August 31, 2022): 1–10. http://dx.doi.org/10.1155/2022/5303503.

Full text source
Abstract:
Video summarization for educational scenarios aims to extract and locate the most meaningful frames from the original video based on the main contents of the lecture video. Aiming at the defect of existing computer vision-based lecture video summarization methods that tend to target specific scenes, a summarization method based on content detection and tracking is proposed. Firstly, DBNet is introduced to detect the contents such as text and mathematical formulas in the static frames of these videos, which is combined with the convolutional block attention module (CBAM) to improve the detection precision. Then, frame-by-frame data association of content instances is performed using Kalman filtering, the Hungarian algorithm, and appearance feature vectors to build a tracker. Finally, video segmentation and key frame location extraction are performed according to the content instance lifelines and content deletion events constructed by the tracker, and the extracted key frame groups are used as the final video summary result. Experimenting on a variety of scenarios of lecture video, the average precision of content detection is 89.1%; the average recall of summary results is 92.1%.
APA, Harvard, Vancouver, ISO, and other styles
12

He, Fei, Naiyu Gao, Qiaozhe Li, Senyao Du, Xin Zhao, and Kaiqi Huang. "Temporal Context Enhanced Feature Aggregation for Video Object Detection". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 10941–48. http://dx.doi.org/10.1609/aaai.v34i07.6727.

Full text source
Abstract:
Video object detection is a challenging task because of the presence of appearance deterioration in certain video frames. One typical solution is to aggregate neighboring features to enhance per-frame appearance features. However, such a method ignores the temporal relations between the aggregated frames, which is critical for improving video recognition accuracy. To handle the appearance deterioration problem, this paper proposes a temporal context enhanced network (TCENet) to exploit temporal context information by temporal aggregation for video object detection. To handle the displacement of the objects in videos, a novel DeformAlign module is proposed to align the spatial features from frame to frame. Instead of adopting a fixed-length window fusion strategy, a temporal stride predictor is proposed to adaptively select video frames for aggregation, which facilitates exploiting variable temporal information and requiring fewer video frames for aggregation to achieve better results. Our TCENet achieves state-of-the-art performance on the ImageNet VID dataset and has a faster runtime. Without bells-and-whistles, our TCENet achieves 80.3% mAP by only aggregating 3 frames.
APA, Harvard, Vancouver, ISO, and other styles
13

SURAJ, M. G., D. S. GURU, and S. MANJUNATH. "RECOGNITION OF POSTAL CODES FROM FINGERSPELLING VIDEO SEQUENCE". International Journal of Image and Graphics 11, no. 01 (January 2011): 21–41. http://dx.doi.org/10.1142/s021946781100397x.

Full text source
Abstract:
In this paper, we present a methodology for recognizing fingerspelling signs in videos. A novel approach of user specific appearance model is proposed for improved recognition performance over the classical appearance based model. Fingerspelt postal index number (PIN) code signs in a video are recognized by identifying the signs in the individual frames of a video. Decomposition of a video into frames results in a large number of frames even for a video of short duration. Each frame is processed to get only the hand frame. This results in a series of hand frames corresponding to each video frame. The obtained series of hand frames consists of frames that are very similar to the neighboring frames and hence are redundant. Therefore, only a few hand frames can be selected and used for recognizing the PIN code signed by a signer. We propose two methods for the selection of images from a video sequence. The methods give a basis for selection of images and have been thoroughly tested to show their benefit in terms of appropriate selection of images and reduction in number of images when compared to arbitrary selection of images.
APA, Harvard, Vancouver, ISO, and other styles
14

Kim, Jeongmin, and Yong Ju Jung. "Multi-Stage Network for Event-Based Video Deblurring with Residual Hint Attention". Sensors 23, no. 6 (March 7, 2023): 2880. http://dx.doi.org/10.3390/s23062880.

Full text source
Abstract:
Video deblurring aims at removing the motion blur caused by the movement of objects or camera shake. Traditional video deblurring methods have mainly focused on frame-based deblurring, which takes only blurry frames as the input to produce sharp frames. However, frame-based deblurring has shown poor picture quality in challenging cases of video restoration where severely blurred frames are provided as the input. To overcome this issue, recent studies have begun to explore the event-based approach, which uses the event sequence captured by an event camera for motion deblurring. Event cameras have several advantages compared to conventional frame cameras. Among these advantages, event cameras have a low latency in imaging data acquisition (0.001 ms for event cameras vs. 10 ms for frame cameras). Hence, event data can be acquired at a high acquisition rate (up to one microsecond). This means that the event sequence contains more accurate motion information than video frames. Additionally, event data can be acquired with less motion blur. Due to these advantages, the use of event data is highly beneficial for achieving improvements in the quality of deblurred frames. Accordingly, the results of event-based video deblurring are superior to those of frame-based deblurring methods, even for severely blurred video frames. However, the direct use of event data can often generate visual artifacts in the final output frame (e.g., image noise and incorrect textures), because event data intrinsically contain insufficient textures and event noise. To tackle this issue in event-based deblurring, we propose a two-stage coarse-refinement network by adding a frame-based refinement stage that utilizes all the available frames with more abundant textures to further improve the picture quality of the first-stage coarse output. Specifically, a coarse intermediate frame is estimated by performing event-based video deblurring in the first-stage network. A residual hint attention (RHA) module is also proposed to extract useful attention information from the coarse output and all the available frames. This module connects the first and second stages and effectively guides the frame-based refinement of the coarse output. The final deblurred frame is then obtained by refining the coarse output using the residual hint attention and all the available frame information in the second-stage network. We validated the deblurring performance of the proposed network on the GoPro synthetic dataset (33 videos and 4702 frames) and the HQF real dataset (11 videos and 2212 frames). Compared to the state-of-the-art method (D2Net), we achieved a performance improvement of 1 dB in PSNR and 0.05 in SSIM on the GoPro dataset, and an improvement of 1.7 dB in PSNR and 0.03 in SSIM on the HQF dataset.
APA, Harvard, Vancouver, ISO, and other styles
15

Li, Xinjie, and Huijuan Xu. "MEID: Mixture-of-Experts with Internal Distillation for Long-Tailed Video Recognition". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (June 26, 2023): 1451–59. http://dx.doi.org/10.1609/aaai.v37i2.25230.

Full text source
Abstract:
The long-tailed video recognition problem is especially challenging, as videos tend to be long and untrimmed, and each video may contain multiple classes, causing frame-level class imbalance. The previous method tackles the long-tailed video recognition only through frame-level sampling for class re-balance without distinguishing the frame-level feature representation between head and tail classes. To improve the frame-level feature representation of tail classes, we modulate the frame-level features with an auxiliary distillation loss to reduce the distribution distance between head and tail classes. Moreover, we design a mixture-of-experts framework with two different expert designs, i.e., the first expert with an attention-based classification network handling the original long-tailed distribution, and the second expert dealing with the re-balanced distribution from class-balanced sampling. Notably, in the second expert, we specifically focus on the frames unsolved by the first expert through designing a complementary frame selection module, which inherits the attention weights from the first expert and selects frames with low attention weights, and we also enhance the motion feature representation for these selected frames. To highlight the multi-label challenge in long-tailed video recognition, we create two additional benchmarks based on Charades and CharadesEgo videos with the multi-label property, called CharadesLT and CharadesEgoLT. Extensive experiments are conducted on the existing long-tailed video benchmark VideoLT and the two new benchmarks to verify the effectiveness of our proposed method with state-of-the-art performance. The code and proposed benchmarks are released at https://github.com/VisionLanguageLab/MEID.
APA, Harvard, Vancouver, ISO, and other styles
16

Al Bdour, Nashat. "Encryption of Dynamic Areas of Images in Video based on Certain Geometric and Color Shapes". WSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS 20 (March 29, 2023): 109–18. http://dx.doi.org/10.37394/23209.2023.20.13.

Full text source
Abstract:
The paper is devoted to the search for new approaches to encrypting selected objects in an image. Videos were analyzed, which were divided into frames, and in each video frame, the necessary objects were detected for further encryption. Images of objects with a designated geometric shape and color characteristics of pixels were considered. To select objects, a method was used based on the calculation of average values, the analysis of which made it possible to determine the convergence with the established image. Dividing the selected field into subregions with different shapes solves the problem of finding objects of the same type with different scales. In addition, the paper considers the detection of moving objects. The detection of moving objects is carried out based on determining the frame difference in pixel codes in the form of a rectangular shape. Cellular automata technology was used for encryption. The best results were shown by the transition rules of elementary cellular automata, such as: 90, 105, 150, and XOR function. The use of cellular automata technologies made it possible to use one key sequence to encrypt objects on all video frames of the video. Encryption results are different for the same objects located in different places of the same video frame and different video frames of the video sequence. The video frame image is divided into bit layers, the number of which is determined by the length of the code of each pixel. Each bit layer is encrypted with the same evolution, which is formed by one initial key bit sequence. For each video frame, a different part of the evolution is used, as well as for each detected object in the image. This approach gives different results for any objects that have a different location both on the video frame image and in different video frames. The described methods allow you to automate the process of detecting objects on video and encrypting them.
APA, Harvard, Vancouver, ISO, and other styles
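The abstract of the entry above applies elementary cellular automaton rules such as 90, 105, and 150, evolved from one key sequence, to the bit layers of the detected regions. The NumPy sketch below shows the generic mechanism only: a CA evolution used as a keystream XORed with one bit plane. The key handling, the periodic boundary condition, and the way different parts of the evolution are assigned to different objects and frames are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def ca_step(row, rule=90):
    """One synchronous update of an elementary cellular automaton (periodic boundary)."""
    left, right = np.roll(row, 1), np.roll(row, -1)
    neighbourhood = (left << 2) | (row << 1) | right              # values 0..7
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    return table[neighbourhood]

def encrypt_bit_plane(bit_plane, key_row, rule=90):
    """bit_plane: 2-D array of 0/1 values; key_row: 1-D 0/1 array with the same width."""
    keystream = np.empty_like(bit_plane)
    row = key_row.astype(np.uint8)
    for r in range(bit_plane.shape[0]):
        row = ca_step(row, rule)       # the CA evolution supplies one keystream row per image row
        keystream[r] = row
    return bit_plane ^ keystream       # XOR is its own inverse, so decryption is the same call
```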
17

Zhou, Yuanding, Baopu Li, Zhihui Wang, and Haojie Li. "Integrating Temporal and Spatial Attention for Video Action Recognition". Security and Communication Networks 2022 (April 26, 2022): 1–8. http://dx.doi.org/10.1155/2022/5094801.

Full text source
Abstract:
In recent years, deep convolutional neural networks (DCNN) have been widely used in the field of video action recognition. Attention mechanisms are also increasingly utilized in action recognition tasks. In this paper, we want to combine temporal and spatial attention for better video action recognition. Specifically, we learn a set of sparse attention by computing class response maps for finding the most informative region in a video frame. Each video frame is resampled with this information to form two new frames, one focusing on the most discriminative regions of the image and the other on the complementary regions of the image. After computing sparse attention all the newly generated video frames are rearranged in the order of the original video to form two new videos. These two videos are then fed into a CNN as new inputs to reinforce the learning of discriminative regions in the images (spatial attention). And the CNN we used is a network with a frame selection strategy that allows the network to focus on only some of the frames to complete the classification task (temporal attention). Finally, we combine the three video (original, discriminative, and complementary) classification results to get the final result together. Our experiments on the datasets UCF101 and HMDB51 show that our approach outperforms the best available methods.
APA, Harvard, Vancouver, ISO, and other styles
20

Sinulingga, Hagai R., and Seong G. Kong. "Key-Frame Extraction for Reducing Human Effort in Object Detection Training for Video Surveillance". Electronics 12, no. 13 (July 5, 2023): 2956. http://dx.doi.org/10.3390/electronics12132956.

Full text source
Abstract:
This paper presents a supervised learning scheme that employs key-frame extraction to enhance the performance of pre-trained deep learning models for object detection in surveillance videos. Developing supervised deep learning models requires a significant amount of annotated video frames as training data, which demands substantial human effort for preparation. Key frames, which encompass frames containing false negative or false positive objects, can introduce diversity into the training data and contribute to model improvements. Our proposed approach focuses on detecting false negatives by leveraging the motion information within video frames that contain the detected object region. Key-frame extraction significantly reduces the human effort involved in video frame extraction. We employ interactive labeling to annotate false negative video frames with accurate bounding boxes and labels. These annotated frames are then integrated with the existing training data to create a comprehensive training dataset for subsequent training cycles. Repeating the training cycles gradually improves the object detection performance of deep learning models to monitor a new environment. Experiment results demonstrate that the proposed learning approach improves the performance of the object detection model in a new operating environment, increasing the mean average precision (mAP@0.5) from 54% to 98%. Manual annotation of key frames is reduced by 81% through the proposed key-frame extraction method.
APA, Harvard, Vancouver, ISO, and other styles
21

Guo, Quanmin, Hanlei Wang, and Jianhua Yang. "Night Vision Anti-Halation Method Based on Infrared and Visible Video Fusion". Sensors 22, no. 19 (October 2, 2022): 7494. http://dx.doi.org/10.3390/s22197494.

Full text source
Abstract:
In order to address the discontinuity caused by the direct application of the infrared and visible image fusion anti-halation method to a video, an efficient night vision anti-halation method based on video fusion is proposed. The designed frame selection based on inter-frame difference determines the optimal cosine angle threshold by analyzing the relation of cosine angle threshold with nonlinear correlation information entropy and de-frame rate. The proposed time-mark-based adaptive motion compensation constructs the same number of interpolation frames as the redundant frames by taking the retained frame number as a time stamp. At the same time, considering the motion vector of two adjacent retained frames as the benchmark, the adaptive weights are constructed according to the interframe differences between the interpolated frame and the last retained frame, then the motion vector of the interpolated frame is estimated. The experimental results show that the proposed frame selection strategy ensures the maximum safe frame removal under the premise of continuous video content at different vehicle speeds in various halation scenes. The frame numbers and playing duration of the fused video are consistent with that of the original video, and the content of the interpolated frame is highly synchronized with that of the corresponding original frames. The average FPS of video fusion in this work is about six times that in the frame-by-frame fusion, which effectively improves the anti-halation processing efficiency of video fusion.
APA, Harvard, Vancouver, ISO, and other styles
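The frame selection step above retains a frame only when it differs enough from the previously retained frame, with the difference measured as a cosine angle. The Python/NumPy sketch below illustrates just that dropping rule; how the paper derives the optimal angle threshold from nonlinear correlation information entropy, and the time-mark-based motion compensation that rebuilds the removed frames, are not shown.

```python
import numpy as np

def cosine_angle(frame_a, frame_b):
    """Angle in degrees between two frames viewed as flattened vectors."""
    a = frame_a.ravel().astype(np.float64)
    b = frame_b.ravel().astype(np.float64)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def select_frames(frames, angle_threshold):
    """Keep a frame only if it differs enough from the last retained frame."""
    kept = [0]
    for i in range(1, len(frames)):
        if cosine_angle(frames[kept[-1]], frames[i]) >= angle_threshold:
            kept.append(i)
    return kept
```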
22

Guo, Xiaoping. "Intelligent Sports Video Classification Based on Deep Neural Network (DNN) Algorithm and Transfer Learning". Computational Intelligence and Neuroscience 2021 (November 24, 2021): 1–9. http://dx.doi.org/10.1155/2021/1825273.

Full text source
Abstract:
Traditional text annotation-based video retrieval is done by manually labeling videos with text, which is inefficient and highly subjective and generally cannot accurately describe the meaning of videos. Traditional content-based video retrieval uses convolutional neural networks to extract the underlying feature information of images to build indexes and achieves similarity retrieval of video feature vectors according to certain similarity measure algorithms. In this paper, by studying the characteristics of sports videos, we propose the histogram difference method based on using transfer learning and the four-step method based on block matching for mutation detection and fading detection of video shots, respectively. By adaptive thresholding, regions with large frame difference changes are marked as candidate regions for shots, and then the shot boundaries are determined by mutation detection algorithm. Combined with the characteristics of sports video, this paper proposes a key frame extraction method based on clustering and optical flow analysis, and experimental comparison with the traditional clustering method. In addition, this paper proposes a key frame extraction algorithm based on clustering and optical flow analysis for key frame extraction of sports video. The algorithm effectively removes the redundant frames, and the extracted key frames are more representative. Through extensive experiments, the keyword fuzzy finding algorithm based on improved deep neural network and ontology semantic expansion proposed in this paper shows a more desirable retrieval performance, and it is feasible to use this method for video underlying feature extraction, annotation, and keyword finding, and one of the outstanding features of the algorithm is that it can quickly and effectively retrieve the desired video in a large number of Internet video resources, reducing the false detection rate and leakage rate while improving the fidelity, which basically meets people’s daily needs.
APA, Harvard, Vancouver, ISO, and other styles
23

Li, Dengshan, Rujing Wang, Peng Chen, Chengjun Xie, Qiong Zhou, and Xiufang Jia. "Visual Feature Learning on Video Object and Human Action Detection: A Systematic Review". Micromachines 13, no. 1 (December 31, 2021): 72. http://dx.doi.org/10.3390/mi13010072.

Full text source
Abstract:
Video object and human action detection are applied in many fields, such as video surveillance, face recognition, etc. Video object detection includes object classification and object location within the frame. Human action recognition is the detection of human actions. Usually, video detection is more challenging than image detection, since video frames are often more blurry than images. Moreover, video detection often has other difficulties, such as video defocus, motion blur, part occlusion, etc. Nowadays, the video detection technology is able to implement real-time detection, or high-accurate detection of blurry video frames. In this paper, various video object and human action detection approaches are reviewed and discussed, many of them have performed state-of-the-art results. We mainly review and discuss the classic video detection methods with supervised learning. In addition, the frequently-used video object detection and human action recognition datasets are reviewed. Finally, a summarization of the video detection is represented, e.g., the video object and human action detection methods could be classified into frame-by-frame (frame-based) detection, extracting-key-frame detection and using-temporal-information detection; the methods of utilizing temporal information of adjacent video frames are mainly the optical flow method, Long Short-Term Memory and convolution among adjacent frames.
APA, Harvard, Vancouver, ISO, and other styles
24

Liu, Yu-Lun, Yi-Tung Liao, Yen-Yu Lin, and Yung-Yu Chuang. "Deep Video Frame Interpolation Using Cyclic Frame Generation". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8794–802. http://dx.doi.org/10.1609/aaai.v33i01.33018794.

Full text source
Abstract:
Video frame interpolation algorithms predict intermediate frames to produce videos with higher frame rates and smooth view transitions given two consecutive frames as inputs. We propose that: synthesized frames are more reliable if they can be used to reconstruct the input frames with high quality. Based on this idea, we introduce a new loss term, the cycle consistency loss. The cycle consistency loss can better utilize the training data to not only enhance the interpolation results, but also maintain the performance better with less training data. It can be integrated into any frame interpolation network and trained in an end-to-end manner. In addition to the cycle consistency loss, we propose two extensions: motion linearity loss and edge-guided training. The motion linearity loss approximates the motion between two input frames to be linear and regularizes the training. By applying edge-guided training, we further improve results by integrating edge information into training. Both qualitative and quantitative experiments demonstrate that our model outperforms the state-of-the-art methods. The source codes of the proposed method and more experimental results will be available at https://github.com/alex04072000/CyclicGen.
APA, Harvard, Vancouver, ISO, and other styles
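One way to read the cycle consistency idea in the abstract above, synthesized frames should be able to reconstruct an input frame, is sketched below in PyTorch. The net(a, b) interface (a network that synthesizes the frame halfway between two inputs) and the use of frame triplets are assumptions for illustration; the paper's full objective also includes reconstruction, motion linearity, and edge-guided terms.

```python
import torch.nn.functional as F

def cycle_consistency_loss(net, f0, f1, f2):
    """f0, f1, f2: consecutive frame tensors; net(a, b) synthesizes the frame between a and b."""
    m01 = net(f0, f1)            # synthesized frame between f0 and f1
    m12 = net(f1, f2)            # synthesized frame between f1 and f2
    f1_rec = net(m01, m12)       # the two synthesized frames should reproduce f1
    return F.l1_loss(f1_rec, f1)
```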
25

Mielke, Maja, Peter Aerts, Chris Van Ginneken, Sam Van Wassenbergh, and Falk Mielke. "Progressive tracking: a novel procedure to facilitate manual digitization of videos". Biology Open 9, no. 11 (November 4, 2020): bio055962. http://dx.doi.org/10.1242/bio.055962.

Full text source
Abstract:
Digitization of video recordings often requires the laborious procedure of manually clicking points of interest on individual video frames. Here, we present progressive tracking, a procedure that facilitates manual digitization of markerless videos. In contrast to existing software, it allows the user to follow points of interest with a cursor in the progressing video, without the need to click. To compare the performance of progressive tracking with the conventional frame-wise tracking, we quantified speed and accuracy of both methods, testing two different input devices (mouse and stylus pen). We show that progressive tracking can be twice as fast as frame-wise tracking while maintaining accuracy, given that playback speed is controlled. Using a stylus pen can increase frame-wise tracking speed. The complementary application of the progressive and frame-wise mode is exemplified on a realistic video recording. This study reveals that progressive tracking can vastly facilitate video analysis in experimental research.
APA, Harvard, Vancouver, ISO, and other styles
26

Gill, Harsimranjit Singh, Tarandip Singh, Baldeep Kaur, Gurjot Singh Gaba, Mehedi Masud, and Mohammed Baz. "A Metaheuristic Approach to Secure Multimedia Big Data for IoT-Based Smart City Applications". Wireless Communications and Mobile Computing 2021 (October 4, 2021): 1–10. http://dx.doi.org/10.1155/2021/7147940.

Full text source
Abstract:
Media streaming falls into the category of Big Data. Regardless of the video duration, an enormous amount of information is encoded in accordance with standardized video coding algorithms. In the transmission of videos, the intended recipient is allowed to receive a copy of the broadcasted video; however, the adversary also has access to it, which poses a serious concern to data confidentiality and availability. In this paper, a cryptographic algorithm, the Advanced Encryption Standard, is used to conceal the information from malicious intruders. However, in order to utilize fewer system resources, the video information is compressed before its encryption. Various compression algorithms such as the Discrete Cosine Transform, Integer Wavelet transforms, and Huffman coding are employed to reduce the enormous size of videos. The Moving Picture Experts Group (MPEG) format is a standard employed in video broadcasting, and it consists of different frame types, viz., I, B, and P frames. The latter two frame types carry information similar to that of the foremost type. Even the I frame is processed and compressed with the abovementioned schemes to discard any redundant information from it. However, the I frame embraces an abundance of new information; thus, encryption of this frame is sufficient to safeguard the whole video. The introduction of various compression algorithms can further increase the encryption time of one frame. The performance parameters such as PSNR and compression ratio are examined to further analyze the proposed model’s effectiveness. Therefore, the presented approach has superiority over the other schemes when the speed of encryption and processing of data are taken into consideration. After the reversal of the complete system, we have observed no major impact on the quality of the deciphered video. Simulation results ensure that the presented architecture is an efficient method for enciphering the video information.
APA, Harvard, Vancouver, ISO, and other styles
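Because, as the abstract argues, the I frame carries most of the new information, encrypting only the compressed I-frame payload can protect the whole group of pictures. The sketch below shows that step with AES in CBC mode using the PyCryptodome package; the key management, the compression pipeline (DCT, wavelet, Huffman), and the extraction of the I-frame bytes from the bitstream are outside the sketch and are assumptions.

```python
from Crypto.Cipher import AES            # PyCryptodome
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

def encrypt_i_frame(i_frame_bytes: bytes, key: bytes) -> bytes:
    """Encrypt only the (already compressed) I-frame payload; P/B frames stay in the clear."""
    cipher = AES.new(key, AES.MODE_CBC)
    return cipher.iv + cipher.encrypt(pad(i_frame_bytes, AES.block_size))

def decrypt_i_frame(blob: bytes, key: bytes) -> bytes:
    iv, ciphertext = blob[:16], blob[16:]
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return unpad(cipher.decrypt(ciphertext), AES.block_size)

key = get_random_bytes(16)               # AES-128 key; in practice supplied by key exchange
```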
27

Yang, Yixin, Zhiqang Xiang, and Jianbo Li. "Research on Low Frame Rate Video Compression Algorithm in the Context of New Media". Security and Communication Networks 2021 (September 27, 2021): 1–10. http://dx.doi.org/10.1155/2021/7494750.

Full text source
Abstract:
When using the current method to compress the low frame rate video animation video, there is no frame rate compensation for the video image, which cannot eliminate the artifacts generated in the compression process, resulting in low definition, poor quality, and low compression efficiency of the compressed low frame rate video animation video. In the context of new media, the linear function model is introduced to study the frame rate video animation video compression algorithm. In this paper, an adaptive detachable convolutional network is used to estimate the offset of low frame rate video animation using local convolution. According to the estimation results, the video frames are compensated to eliminate the artifacts of low frame rate video animation. After the frame rate compensation, the low frame rate video animation video is divided into blocks, the CS value of the image block is measured, the linear estimation of the image block is carried out by using the linear function model, and the compression of the low frame rate video animation video is completed according to the best linear estimation result. The experimental results show that the low frame rate video and animation video compressed by the proposed algorithm have high definition, high compression quality under different compression ratios, and high compression efficiency under different compression ratios.
APA, Harvard, Vancouver, ISO, and other styles
28

Bhuvaneshwari, T., N. Ramadevi, and E. Kalpana. "Face Quality Detection in a Video Frame". International Journal for Research in Applied Science and Engineering Technology 11, no. 8 (August 31, 2023): 2206–11. http://dx.doi.org/10.22214/ijraset.2023.55559.

Full text source
Abstract:
Face detection technology is often used in surveillance for detecting and tracking people in real time. The applications using these algorithms deal with low-quality video feeds having fewer Pixels Per Inch (ppi) and/or a low frame rate. The algorithms perform well with such video feeds, but their performance deteriorates on high-quality, high data-per-frame videos. This project focuses on developing an algorithm that gives faster results on high-quality videos, at par with the algorithms working on low-quality videos. The proposed algorithm uses MTCNN as the base algorithm and speeds it up for high-definition videos. This project also presents a novel solution to the problem of occlusion and detecting faces in videos. This survey provides an overview of the face detection from video literature, which predominantly focuses on visible wavelength face video as input. For high-quality videos, we use Face-MTCNN and KLT; for low-quality videos, we use MTCNN and KLT. Open issues and challenges are pointed out, i.e., highlighting the importance of comparability for algorithm evaluations and the challenge for future work to create Deep Learning (DL) approaches that are interpretable in addition to tracking the faces. The suggested methodology is contrasted with conventional facial feature extraction for every frame and with well-known clustering techniques for a collection of videos.
APA, Harvard, Vancouver, ISO, and other styles
29

Alfian, Alfiansyah Imanda Putra, Rusydi Umar, and Abdul Fadlil. "Penerapan Metode Localization Tampering dan Hashing untuk Deteksi Rekayasa Video Digital". Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 5, no. 2 (April 29, 2021): 400–406. http://dx.doi.org/10.29207/resti.v5i2.3015.

Full text source
Abstract:
The development of digital video technology, which is increasingly advanced, makes crimes involving the manipulation of digital video more likely to occur. Changes to digital video alter the information it communicates, and this is easily exploited in digital crime. One way to solve such digital crime cases is to use the NIST (National Institute of Standards and Technology) method for video forensics. The initial stage is carried out by collecting data and extracting the collected material. A localization tampering and hashing algorithm can then be used to analyze the extracted material, detecting any interference with or manipulation of the digital video at each video frame, and hash analysis is performed to verify the authenticity of the video. For tampered digital video, histogram analysis can be performed by calculating a histogram value metric, which is used to compare the histogram values of the original and tampered videos and to make graphical comparisons. The frame difference analysis shows that frames 2 through 7 of the video experienced an attack, while the histogram calculation of the centroid value gives different results for the original and tampered videos: in the third frame (a value of 124.318), and in the 7th frame, where the tampered video yields 105,966 versus 107,456 for the original video. Hash analysis of the tampered video produces an invalid SHA-1 hash, which proves that the video has been manipulated.
APA, Harvard, Vancouver, ISO, and other styles
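The per-frame checks described above, a hash for authenticity plus a histogram comparison between the original and the suspect video, can be prototyped in a few lines of Python with OpenCV and hashlib. This is a generic illustration of those two measurements only; the paper's centroid-based histogram metric and its tampering localization procedure are not reproduced.

```python
import hashlib
import cv2

def frame_fingerprints(video_path):
    """Return (SHA-1 digest, normalized colour histogram) for every frame of a video."""
    cap = cv2.VideoCapture(video_path)
    prints = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        digest = hashlib.sha1(frame.tobytes()).hexdigest()
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        prints.append((digest, cv2.normalize(hist, hist).flatten()))
    cap.release()
    return prints

def suspicious_frames(original, suspect, hist_threshold=0.99):
    """Flag frame indices whose hash differs and whose histograms correlate below the threshold."""
    flagged = []
    for i, ((h1, hist1), (h2, hist2)) in enumerate(zip(original, suspect)):
        if h1 != h2 and cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL) < hist_threshold:
            flagged.append(i)
    return flagged
```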
30

Yao, Ping. "Key Frame Extraction Method of Music and Dance Video Based on Multicore Learning Feature Fusion". Scientific Programming 2022 (January 17, 2022): 1–8. http://dx.doi.org/10.1155/2022/9735392.

Full text source
Abstract:
The purpose of video key frame extraction is to use as few video frames as possible to represent as much video content as possible, reduce redundant video frames, and reduce the amount of computation, so as to facilitate quick browsing, content summarization, indexing, and retrieval of videos. In this paper, a method of dance motion recognition and video key frame extraction based on multifeature fusion is designed to learn the complicated and changeable dancer motion recognition. Firstly, multiple features are fused, and then the similarity is measured. Then, the video sequences are clustered by the clustering algorithm according to the scene. Finally, the key frames are extracted according to the minimum amount of motion. Through the quantitative analysis and research of the simulation results of different models, it can be seen that the model proposed in this paper can show high performance and stability. The breakthrough of video clip retrieval technology is bound to effectively promote the inheritance and development of dance, which is of great theoretical significance and practical value.
APA, Harvard, Vancouver, ISO, and other styles
31

Saqib, Shazia, and Syed Kazmi. "Video Summarization for Sign Languages Using the Median of Entropy of Mean Frames Method". Entropy 20, no. 10 (September 29, 2018): 748. http://dx.doi.org/10.3390/e20100748.

Full text source
Abstract:
Multimedia information requires large repositories of audio-video data. Retrieval and delivery of video content is a very time-consuming process and is a great challenge for researchers. An efficient approach for faster browsing of large video collections and more efficient content indexing and access is video summarization. Compression of data through extraction of keyframes is a solution to these challenges. A keyframe is a representative frame of the salient features of the video. The output frames must represent the original video in temporal order. The proposed research presents a method of keyframe extraction using the mean of consecutive k frames of video data. A sliding window of size k / 2 is employed to select the frame that matches the median entropy value of the sliding window. This is called the Median of Entropy of Mean Frames (MME) method. MME is mean-based keyframes selection using the median of the entropy of the sliding window. The method was tested for more than 500 videos of sign language gestures and showed satisfactory results.
APA, Harvard, Vancouver, ISO, and other styles
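The MME recipe above averages k consecutive frames, computes the entropy of each mean frame, and then uses a sliding window of size k/2 to pick the frame that matches the window's median entropy. The NumPy sketch below is one illustrative reading of that recipe on grayscale frames; the window stride and the mapping from a selected mean frame back to an original key frame are assumptions.

```python
import numpy as np

def frame_entropy(gray):
    hist, _ = np.histogram(gray, bins=256, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mme_keyframes(frames, k):
    """frames: list of grayscale frames; returns indices of selected key frames."""
    # mean frame of each block of k consecutive frames
    means = [np.mean(frames[i:i + k], axis=0) for i in range(0, len(frames) - k + 1, k)]
    entropies = [frame_entropy(m) for m in means]
    w = max(1, k // 2)
    keyframes = []
    for start in range(0, len(entropies) - w + 1, w):
        window = entropies[start:start + w]
        target = np.median(window)
        # pick the block whose mean-frame entropy is closest to the window median
        block = start + int(np.argmin([abs(e - target) for e in window]))
        keyframes.append(block * k)      # first original frame of that block serves as the key frame
    return keyframes
```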
32

Yan, Bo, Chuming Lin, and Weimin Tan. "Frame and Feature-Context Video Super-Resolution". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5597–604. http://dx.doi.org/10.1609/aaai.v33i01.33015597.

Full text source
Abstract:
For video super-resolution, current state-of-the-art approaches either process multiple low-resolution (LR) frames to produce each output high-resolution (HR) frame separately in a sliding window fashion or recurrently exploit the previously estimated HR frames to super-resolve the following frame. The main weaknesses of these approaches are: 1) separately generating each output frame may obtain high-quality HR estimates while resulting in unsatisfactory flickering artifacts, and 2) combining previously generated HR frames can produce temporally consistent results in the case of short information flow, but it will cause significant jitter and jagged artifacts because the previous super-resolving errors are constantly accumulated to the subsequent frames. In this paper, we propose a fully end-to-end trainable frame and feature-context video super-resolution (FFCVSR) network that consists of two key sub-networks: local network and context network, where the first one explicitly utilizes a sequence of consecutive LR frames to generate local feature and local SR frame, and the other combines the outputs of local network and the previously estimated HR frames and features to super-resolve the subsequent frame. Our approach takes full advantage of the inter-frame information from multiple LR frames and the context information from previously predicted HR frames, producing temporally consistent high-quality results while maintaining real-time speed by directly reusing previous features and frames. Extensive evaluations and comparisons demonstrate that our approach produces state-of-the-art results on a standard benchmark dataset, with advantages in terms of accuracy, efficiency, and visual quality over the existing approaches.
APA, Harvard, Vancouver, ISO, and other styles
33

Wu, Wei Qiang, Lei Wang, Qin Yu Zhang, and Chang Jian Zhang. "The RTP Encapsulation Based on Frame Type Method for AVS Video". Applied Mechanics and Materials 263-266 (December 2012): 1803–8. http://dx.doi.org/10.4028/www.scientific.net/amm.263-266.1803.

Full text source
Abstract:
According to the characteristics of AVS video data, a RTP encapsulation method based on frame type is proposed. When the video data is encapsulated by the RTP protocol, different types of video data such as sequence header, sequence end, I frame, P frame and B frame are encapsulated with different method. Under the limit of maximum transmission unit (MTU), sequence headers, sequence ends and I frames are encapsulated individually to reduce the packet length and protect the important data. While multiple P frames and B frames are encapsulated into one RTP packet to reduce the quantity of the RTP packets and decrease the overload of link. Simulation results show that, compared to the frame-based encapsulation method, the proposed method can reduce the packet loss rate of the video data effectively and improve the quality of video service.
Style APA, Harvard, Vancouver, ISO itp.
34

Li, Dengshan, Rujing Wang, Chengjun Xie, Liu Liu, Jie Zhang, Rui Li, Fangyuan Wang, Man Zhou i Wancai Liu. "A Recognition Method for Rice Plant Diseases and Pests Video Detection Based on Deep Convolutional Neural Network". Sensors 20, nr 3 (21.01.2020): 578. http://dx.doi.org/10.3390/s20030578.

Pełny tekst źródła
Streszczenie:
Increasing grain production is essential in areas where food is scarce, and controlling crop diseases and pests in time is an effective way to increase it. To construct a video detection system for plant diseases and pests, and to build a real-time crop disease and pest video detection system in the future, a deep learning-based video detection architecture with a custom backbone was proposed for detecting plant diseases and pests in videos. We first transformed the video into still frames, then sent the frames to the still-image detector for detection, and finally synthesized the detected frames back into a video. In the still-image detector, we used Faster R-CNN as the framework and used image-trained models to detect relatively blurry videos. Additionally, a set of video-based evaluation metrics based on a machine learning classifier was proposed, which effectively reflected the quality of video detection in the experiments. Experiments showed that, in our experimental environment, the system with the custom backbone was more suitable for detecting untrained rice videos than systems with VGG16, ResNet-50 or ResNet-101 backbones and than YOLOv3.
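The decode–detect–re-encode pipeline the authors describe can be sketched with OpenCV as follows; `detect_frame` is only a placeholder for the actual Faster R-CNN detector, and the codec and container settings are assumptions.

```python
import cv2

def detect_frame(frame):
    """Placeholder for the still-image detector (e.g. a Faster R-CNN model);
    here it simply returns the frame unchanged."""
    return frame

def run_video_detection(src_path, dst_path):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(detect_frame(frame))   # annotate each still frame, then re-encode
    cap.release()
    out.release()
```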
Style APA, Harvard, Vancouver, ISO itp.
35

Lee, Ki-Sun, Eunyoung Lee, Bareun Choi i Sung-Bom Pyun. "Automatic Pharyngeal Phase Recognition in Untrimmed Videofluoroscopic Swallowing Study Using Transfer Learning with Deep Convolutional Neural Networks". Diagnostics 11, nr 2 (13.02.2021): 300. http://dx.doi.org/10.3390/diagnostics11020300.

Pełny tekst źródła
Streszczenie:
Background: The videofluoroscopic swallowing study (VFSS) is considered the gold standard diagnostic tool for evaluating dysphagia. However, it is time-consuming and labor-intensive for the clinician to manually search the long recorded video frame by frame to identify instantaneous swallowing abnormalities in VFSS images. Therefore, this study presents a deep learning-based approach using transfer learning with a convolutional neural network (CNN) that automatically annotates pharyngeal phase frames in untrimmed VFSS videos, so that frames need not be searched manually. Methods: To determine whether an image frame in a VFSS video belongs to the pharyngeal phase, a single-frame baseline architecture based on the deep CNN framework is used and a transfer learning technique with fine-tuning is applied. Results: Among all experimental CNN models, the one fine-tuned with two blocks of the VGG-16 (VGG16-FT5) model achieved the highest performance in recognizing pharyngeal phase frames, namely an accuracy of 93.20 (±1.25)%, sensitivity of 84.57 (±5.19)%, specificity of 94.36 (±1.21)%, AUC of 0.8947 (±0.0269) and Kappa of 0.7093 (±0.0488). Conclusions: Using appropriate transfer and fine-tuning techniques and explainable deep learning techniques such as Grad-CAM, this study shows that the proposed single-frame-baseline-architecture-based deep CNN framework can yield high performance in the full automation of VFSS video analysis.
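A minimal sketch of the transfer-learning setup described above (fine-tuning only the last convolutional blocks of an ImageNet-pretrained VGG-16 for a two-class pharyngeal/non-pharyngeal decision) is shown below; it assumes a recent torchvision, and the exact block split, head and learning rate are illustrative rather than the authors' configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_vgg16_ft(num_trainable_blocks: int = 2) -> nn.Module:
    """VGG-16 with only the last `num_trainable_blocks` conv blocks unfrozen
    and a binary head (pharyngeal phase vs. other) - illustrative only."""
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    # VGG-16 features are split into 5 blocks, each ended by a MaxPool layer.
    block_ends = [i for i, m in enumerate(model.features)
                  if isinstance(m, nn.MaxPool2d)]
    freeze_until = (block_ends[-num_trainable_blocks - 1] + 1
                    if num_trainable_blocks < 5 else 0)
    for i, layer in enumerate(model.features):
        for p in layer.parameters():
            p.requires_grad = i >= freeze_until
    model.classifier[6] = nn.Linear(4096, 2)   # two-class output head
    return model

model = build_vgg16_ft(num_trainable_blocks=2)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```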
Style APA, Harvard, Vancouver, ISO itp.
36

Le, Trung-Nghia, Tam V. Nguyen, Quoc-Cuong Tran, Lam Nguyen, Trung-Hieu Hoang, Minh-Quan Le i Minh-Triet Tran. "Interactive Video Object Mask Annotation". Proceedings of the AAAI Conference on Artificial Intelligence 35, nr 18 (18.05.2021): 16067–70. http://dx.doi.org/10.1609/aaai.v35i18.18014.

Pełny tekst źródła
Streszczenie:
In this paper, we introduce a practical system for interactive video object mask annotation, which can support multiple back-end methods. To demonstrate the generalization of our system, we introduce a novel approach for video object annotation. Our proposed system takes scribbles at a chosen key-frame from the end-users via a user-friendly interface and produces masks of corresponding objects at the key-frame via the Control-Point-based Scribbles-to-Mask (CPSM) module. The object masks at the key-frame are then propagated to other frames and refined through the Multi-Referenced Guided Segmentation (MRGS) module. Last but not least, the user can correct wrong segmentation at some frames, and the corrected mask is continuously propagated to other frames in the video via the MRGS to produce the object masks at all video frames.
Style APA, Harvard, Vancouver, ISO itp.
37

Chen, Yongjie, i Tieru Wu. "SATVSR: Scenario Adaptive Transformer for Cross Scenarios Video Super-Resolution". Journal of Physics: Conference Series 2456, nr 1 (1.03.2023): 012028. http://dx.doi.org/10.1088/1742-6596/2456/1/012028.

Pełny tekst źródła
Streszczenie:
Video Super-Resolution (VSR) aims to recover sequences of high-resolution (HR) frames from low-resolution (LR) frames. Previous methods mainly utilize temporally adjacent frames to assist the reconstruction of target frames. However, in the real world there is a lot of irrelevant information in the adjacent frames of videos with fast scene switching, and these VSR methods cannot adaptively distinguish and select the useful information. In contrast, with a transformer structure suitable for temporal tasks, we devise a novel scenario-adaptive video super-resolution method. Specifically, we use optical flow to label the patches in each video frame and only compute attention between patches with the same label. We then select the most relevant label among them to supplement the spatial-temporal information of the target frame. This design directly makes the supplementary information come from the same scene as much as possible. We further propose a cross-scale feature aggregation module to better handle the scale variation problem. Compared with other video super-resolution methods, our method not only achieves significant performance gains on single-scene videos but also has better robustness on cross-scene datasets.
Style APA, Harvard, Vancouver, ISO itp.
38

HOSUR, PRABHUDEV, i ROLANDO CARRASCO. "ENHANCED FRAME-BASED VIDEO CODING TO SUPPORT CONTENT-BASED FUNCTIONALITIES". International Journal of Computational Intelligence and Applications 06, nr 02 (czerwiec 2006): 161–75. http://dx.doi.org/10.1142/s1469026806001939.

Pełny tekst źródła
Streszczenie:
This paper presents an enhanced frame-based video coding scheme. The input source video to the enhanced frame-based video encoder consists of a rectangular-sized video together with the shapes of arbitrarily shaped objects on the video frames. The rectangular frame texture is encoded with the conventional frame-based coding technique, and the video object's shape is encoded using contour-based vertex coding. Several useful content-based functionalities can be achieved by utilizing the shape information in the bitstream, at the cost of a very small overhead to the bit-rate.
Style APA, Harvard, Vancouver, ISO itp.
39

Qu, Zhong, i Teng Fei Gao. "An Improved Algorithm of Keyframe Extraction for Video Summarization". Advanced Materials Research 225-226 (kwiecień 2011): 807–11. http://dx.doi.org/10.4028/www.scientific.net/amr.225-226.807.

Pełny tekst źródła
Streszczenie:
Video segmentation and keyframe extraction are the basis of content-based video retrieval (CBVR), in which keyframe selection plays the central role. In this paper, we propose an improved approach to key-frame extraction for video summarization. In our approach, videos are first segmented into shots according to video content by an improved histogram-based method that uses histogram intersection together with nonuniform partitioning and weighting. Then, within each shot, keyframes are determined by calculating the image entropy of every frame in HSV color space as a measure of the quantity of image information. Our simulation results in section 4 show that the key frames extracted with our method are compact and faithful to the original video.
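A simplified sketch of the two-stage idea (histogram-intersection shot boundaries, then an entropy-based keyframe per shot) follows; it omits the paper's nonuniform partitioning and weighting, uses a plain L1-normalized HSV histogram, keeps the highest-entropy frame of each shot, and its thresholds are assumptions.

```python
import cv2
import numpy as np

def hsv_hist(frame, bins=(16, 4, 4)):
    """L1-normalized 3-D HSV histogram of a BGR frame."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256]).flatten()
    return hist / (hist.sum() + 1e-8)

def hist_intersection(h1, h2):
    return float(np.minimum(h1, h2).sum())

def image_entropy(frame):
    """Entropy of the V channel in HSV color space."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [2], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def extract_keyframes(frames, boundary_thresh=0.6):
    """Cut a shot when the intersection with the previous frame drops below
    the threshold, then keep the highest-entropy frame of each shot."""
    keyframes, shot = [], [frames[0]]
    prev_hist = hsv_hist(frames[0])
    for frame in frames[1:]:
        h = hsv_hist(frame)
        if hist_intersection(prev_hist, h) < boundary_thresh:   # shot boundary
            keyframes.append(max(shot, key=image_entropy))
            shot = []
        shot.append(frame)
        prev_hist = h
    keyframes.append(max(shot, key=image_entropy))
    return keyframes
```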
Style APA, Harvard, Vancouver, ISO itp.
40

Li, Qian, Rangding Wang i Dawen Xu. "An Inter-Frame Forgery Detection Algorithm for Surveillance Video". Information 9, nr 12 (28.11.2018): 301. http://dx.doi.org/10.3390/info9120301.

Pełny tekst źródła
Streszczenie:
Surveillance systems are ubiquitous in our lives, and surveillance videos are often used as significant evidence in judicial forensics. However, the authenticity of surveillance videos is difficult to guarantee, so ascertaining their authenticity is an urgent problem. Inter-frame forgery is one of the most common forms of video tampering. The forgery reduces the correlation between adjacent frames at the tampering position, so this correlation can be used to detect the tampering operation. The algorithm is composed of feature extraction and abnormal point localization. During feature extraction, we extract the 2-D phase congruency of each frame, since it is a good image characteristic, and then calculate the correlation between adjacent frames. In the second phase, the abnormal points are detected using the k-means clustering algorithm, which clusters the normal and abnormal points into two categories. Experimental results demonstrate that the scheme has high detection and localization accuracy.
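The localization step can be sketched as below; computing 2-D phase congruency needs a dedicated implementation, so a normalized grayscale histogram stands in as the per-frame feature, while the correlation-then-k-means scheme follows the description above using scikit-learn.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def frame_feature(frame):
    """Stand-in feature (grayscale histogram); the paper uses 2-D phase congruency."""
    g = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    h = cv2.calcHist([g], [0], None, [64], [0, 256]).ravel()
    return h / (h.sum() + 1e-8)

def adjacent_correlations(features):
    """Pearson correlation between consecutive per-frame feature vectors."""
    return np.array([np.corrcoef(features[i], features[i + 1])[0, 1]
                     for i in range(len(features) - 1)])

def locate_tampering(frames):
    """Cluster the adjacent-frame correlations into 'normal' and 'abnormal'
    groups with k-means; indices in the lower-correlation cluster are
    candidate tampering positions (illustrative sketch)."""
    feats = [frame_feature(f) for f in frames]
    corr = adjacent_correlations(feats).reshape(-1, 1)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(corr)
    abnormal = int(np.argmin([corr[labels == c].mean() for c in (0, 1)]))
    return np.where(labels == abnormal)[0]   # positions between frame i and i+1
```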
Style APA, Harvard, Vancouver, ISO itp.
41

Lv, Changhai, Junfeng Li i Jian Tian. "Key Frame Extraction for Sports Training Based on Improved Deep Learning". Scientific Programming 2021 (1.09.2021): 1–8. http://dx.doi.org/10.1155/2021/1016574.

Pełny tekst źródła
Streszczenie:
With rapid technological advances in sports, the number of athletes is gradually increasing. For sports professionals, it is essential to monitor and analyze athletes' poses during training, and key frame extraction from training videos plays a significant role in easing the analysis of sports training videos. This paper develops a sports action classification system for accurately classifying athletes' actions. Key video frames are extracted from the sports training video to highlight the distinct actions in sports training. Subsequently, a fully convolutional network (FCN) is used to extract the region of interest (ROI) for pose detection in each frame, followed by the application of a convolutional neural network (CNN) to estimate the pose probability of each frame. Moreover, a distinct key frame extraction approach is established that extracts key frames by considering the probability differences between neighboring frames. The experimental results show that the proposed method performs well and can recognize the athlete's posture with an average classification rate of 98%. The experimental results and analysis validate that the proposed key frame extraction method outperforms its counterparts in key pose probability estimation and key pose extraction.
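The final selection step (keeping frames where the estimated pose probability changes sharply between neighbours) can be illustrated in a few lines; the threshold is an assumption, and the CNN that produces the probabilities is out of scope here.

```python
import numpy as np

def keyframes_from_pose_probs(probs, min_jump=0.2):
    """Given per-frame pose probabilities (e.g. from a CNN head), keep the
    indices where the probability jumps sharply relative to the previous
    frame (illustrative threshold)."""
    probs = np.asarray(probs, dtype=float)
    jumps = np.abs(np.diff(probs))
    return [i + 1 for i, d in enumerate(jumps) if d >= min_jump]
```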
Style APA, Harvard, Vancouver, ISO itp.
42

Sun, Yunyun, Peng Li, Zhaohui Jiang i Sujun Hu. "Feature fusion and clustering for key frame extraction". Mathematical Biosciences and Engineering 18, nr 6 (2021): 9294–311. http://dx.doi.org/10.3934/mbe.2021457.

Pełny tekst źródła
Streszczenie:
Numerous limitations of shot-based and content-based key-frame extraction approaches have encouraged the development of cluster-based algorithms. This paper proposes an Optimal Threshold and Maximum Weight (OTMW) clustering approach that allows accurate and automatic extraction of video summarizations. Firstly, the video content is analyzed using image color, texture and information complexity, and a video feature dataset is constructed. A golden section method is then proposed to determine the optimal solution of the threshold function. The initial cluster centers and the cluster number k are obtained automatically by the improved clustering algorithm, and k clusters of video frames are produced with the K-MEANS algorithm. The representative frame of each cluster is extracted using the maximum weight method, yielding an accurate video summarization. The proposed approach is tested on 16 multi-type videos; the averages of the obtained key-frame quality evaluation indices, Fidelity and Ratio, are 96.11925 and 97.128, respectively, and the extracted key-frames are consistent with artificial visual judgement. The performance of the proposed approach is compared with several state-of-the-art cluster-based algorithms, and the Fidelity is increased by 12.49721, 10.86455, 10.62984 and 10.4984375, respectively. In addition, the Ratio is increased by 1.958 on average with small fluctuations. The experimental results demonstrate the advantage of the proposed solution over several related baselines on sixteen diverse datasets and validate that the proposed approach can accurately extract video summarizations from multi-type videos.
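The golden section step can be illustrated generically; the paper's actual threshold function is not reproduced here, so the objective passed to the search below is only a toy placeholder.

```python
import math

def golden_section_search(f, lo, hi, tol=1e-4):
    """Locate the maximiser of a unimodal function f on [lo, hi]
    using the golden-section method."""
    inv_phi = (math.sqrt(5) - 1) / 2          # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)                 # left interior point
    d = a + inv_phi * (b - a)                 # right interior point
    while abs(b - a) > tol:
        if f(c) > f(d):                       # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                                 # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# example: optimise a toy clustering-threshold objective on [0, 1]
best_threshold = golden_section_search(lambda t: -(t - 0.37) ** 2, 0.0, 1.0)
```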
Style APA, Harvard, Vancouver, ISO itp.
43

Desai, Padmashree, C. Sujatha, Saumyajit Chakraborty, Saurav Ansuman, Sanika Bhandari i Sharan Kardiguddi. "Next frame prediction using ConvLSTM". Journal of Physics: Conference Series 2161, nr 1 (1.01.2022): 012024. http://dx.doi.org/10.1088/1742-6596/2161/1/012024.

Pełny tekst źródła
Streszczenie:
Intelligent decision-making systems require the potential for forecasting, foreseeing, and reasoning about future events. The problem of video frame prediction has attracted a lot of attention due to its usefulness in many computer vision applications such as autonomous vehicles and robots. Recent deep learning advances have significantly improved video prediction performance; nevertheless, as top-performing systems attempt to foresee even more future frames, their predictions become increasingly foggy. We developed a method for predicting a future frame from a series of prior frames using the Convolutional Long Short-Term Memory (ConvLSTM) model. The input video is segmented into frames and fed to the ConvLSTM model to extract the features and forecast a future frame, which can be beneficial in a variety of applications. We used two metrics to measure the quality of the predicted frame: the structural similarity index (SSIM) and the perceptual distance, which help in understanding the difference between the actual frame and the predicted frame. The UCF101 dataset, a collection of realistic action videos taken from YouTube with 101 action categories, is used for training and testing. The ConvLSTM model is trained and tested on 24 categories from this dataset, and a future frame is predicted with satisfactory results: we obtained an SSIM of 0.95 and a perceptual distance of 24.28. The results are also compared to those of state-of-the-art approaches and are shown to be superior.
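A minimal ConvLSTM next-frame predictor in Keras, in the spirit of the described method but not its exact architecture, might look as follows; the layer sizes, input resolution and loss are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_next_frame_model(frames=10, height=64, width=64, channels=1):
    """Stacked ConvLSTM predictor: takes `frames` past frames and outputs
    one predicted next frame (illustrative architecture)."""
    inputs = layers.Input(shape=(frames, height, width, channels))
    x = layers.ConvLSTM2D(32, (3, 3), padding="same", return_sequences=True,
                          activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ConvLSTM2D(32, (3, 3), padding="same", return_sequences=False,
                          activation="relu")(x)
    x = layers.BatchNormalization()(x)
    outputs = layers.Conv2D(channels, (3, 3), padding="same",
                            activation="sigmoid")(x)      # predicted frame
    return models.Model(inputs, outputs)

model = build_next_frame_model()
model.compile(optimizer="adam", loss="mse")
# model.fit(past_clips, next_frames, ...)  # clips shaped (N, 10, 64, 64, 1)
# prediction quality can be checked with tf.image.ssim(pred, target, max_val=1.0)
```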
Style APA, Harvard, Vancouver, ISO itp.
44

Zhang, Xiao-Yu, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu i Lixin Duan. "Learning Transferable Self-Attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision". Proceedings of the AAAI Conference on Artificial Intelligence 33 (17.07.2019): 9227–34. http://dx.doi.org/10.1609/aaai.v33i01.33019227.

Pełny tekst źródła
Streszczenie:
Action recognition in videos has attracted a lot of attention in the past decade. In order to learn robust models, previous methods usually assume videos are trimmed as short sequences and require ground-truth annotations of each video frame/sequence, which is quite costly and time-consuming. In this paper, given only video-level annotations, we propose a novel weakly supervised framework to simultaneously locate action frames as well as recognize actions in untrimmed videos. Our proposed framework consists of two major components. First, for action frame localization, we take advantage of the self-attention mechanism to weight each frame, such that the influence of background frames can be effectively eliminated. Second, considering that there are trimmed videos publicly available and also they contain useful information to leverage, we present an additional module to transfer the knowledge from trimmed videos for improving the classification performance in untrimmed ones. Extensive experiments are conducted on two benchmark datasets (i.e., THUMOS14 and ActivityNet1.3), and experimental results clearly corroborate the efficacy of our method.
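The frame-weighting idea above (a self-attention score per frame, used to suppress background frames before video-level classification) can be sketched as a small PyTorch module; the feature dimension, hidden size and class count are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

class FrameAttentionPool(nn.Module):
    """Score each frame feature with a small attention head and pool the
    sequence into a single video-level feature (illustrative module)."""
    def __init__(self, feat_dim=2048, num_classes=20):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim, 256), nn.Tanh(),
                                   nn.Linear(256, 1))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):                            # (batch, T, feat_dim)
        attn = torch.softmax(self.score(frame_feats), dim=1)   # (batch, T, 1)
        video_feat = (attn * frame_feats).sum(dim=1)           # attention-weighted sum
        return self.classifier(video_feat), attn.squeeze(-1)

logits, frame_weights = FrameAttentionPool()(torch.randn(2, 64, 2048))
```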
Style APA, Harvard, Vancouver, ISO itp.
45

Mashtalir, Sergii, i Olena Mikhnova. "Key Frame Extraction from Video". International Journal of Computer Vision and Image Processing 4, nr 2 (lipiec 2014): 68–79. http://dx.doi.org/10.4018/ijcvip.2014040105.

Pełny tekst źródła
Streszczenie:
A complete overview of key frame extraction techniques is provided. It has been found that such techniques usually have three phases: shot boundary detection as a pre-processing phase; the main phase of key frame detection, where visual, structural, audio and textual features are extracted from each frame and then processed and analyzed with artificial intelligence methods; and a post-processing phase that removes duplicates if they occur in the resulting sequence of key frames. Estimation techniques and available test video collections have also been surveyed. Finally, conclusions concerning the drawbacks of the examined procedures and the basic tendencies of their development are drawn.
Style APA, Harvard, Vancouver, ISO itp.
46

Suin, Maitreya, i A. N. Rajagopalan. "An Efficient Framework for Dense Video Captioning". Proceedings of the AAAI Conference on Artificial Intelligence 34, nr 07 (3.04.2020): 12039–46. http://dx.doi.org/10.1609/aaai.v34i07.6881.

Pełny tekst źródła
Streszczenie:
Dense video captioning is an extremely challenging task, since an accurate and faithful description of the events in a video requires holistic knowledge of the video contents as well as contextual reasoning about individual events. Most existing approaches handle this problem by first proposing event boundaries from a video and then captioning a subset of the proposals. Generating dense temporal annotations and corresponding captions from long videos can be dramatically resource-consuming. In this paper, we focus on the task of generating a dense description of temporally untrimmed videos and aim to significantly reduce the computational cost by processing fewer frames while maintaining accuracy. Existing video captioning methods sample frames with a predefined frequency over the entire video or use all the frames. Instead, we propose a deep reinforcement-learning-based approach which enables an agent to describe multiple events in a video by watching only a portion of the frames. The agent needs to watch more frames when it is processing an informative part of the video, and skips frames when there is redundancy. The agent is trained using the actor-critic algorithm, where the actor determines the frames to be watched from a video and the critic assesses the optimality of the decisions taken by the actor. Such efficient frame selection simplifies the event proposal task considerably and has the added effect of reducing the occurrence of unwanted proposals. The encoded state representation of the frame selection agent is further utilized for guiding the event proposal and caption generation tasks. We also leverage the idea of knowledge distillation to improve accuracy. We conduct extensive evaluations on the ActivityNet Captions dataset to validate our method.
Style APA, Harvard, Vancouver, ISO itp.
47

Liang, Buyun, Na Li, Zheng He, Zhongyuan Wang, Youming Fu i Tao Lu. "News Video Summarization Combining SURF and Color Histogram Features". Entropy 23, nr 8 (30.07.2021): 982. http://dx.doi.org/10.3390/e23080982.

Pełny tekst źródła
Streszczenie:
Because the data volume of news videos is increasing exponentially, a way to quickly browse a sketch of the video is important in various applications, such as news media, archives and publicity. This paper proposes a news video summarization method based on SURF features and an improved clustering algorithm, to overcome the defects in existing algorithms that fail to account for changes in shot complexity. Firstly, we extracted SURF features from the video sequences and matched the features between adjacent frames, and then detected the abrupt and gradual boundaries of the shot by calculating similarity scores between adjacent frames with the help of double thresholds. Secondly, we used an improved clustering algorithm to cluster the color histogram of the video frames within the shot, which merged the smaller clusters and then selected the frame closest to the cluster center as the key frame. The experimental results on both the public and self-built datasets show the superiority of our method over the alternatives in terms of accuracy and speed. Additionally, the extracted key frames demonstrate low redundancy and can credibly represent a sketch of news videos.
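The boundary-detection step described above can be sketched as below; SURF is only available in the non-free opencv-contrib build, so ORB is used here as a freely available stand-in, and both thresholds are illustrative rather than the paper's values.

```python
import cv2

orb = cv2.ORB_create(nfeatures=500)   # stand-in for SURF (contrib/non-free)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_score(frame_a, frame_b):
    """Similarity of two frames = number of cross-checked descriptor matches."""
    _, da = orb.detectAndCompute(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY), None)
    _, db = orb.detectAndCompute(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY), None)
    if da is None or db is None:
        return 0
    return len(bf.match(da, db))

def detect_boundaries(frames, t_low=20, t_high=60):
    """Double-threshold shot boundary detection (illustrative thresholds):
    scores below t_low are abrupt cuts, scores between t_low and t_high
    are flagged as possible gradual transitions."""
    cuts, graduals = [], []
    for i in range(len(frames) - 1):
        s = match_score(frames[i], frames[i + 1])
        if s < t_low:
            cuts.append(i + 1)
        elif s < t_high:
            graduals.append(i + 1)
    return cuts, graduals
```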
Style APA, Harvard, Vancouver, ISO itp.
48

Ren, Honge, Walid Atwa, Haosu Zhang, Shafiq Muhammad i Mahmoud Emam. "Frame Duplication Forgery Detection and Localization Algorithm Based on the Improved Levenshtein Distance". Scientific Programming 2021 (31.03.2021): 1–10. http://dx.doi.org/10.1155/2021/5595850.

Pełny tekst źródła
Streszczenie:
In this digital era of technology and software development tools, low-cost digital cameras and powerful video editing software (such as Adobe Premiere, Microsoft Movie Maker, and Magix Vegas) have become available to any common user, and editing the contents of digital videos with these tools has become very easy. Frame duplication is a common video forgery attack in which a sequence of frames is copied and pasted within the same video in order to hide or replicate some events. Many algorithms have been proposed in the literature to detect such forgeries in video sequences by analyzing spatial and temporal correlations. However, most of them suffer from low efficiency and accuracy and from high computational complexity. In this paper, we propose an efficient and robust frame duplication detection algorithm that detects duplicated frames in a video sequence based on an improved Levenshtein distance. Extensive experiments were performed on selected video sequences captured by stationary and moving cameras. In the experimental results, the proposed algorithm showed efficacy compared with state-of-the-art techniques.
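The core idea — turn each frame into a compact signature and compare sub-sequences of signatures with an edit distance — can be sketched as follows; the paper's improved Levenshtein distance is not reproduced, the average-hash signature and window parameters are assumptions, and the brute-force pairwise scan is for illustration only.

```python
import cv2
import numpy as np

def frame_signature(frame, size=8):
    """Average-hash style signature: a 64-bit string per frame."""
    g = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    g = cv2.resize(g, (size, size), interpolation=cv2.INTER_AREA)
    return "".join("1" if v > g.mean() else "0" for v in g.flatten())

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    dp = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return int(dp[-1])

def find_duplicated_windows(frames, win=25, max_dist=2):
    """Flag pairs of non-overlapping windows whose signature sequences are
    nearly identical under the edit distance (illustrative parameters)."""
    sigs = [frame_signature(f) for f in frames]
    hits = []
    for i in range(0, len(sigs) - win):
        for j in range(i + win, len(sigs) - win):
            if levenshtein(sigs[i:i + win], sigs[j:j + win]) <= max_dist:
                hits.append((i, j))
    return hits
```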
Style APA, Harvard, Vancouver, ISO itp.
49

Kumar, Vikas, Tanupriya Choudhury, Suresh Chandra Satapathy, Ravi Tomar i Archit Aggarwal. "Video super resolution using convolutional neural network and image fusion techniques". International Journal of Knowledge-based and Intelligent Engineering Systems 24, nr 4 (18.01.2021): 279–87. http://dx.doi.org/10.3233/kes-190037.

Pełny tekst źródła
Streszczenie:
Recently, huge progress has been achieved in the field of single-image super-resolution, which augments the resolution of images. The idea behind super-resolution is to convert low-resolution images into high-resolution images. SRCNN (Super-Resolution Convolutional Neural Network) was a huge improvement over the existing methods of single-image super-resolution. However, video super-resolution, despite being an active field of research, is yet to benefit from deep learning. Using still images and videos downloaded from various sources, we explore the possibility of using SRCNN along with image fusion techniques (minima, maxima, average, PCA, DWT) to improve over existing video super-resolution methods. Video super-resolution has inherent difficulties such as unexpected motion, blur and noise. We propose the Video Super Resolution – Image Fusion (VSR-IF) architecture, which utilizes information from multiple frames to produce a single high-resolution frame of a video. We use SRCNN as a reference model to obtain high-resolution adjacent frames and use a concatenation layer to group those frames into a single frame. Since our method is data-driven and requires only minimal initial training, it is faster than other video super-resolution methods. After testing our program, we find that our technique shows a significant improvement over SRCNN and other single-image and frame super-resolution techniques.
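The fusion step can be illustrated with the simplest of the listed strategies (averaging); bicubic upscaling stands in for an SRCNN forward pass, and the neighbourhood radius is an assumption.

```python
import cv2
import numpy as np

def fuse_adjacent(frames, idx, scale=2, radius=1):
    """Upscale the target frame and its neighbours (bicubic here stands in
    for an SRCNN forward pass) and fuse them by pixel-wise averaging."""
    lo, hi = max(0, idx - radius), min(len(frames), idx + radius + 1)
    ups = [cv2.resize(f, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_CUBIC).astype(np.float32)
           for f in frames[lo:hi]]
    return np.clip(np.mean(ups, axis=0), 0, 255).astype(np.uint8)
```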
Style APA, Harvard, Vancouver, ISO itp.
50

Gowda, Shreyank N., Marcus Rohrbach i Laura Sevilla-Lara. "SMART Frame Selection for Action Recognition". Proceedings of the AAAI Conference on Artificial Intelligence 35, nr 2 (18.05.2021): 1451–59. http://dx.doi.org/10.1609/aaai.v35i2.16235.

Pełny tekst źródła
Streszczenie:
Video classification is computationally expensive. In this paper, we address the problem of frame selection to reduce the computational cost of video classification. Recent work has successfully leveraged frame selection for long, untrimmed videos, where much of the content is not relevant and easy to discard. In this work, however, we focus on the more standard short, trimmed video classification problem. We argue that good frame selection can not only reduce the computational cost of video classification but also increase the accuracy by getting rid of frames that are hard to classify. In contrast to previous work, we propose a method that, instead of selecting frames by considering one at a time, considers them jointly. This results in a more efficient selection, where “good” frames are more effectively distributed over the video, like snapshots that tell a story. We call the proposed frame selection SMART and we test it in combination with different backbone architectures and on multiple benchmarks (Kinetics [5], Something-something [14], UCF101 [31]). We show that the SMART frame selection consistently improves the accuracy compared to other frame selection strategies while reducing the computational cost by a factor of 4 to 10 times. Additionally, we show that when the primary goal is recognition performance, our selection strategy can improve over recent state-of-the-art models and frame selection strategies on various benchmarks (UCF101, HMDB51 [21], FCVID [17], and ActivityNet [4]).
Style APA, Harvard, Vancouver, ISO itp.
