Journal articles on the topic 'Features of video information'

Consult the top 50 journal articles for your research on the topic 'Features of video information.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Zhang, Chunhu, Yun Tan, Jiaohua Qin, and Xuyu Xiang. "Coverless Video Steganography Based on Audio and Frame Features." Security and Communication Networks 2022 (April 4, 2022): 1–14. http://dx.doi.org/10.1155/2022/1154098.

Full text
Abstract:
The coverless steganography based on video has become a research hot spot recently. However, the existing schemes usually hide secret information based on the single-frame feature of video and do not take advantage of other rich features. In this work, we propose a novel coverless steganography, which makes full use of the audio and frame image features of the video. First, three features are extracted to obtain hash bit sequences, which include DWT (discrete wavelet transform) coefficients and short-term energy of audio and the SIFT (scale-invariant feature transformation) feature of frame images. Then, we build a retrieval database according to the relationship between the generated bit sequences and three features of the corresponding videos. The sender divides the secret information into segments and sends the corresponding retrieval information and carrier videos to the receiver. The receiver can use the retrieval information to recover the secret information from the carrier videos correspondingly. The experimental results show that the proposed method can achieve larger capacity, less time cost, higher hiding success rate, and stronger robustness compared with the existing coverless steganography schemes based on the video.
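To make the feature-to-bits idea concrete, here is a minimal sketch (not the authors' scheme, which also uses SIFT features of frame images and a retrieval database): it derives a hash bit sequence from an audio track by comparing adjacent short-term energies of the raw signal and of its DWT approximation coefficients. NumPy and PyWavelets are assumed; the wavelet, frame length, and function names are illustrative.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def short_term_energy(signal, frame_len=1024):
    """Energy of non-overlapping frames of a 1-D signal."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).sum(axis=1)

def hash_bits_from_audio(signal, wavelet="db4"):
    """Derive a bit sequence by comparing adjacent feature values,
    loosely following the 'features -> hash bits' idea in the abstract."""
    energy = short_term_energy(signal)
    approx, _ = pywt.dwt(signal, wavelet)          # 1-level DWT approximation
    coeff_energy = short_term_energy(approx)
    bits_energy = (energy[1:] > energy[:-1]).astype(int)
    bits_dwt = (coeff_energy[1:] > coeff_energy[:-1]).astype(int)
    return np.concatenate([bits_energy, bits_dwt])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000)             # stand-in for one second of audio
    print(hash_bits_from_audio(audio)[:16])
```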
APA, Harvard, Vancouver, ISO, and other styles
2

Ramezani, Mohsen, and Farzin Yaghmaee. "Retrieving Human Action by Fusing the Motion Information of Interest Points." International Journal on Artificial Intelligence Tools 27, no. 03 (May 2018): 1850008. http://dx.doi.org/10.1142/s0218213018500082.

Full text
Abstract:
In response to the fast propagation of videos on the Internet, Content-Based Video Retrieval (CBVR) was introduced to help users find their desired items. Since most videos concern humans, human action retrieval was introduced as a new topic in CBVR. Most human action retrieval methods represent an action by extracting and describing its local features as more reliable than global ones; however, these methods are complex and not very accurate. In this paper, a low-complexity representation method that more accurately describes extracted local features is proposed. In this method, each video is represented independently from other videos. To this end, the motion information of each extracted feature is described by the directions and sizes of its movements. In this system, the correspondence between the directions and sizes of the movements is used to compare videos. Finally, videos that correspond best with the query video are delivered to the user. Experimental results illustrate that this method can outperform state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
3

Waykar, Sanjay B., and C. R. Bharathi. "Multimodal Features and Probability Extended Nearest Neighbor Classification for Content-Based Lecture Video Retrieval." Journal of Intelligent Systems 26, no. 3 (July 26, 2017): 585–99. http://dx.doi.org/10.1515/jisys-2016-0041.

Full text
Abstract:
Due to the ever-increasing number of digital lecture libraries and lecture video portals, the challenge of retrieving lecture videos has become a very significant and demanding task in recent years. Accordingly, the literature presents different techniques for video retrieval by considering video contents as well as signal data. Here, we propose a lecture video retrieval system using multimodal features and probability extended nearest neighbor (PENN) classification. There are two modalities utilized for feature extraction. One is textual information, which is determined from the lecture video using optical character recognition. The second modality utilized to preserve video content is local vector pattern. These two modal features are extracted, and the retrieval of videos is performed using the proposed PENN classifier, which is the extension of the extended nearest neighbor classifier, by considering the different weightages for the first-level and second-level neighbors. The performance of the proposed video retrieval is evaluated using precision, recall, and F-measure, which are computed by matching the retrieved videos and the manually classified videos. From the experimentation, we proved that the average precision of the proposed PENN+VQ is 78.3%, which is higher than that of the existing methods.
APA, Harvard, Vancouver, ISO, and other styles
4

Gong, Xiaohui. "A Personalized Recommendation Method for Short Drama Videos Based on External Index Features." Advances in Meteorology 2022 (April 18, 2022): 1–10. http://dx.doi.org/10.1155/2022/3601956.

Full text
Abstract:
Dramatic short videos have quickly gained a huge number of user views in the current short video boom. Short videos present information in more dimensions and are more easily accepted and shared by people. At present, there is a large amount of drama short video content on the Internet. This content causes serious information overload for users and also brings great challenges to short video operators and video editors. Therefore, how to process short videos quickly has become a research hotspot. The traditional episode recommendation process often applies collaborative filtering or content-based recommendation, but these methods have certain limitations. Short videos spread fast, are highly time-sensitive, and reach trending searches quickly; these have become the defining characteristics of short video dissemination, and traditional recommendation methods cannot recommend short videos with high attention and high popularity. To this end, this paper adds external index features to extract short video features and proposes a short video recommendation method based on index features. Using external features to classify and recommend TV series videos, this method can make recommendations to target customers quickly and accurately. The experimental analysis shows that the proposed method performs well.
APA, Harvard, Vancouver, ISO, and other styles
5

Ye, Qing, Haoxin Zhong, Chang Qu, and Yongmei Zhang. "Human Interaction Recognition Based on Whole-Individual Detection." Sensors 20, no. 8 (April 20, 2020): 2346. http://dx.doi.org/10.3390/s20082346.

Full text
Abstract:
Human interaction recognition technology is a hot topic in the field of computer vision, and its application prospects are very extensive. At present, there are many difficulties in human interaction recognition such as the spatial complexity of human interaction, the differences in action characteristics at different time periods, and the complexity of interactive action features. The existence of these problems restricts the improvement of recognition accuracy. To investigate the differences in the action characteristics at different time periods, we propose an improved fusion time-phase feature of the Gaussian model to obtain video keyframes and remove the influence of a large amount of redundant information. Regarding the complexity of interactive action features, we propose a multi-feature fusion network algorithm based on parallel Inception and ResNet. This multi-feature fusion network not only reduces the network parameter quantity, but also improves the network performance; it alleviates the network degradation caused by the increase in network depth and obtains higher classification accuracy. For the spatial complexity of human interaction, we combined the whole video features with the individual video features, making full use of the feature information of the interactive video. A human interaction recognition algorithm based on whole–individual detection is proposed, where the whole video contains the global features of both sides of action, and the individual video contains the individual detail features of a single person. Making full use of the feature information of the whole video and individual videos is the main contribution of this paper to the field of human interaction recognition and the experimental results in the UT dataset (UT–interaction dataset) showed that the accuracy of this method was 91.7%.
APA, Harvard, Vancouver, ISO, and other styles
6

K, Jayasree, and Sumam Mary Idicula. "Enhanced Video Classification System Using a Block-Based Motion Vector." Information 11, no. 11 (October 24, 2020): 499. http://dx.doi.org/10.3390/info11110499.

Full text
Abstract:
The main objective of this work was to design and implement a support vector machine-based classification system to classify video data into predefined classes. Video data has to be structured and indexed for any video classification methodology. Video structure analysis involves shot boundary detection and keyframe extraction. Shot boundary detection is performed using a two-pass block-based adaptive threshold method. The seek spread strategy is used for keyframe extraction. In most video classification methods, the selection of features is important, because the selected features contribute to the efficiency of the classification system, and it is very hard to find out which combination of features is most effective. Feature selection is therefore highly relevant to the proposed system. Herein, a support vector machine-based classifier was considered for the classification of video clips. The performance of the proposed system was evaluated on six categories of video clips: cartoons, commercials, cricket, football, tennis, and news. When shot-level features and keyframe features, along with motion vectors, were used, 86% correct classification was achieved, which was comparable with the existing methods. The research concentrated on feature extraction, where a combination of selected features was given to a classifier to obtain the best classification performance.
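The abstract does not spell out the two-pass block-based adaptive threshold method, so the following is only a simplified single-pass sketch of block-wise histogram differencing with an adaptive mean-plus-k-standard-deviations threshold; the block count, bin count, and factor k are invented for illustration.

```python
import numpy as np

def block_hist_diff(frame_a, frame_b, blocks=4, bins=16):
    """Mean per-block grayscale histogram difference between two frames."""
    h, w = frame_a.shape
    bh, bw = h // blocks, w // blocks
    diffs = []
    for i in range(blocks):
        for j in range(blocks):
            a = frame_a[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            b = frame_b[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            ha, _ = np.histogram(a, bins=bins, range=(0, 255))
            hb, _ = np.histogram(b, bins=bins, range=(0, 255))
            diffs.append(np.abs(ha - hb).sum())
    return float(np.mean(diffs))

def detect_shot_boundaries(frames, k=3.0):
    """Adaptive threshold: mean + k * std of all inter-frame differences."""
    d = np.array([block_hist_diff(frames[i], frames[i + 1]) for i in range(len(frames) - 1)])
    threshold = d.mean() + k * d.std()
    return [i + 1 for i, v in enumerate(d) if v > threshold]
```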
APA, Harvard, Vancouver, ISO, and other styles
7

Huang, Dong Mei, and Kai Feng. "A Near-Duplicate Video Detection with Temporal Consistency Feature." Advanced Materials Research 798-799 (September 2013): 510–14. http://dx.doi.org/10.4028/www.scientific.net/amr.798-799.510.

Full text
Abstract:
There is a wide variety of video data in the information-oriented society, and how to detect the video clips that users want, quickly and accurately, in massive video data is attracting more researchers. Because existing near-duplicate video detection algorithms extract global or local features directly at the keyframe level, which is very time-consuming, this paper introduces a new cascaded near-duplicate video detection approach that uses a temporal consistency feature at the shot level to preliminarily filter out dissimilar videos before extracting features, and then combines global and local features, step by step, to obtain the videos that duplicate the query video. We have verified the approach by experimenting on the CC_WEB_VIDEO dataset and compared its performance with a method based on a global color histogram signature. The results show the proposed method achieves better detection accuracy, especially for videos with complex motion scenes and large frame changes.
APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Hanqing, Chunyan Hu, Feifei Lee, Chaowei Lin, Wei Yao, Lu Chen, and Qiu Chen. "A Supervised Video Hashing Method Based on a Deep 3D Convolutional Neural Network for Large-Scale Video Retrieval." Sensors 21, no. 9 (April 29, 2021): 3094. http://dx.doi.org/10.3390/s21093094.

Full text
Abstract:
Recently, with the popularization of camera tools such as mobile phones and the rise of various short video platforms, a lot of videos are being uploaded to the Internet at all times, for which a video retrieval system with fast retrieval speed and high precision is very necessary. Therefore, content-based video retrieval (CBVR) has aroused the interest of many researchers. A typical CBVR system mainly contains two essential parts: video feature extraction and similarity comparison. Feature extraction from video is very challenging; previous video retrieval methods are mostly based on extracting features from single video frames, resulting in the loss of temporal information in the videos. Hashing methods are extensively used in multimedia information retrieval due to their retrieval efficiency, but most of them are currently only applied to image retrieval. In order to solve these problems in video retrieval, we build an end-to-end framework called deep supervised video hashing (DSVH), which employs a 3D convolutional neural network (CNN) to obtain spatial-temporal features of videos, then trains a set of hash functions by supervised hashing to map the video features into binary space and obtain compact binary codes of videos. Finally, we use triplet loss for network training. We conduct extensive experiments on three public video datasets, UCF-101, JHMDB and HMDB-51, and the results show that the proposed method has advantages over many state-of-the-art video retrieval methods. Compared with the DVH method, the mAP value on the UCF-101 dataset is improved by 9.3%, and the minimum improvement, on the JHMDB dataset, is 0.3%. At the same time, we also demonstrate the stability of the algorithm on the HMDB-51 dataset.
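As a toy illustration of the pipeline shape described above (3D CNN features, a tanh hash layer, triplet-loss training, and sign binarization for retrieval), here is a PyTorch sketch; the layer sizes, code length, and clip dimensions are invented, and the real DSVH network is considerably deeper.

```python
import torch
import torch.nn as nn

class TinyVideoHashNet(nn.Module):
    """Toy 3D-CNN hashing network: video clip -> k-bit code in (-1, 1)."""
    def __init__(self, code_bits=48):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.hash_layer = nn.Sequential(nn.Flatten(), nn.Linear(32, code_bits), nn.Tanh())

    def forward(self, clip):                          # clip: (N, 3, T, H, W)
        return self.hash_layer(self.features(clip))

net = TinyVideoHashNet()
triplet = nn.TripletMarginLoss(margin=1.0)
anchor, positive, negative = (torch.randn(2, 3, 8, 32, 32) for _ in range(3))
loss = triplet(net(anchor), net(positive), net(negative))
loss.backward()                                       # one training step would follow
binary_codes = torch.sign(net(anchor)).detach()       # {-1, +1} codes for retrieval
```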
APA, Harvard, Vancouver, ISO, and other styles
9

Oliveira, Eva, Teresa Chambel, and Nuno Magalhães Ribeiro. "Sharing Video Emotional Information in the Web." International Journal of Web Portals 5, no. 3 (July 2013): 19–39. http://dx.doi.org/10.4018/ijwp.2013070102.

Full text
Abstract:
Video growth over the Internet changed the way users search, browse and view video content. Watching movies over the Internet is increasing and becoming a pastime. The possibility of streaming Internet content to TV, advances in video compression techniques and video streaming have turned this recent modality of watching movies easy and doable. Web portals as a worldwide mean of multimedia data access need to have their contents properly classified in order to meet users’ needs and expectations. The authors propose a set of semantic descriptors based on both user physiological signals, captured while watching videos, and on video low-level features extraction. These XML based descriptors contribute to the creation of automatic affective meta-information that will not only enhance a web-based video recommendation system based in emotional information, but also enhance search and retrieval of videos affective content from both users’ personal classifications and content classifications in the context of a web portal.
APA, Harvard, Vancouver, ISO, and other styles
10

Ma, Biao, and Minghui Ji. "Motion Feature Retrieval in Basketball Match Video Based on Multisource Motion Feature Fusion." Advances in Mathematical Physics 2022 (January 11, 2022): 1–10. http://dx.doi.org/10.1155/2022/9965764.

Full text
Abstract:
Both the human body and its motion are three-dimensional information, while the traditional feature description method of two-person interaction based on RGB video has a low degree of discrimination due to the lack of depth information. According to the respective advantages and complementary characteristics of RGB video and depth video, a retrieval algorithm based on multisource motion feature fusion is proposed. Firstly, the algorithm uses the combination of spatiotemporal interest points and word bag model to represent the features of RGB video. Then, the directional gradient histogram is used to represent the feature of the depth video frame. The statistical features of key frames are introduced to represent the histogram features of depth video. Finally, the multifeature image fusion algorithm is used to fuse the two video features. The experimental results show that multisource feature fusion can greatly improve the retrieval accuracy of motion features.
APA, Harvard, Vancouver, ISO, and other styles
11

Chen, Junyu, Ganlan Peng, Yuanfang Peng, Mu Fang, Zhibin Chen, Jianqing Li, and Liang Lan. "Key Clips and Key Frames Extraction of Videos Based on Deep Learning." Journal of Physics: Conference Series 2025, no. 1 (September 1, 2021): 012018. http://dx.doi.org/10.1088/1742-6596/2025/1/012018.

Full text
Abstract:
The surveillance camera network covering a city, while protecting people's lives and property, generates a large amount of surveillance video data every day, but few videos contain useful information. Surveillance cameras in sparsely crowded areas may capture mostly background. The large number of surveillance videos brings greater storage pressure and also increases the difficulty of staff work. In this paper, we use an auto-encoder network to extract features from video frames. By comparing the feature differences between video frames, we can automatically select key video clips and key frame images that contain useful information to slim down the surveillance video. Through experiments on actual cameras, we found that this method reduces the pressure of storage and traversal.
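A minimal sketch of the selection step, assuming frame features have already been produced by some encoder (the paper uses an auto-encoder): a frame is kept as a key frame whenever its feature vector differs enough from the last kept one. The cosine-distance measure and threshold are illustrative, not the authors' choices.

```python
import numpy as np

def select_key_frames(features, threshold=0.5):
    """features: (num_frames, dim) array of per-frame feature vectors.
    Keep a frame whenever its cosine distance to the last kept key frame
    exceeds the threshold; returns the kept frame indices."""
    def cos_dist(a, b):
        return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    keep = [0]
    for idx in range(1, len(features)):
        if cos_dist(features[idx], features[keep[-1]]) > threshold:
            keep.append(idx)
    return keep
```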
APA, Harvard, Vancouver, ISO, and other styles
12

Wang, Peipei, Yun Cao, and Xianfeng Zhao. "Segmentation Based Video Steganalysis to Detect Motion Vector Modification." Security and Communication Networks 2017 (2017): 1–12. http://dx.doi.org/10.1155/2017/8051389.

Full text
Abstract:
This paper presents a steganalytic approach against video steganography which modifies motion vector (MV) in content adaptive manner. Current video steganalytic schemes extract features from fixed-length frames of the whole video and do not take advantage of the content diversity. Consequently, the effectiveness of the steganalytic feature is influenced by video content and the problem of cover source mismatch also affects the steganalytic performance. The goal of this paper is to propose a steganalytic method which can suppress the differences of statistical characteristics caused by video content. The given video is segmented to subsequences according to block’s motion in every frame. The steganalytic features extracted from each category of subsequences with close motion intensity are used to build one classifier. The final steganalytic result can be obtained by fusing the results of weighted classifiers. The experimental results have demonstrated that our method can effectively improve the performance of video steganalysis, especially for videos of low bitrate and low embedding ratio.
APA, Harvard, Vancouver, ISO, and other styles
13

Tang, Zhenjun, Shaopeng Zhang, Zhenhai Chen, and Xianquan Zhang. "Robust Video Hashing Based on Multidimensional Scaling and Ordinal Measures." Security and Communication Networks 2021 (April 30, 2021): 1–11. http://dx.doi.org/10.1155/2021/9930673.

Full text
Abstract:
Multimedia hashing is a useful technology of multimedia management, e.g., multimedia search and multimedia security. This paper proposes a robust multimedia hashing for processing videos. The proposed video hashing constructs a high-dimensional matrix via gradient features in the discrete wavelet transform (DWT) domain of preprocessed video, learns low-dimensional features from high-dimensional matrix via multidimensional scaling, and calculates video hash by ordinal measures of the learned low-dimensional features. Extensive experiments on 8300 videos are performed to examine the proposed video hashing. Performance comparisons reveal that the proposed scheme is better than several state-of-the-art schemes in balancing the performances of robustness and discrimination.
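A rough sketch of the "learn low-dimensional features with MDS, then hash by ordinal measures" step, assuming scikit-learn and a precomputed per-video gradient-feature matrix; the paper's DWT-domain preprocessing and exact feature construction are not reproduced, and the dimensions shown are illustrative.

```python
import numpy as np
from sklearn.manifold import MDS

def ordinal_hash(feature_matrix, dims=8, seed=0):
    """feature_matrix: (n_blocks, n_features) gradient features of one video.
    Learn a low-dimensional embedding with MDS, then use the rank order
    (ordinal measure) of each embedded dimension as the hash."""
    mds = MDS(n_components=dims, dissimilarity="euclidean", random_state=seed)
    low_dim = mds.fit_transform(feature_matrix)              # (n_blocks, dims)
    return np.argsort(np.argsort(low_dim, axis=0), axis=0)   # per-dimension ranks

def hash_distance(h1, h2):
    """Normalized L1 distance between two ordinal hashes of equal shape."""
    return float(np.abs(h1 - h2).sum()) / h1.size
```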
APA, Harvard, Vancouver, ISO, and other styles
14

Wang, Bin, Yu Liu, Wei Wang, Wei Xu, and Mao Jun Zhang. "Local Spatiotemporal Coding and Sparse Representation Based Human Action Recognition." Applied Mechanics and Materials 401-403 (September 2013): 1555–60. http://dx.doi.org/10.4028/www.scientific.net/amm.401-403.1555.

Full text
Abstract:
To address the limitation of the bag-of-features (BoF) model, which ignores the spatial and temporal relationships of local features in human action recognition in video, a Local Spatiotemporal Coding (LSC) is proposed. Unlike existing methods that use only the feature appearance information for coding, LSC encodes feature appearance and spatiotemporal position information simultaneously with vector quantization (VQ). It directly models the spatiotemporal relationships of local features in the space-time volume (STV). In implementation, the local features are projected into sub-space-time-volumes (sub-STV) and encoded with LSC. In addition, a multi-level LSC is also provided. A group of sub-STV descriptors obtained from videos with multi-level LSC and average pooling is then used for action video classification. A sparse representation based classification method is adopted to classify action videos using these sub-STV descriptors. The experimental results on the KTH, Weizmann, and UCF Sports datasets show that our method achieves better performance than previous human action recognition methods based on local spatiotemporal features.
APA, Harvard, Vancouver, ISO, and other styles
15

Bekhet, Saddam, and Abdullah M. Alghamdi. "A COMPARATIVE STUDY FOR VIDEO CLASSIFICATION TECHNIQUES USING DIRECT FEATURES MATCHING, MACHINE LEARNING, AND DEEP LEARNING." Journal of Southwest Jiaotong University 56, no. 4 (August 30, 2021): 745–57. http://dx.doi.org/10.35741/issn.0258-2724.56.4.63.

Full text
Abstract:
Videos are considered the new era communication language between internet users due to the explosion of smart-phones usage and increase in internet bandwidth and storage space. This has fueled the need to develop robust video analysis techniques. Specifically, video classification presents a unique task for field researchers, as it has numerous critical applications, such as video indexing, searching, annotation and surveillance. Videos inherently embody static and dynamic information that is encoded in frames. The task is further prioritized due to the gigantic amounts of available videos in the digital world, which requires a robust way to organize these videos. Throughout literature, researchers have generally adopted three main techniques to classify videos, i.e., direct features matching, machine learning-based methods, and deep learning-based methods. Each of these methods is suitable for a specific application type. This paper is designed to assess which of the three common working approaches are better for video classification. Furthermore, the paper aims to examine whether and how these methods affect/improve video classification performance and key factors to constructing a robust video classification system. This novel research paper covers an important research gap by introducing a rigorous comparative analysis of the three methods highlighting their advantages and disadvantages and guiding field researchers. A comprehensive analysis brings the paper findings together using a benchmark group of challenging large-scale video datasets (~29k videos). This would provide field researchers with the necessary information to choose the best method for their video classification research work.
APA, Harvard, Vancouver, ISO, and other styles
17

Li, Yanghui, Hong Zhu, Qian Hou, Jing Wang, and Wenhuan Wu. "Video Super-Resolution Using Multi-Scale and Non-Local Feature Fusion." Electronics 11, no. 9 (May 7, 2022): 1499. http://dx.doi.org/10.3390/electronics11091499.

Full text
Abstract:
Video super-resolution generates high-resolution video frames, with rich detail and temporal consistency, from a plurality of low-resolution video frames. Most current methods use a two-level structure to reconstruct video frames by combining an optical flow network and a super-resolution network, but this process does not deeply mine the effective information contained in video frames. Therefore, we propose a video super-resolution method that combines non-local features and multi-scale features to extract more in-depth effective information from video frames. Our method obtains long-distance effective information by calculating the similarity between any two pixels in the video frame through the non-local module, extracts the local information covered by convolution kernels of different scales through the multi-scale feature fusion module, and fully fuses feature information using different connection modes of convolution kernels. Experiments on different datasets show that the proposed method is superior to existing methods both qualitatively and quantitatively.
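A simplified embedded-Gaussian non-local block in PyTorch, illustrating how similarity between any two spatial positions can be computed and used to aggregate long-distance information; it is not the paper's module (which also involves multi-scale fusion), and the channel sizes are invented.

```python
import torch
import torch.nn as nn

class NonLocalBlock2d(nn.Module):
    """Simplified non-local block: every position attends to every other
    position of the same feature map (embedded-Gaussian form)."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or max(channels // 2, 1)
        self.theta = nn.Conv2d(channels, reduced, 1)
        self.phi = nn.Conv2d(channels, reduced, 1)
        self.g = nn.Conv2d(channels, reduced, 1)
        self.out = nn.Conv2d(reduced, channels, 1)

    def forward(self, x):                                  # x: (N, C, H, W)
        n, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)       # (N, HW, C')
        k = self.phi(x).flatten(2)                         # (N, C', HW)
        v = self.g(x).flatten(2).transpose(1, 2)           # (N, HW, C')
        attn = torch.softmax(q @ k, dim=-1)                # (N, HW, HW) pixel similarities
        y = (attn @ v).transpose(1, 2).reshape(n, -1, h, w)
        return x + self.out(y)                             # residual connection

feat = torch.randn(1, 32, 16, 16)
print(NonLocalBlock2d(32)(feat).shape)                     # torch.Size([1, 32, 16, 16])
```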
APA, Harvard, Vancouver, ISO, and other styles
18

Liu, Shiguang, Huixin Wang, and Xiaoli Zhang. "Video Decolorization Based on the CNN and LSTM Neural Network." ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 3 (July 22, 2021): 1–18. http://dx.doi.org/10.1145/3446619.

Full text
Abstract:
Video decolorization is the process of transferring three-channel color videos into single-channel grayscale videos, which is essentially the decolorization operation of video frames. Most existing video decolorization algorithms directly apply image decolorization methods to decolorize video frames. However, if we only take the single-frame decolorization result into account, it will inevitably cause temporal inconsistency and a flicker phenomenon, meaning that the same local content between continuous video frames may display different gray values. In addition, there are often similar local content features between video frames, which indicates redundant information. To solve the preceding problems, this article proposes a novel video decolorization algorithm based on the convolutional neural network and the long short-term memory neural network. First, we design a local semantic content encoder to learn and extract the same local content of continuous video frames, which can better preserve the contrast of video frames. Second, a temporal feature controller based on bi-directional recurrent neural networks with long short-term memory units is employed to refine the local semantic features, which can greatly maintain the temporal consistency of the video sequence and eliminate the flicker phenomenon. Finally, we take advantage of deconvolution to decode the features and produce the grayscale video sequence. Experiments have indicated that our method can better preserve the local contrast of video frames and the temporal consistency over the state of the art.
APA, Harvard, Vancouver, ISO, and other styles
19

Baranwal, Ritwik. "Automatic Summarization of Cricket Highlights using Audio Processing." January 2021 7, no. 01 (January 4, 2021): 48–53. http://dx.doi.org/10.46501/ijmtst070111.

Full text
Abstract:
The problem of automatic excitement detection in cricket videos is considered and applied to highlight generation. This paper focuses on detecting exciting events in video using complementary information from the audio and video domains. First, a method for separating the audio and video elements is proposed. Thereafter, the “level of excitement” is measured using features such as the amplitude and spectral center of gravity extracted from the commentator’s speech, which are used to decide the threshold. Our experiments using actual cricket videos show that these features are well correlated with human assessment of excitability. Finally, audio/video information is fused according to the time order of scenes that exhibit “excitability” in order to generate cricket highlights. The techniques described in this paper are generic and applicable to a variety of topics and video/acoustic domains.
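A minimal sketch of the audio-side excitement measure under stated assumptions: short-time RMS amplitude and spectral centroid are computed with NumPy and thresholded adaptively; the frame lengths and mean-plus-standard-deviation thresholds are illustrative rather than the paper's values.

```python
import numpy as np

def frame_signal(audio, frame_len, hop):
    """Split a 1-D signal into overlapping frames; returns frames and start indices."""
    idx = np.arange(0, len(audio) - frame_len + 1, hop)
    return np.stack([audio[i:i + frame_len] for i in idx]), idx

def excitement_mask(audio, sr, frame_len=2048, hop=1024):
    """Flag frames whose RMS amplitude and spectral centroid both exceed
    adaptive (mean + std) thresholds, as a proxy for commentator excitement."""
    frames, idx = frame_signal(audio, frame_len, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroid = (spectrum * freqs).sum(axis=1) / (spectrum.sum(axis=1) + 1e-8)
    excited = (rms > rms.mean() + rms.std()) & (centroid > centroid.mean() + centroid.std())
    return idx / sr, excited          # frame start times in seconds and boolean mask
```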
APA, Harvard, Vancouver, ISO, and other styles
20

Jia, Xi Bin, and Mei Xia Zheng. "Video Based Visual Speech Feature Model Construction." Applied Mechanics and Materials 182-183 (June 2012): 1367–71. http://dx.doi.org/10.4028/www.scientific.net/amm.182-183.1367.

Full text
Abstract:
This paper aims to give a solution for the construction of a Chinese visual speech feature model based on HMMs. We propose and discuss three kinds of representation models of visual speech: lip geometrical features, lip motion features, and lip texture features. The model that combines the advantages of local LBP and global DCT texture information shows better performance than either single feature, and likewise the model that combines local LBP and geometrical information is better than a single feature. By computing the recognition rate of visemes from the models, the paper shows that an HMM, which describes the dynamics of speech, coupled with the combined feature describing global and local texture, is the best model.
APA, Harvard, Vancouver, ISO, and other styles
21

Vashchuk, O. "IDENTIFICATION PROCESS PERSON IN VIDEO RECORDING." Criminalistics and Forensics, no. 66 (2021): 645–56. http://dx.doi.org/10.33994/kndise.2021.66.48.

Full text
Abstract:
The article is devoted to the characteristics of certain components of the process of identifying a person by the features of nonverbal information coming from their appearance in a video recording. The purpose of this study is to determine the identification features of nonverbal information objects (in statics) and their properties (in dynamics) in videos. The research methods are system analysis, comparative analysis, synthesis, one-dimensional comparison, multidimensional comparison, and the author's modifications of these, depending on the objects of a particular study. The objects of research in such cases are sources of nonverbal information in the video that indicate the individual characteristics of the person and the features by which he or she can be identified. The subject of the study correlates with the object of the study and is determined by the sources of nonverbal information in the video; thus, the subject of research is nonverbal information in the video that indicates the individual characteristics and properties of the person and the features by which he or she can be identified. The process of examining nonverbal information objects in video proceeds in stages (preliminary, separate, and comparative research; summarizing). Samples used in the study of objects of nonverbal information in video can be free or experimental and must meet clearly defined requirements. Preliminary research of nonverbal information objects includes preparatory actions, and separate research involves a separate study of the identification features and properties of the compared objects. The comparative research stage accumulates the data and results from the previous stages and performs a comparative study of the identification features and properties of the objects. Summarizing the results includes evaluating the research results, formulating conclusions, and properly documenting the expert examination of the whole process of studying a person by the features of nonverbal information coming from their appearance in the video. The prospect of further work is announced: the creation of a comprehensive expert technique for examining a person from video recordings, certified as an independent method and included in the Register of methods for conducting forensic examinations.
APA, Harvard, Vancouver, ISO, and other styles
22

Ye, Fanghong, Tinghua Ai, Jiaming Wang, Yuan Yao, and Zheng Zhou. "A Method for Classifying Complex Features in Urban Areas Using Video Satellite Remote Sensing Data." Remote Sensing 14, no. 10 (May 11, 2022): 2324. http://dx.doi.org/10.3390/rs14102324.

Full text
Abstract:
The classification of optical satellite-derived remote sensing images is an important satellite remote sensing application. Due to the wide variety of artificial features and complex ground situations in urban areas, the classification of complex urban features has always been a focus of and challenge in the field of remote sensing image classification. Given the limited information that can be obtained from traditional optical satellite-derived remote sensing data of a classification area, it is difficult to classify artificial features in detail at the pixel level. With the development of technologies, such as satellite platforms and sensors, the data types acquired by remote sensing satellites have evolved from static images to dynamic videos. Compared with traditional satellite-derived images, satellite-derived videos contain increased ground object reflection information, especially information obtained from different observation angles, and can thus provide more information for classifying complex urban features and improving the corresponding classification accuracies. In this paper, first, we analyze urban-area, ground feature characteristics and satellite-derived video remote sensing data. Second, according to these characteristics, we design a pixel-level classification method based on the application of machine learning techniques to video remote sensing data that represents complex, urban-area ground features. Last, we conduct experiments on real data. The test results show that applying the method designed in this paper to classify dynamic, satellite-derived video remote sensing data can improve the classification accuracy of complex features in urban areas compared with the classification results obtained using static, satellite-derived remote sensing image data at the same resolution.
APA, Harvard, Vancouver, ISO, and other styles
23

Wu, Jian, and Bozhu Wu. "Objectionable Internet Video Information Detection Based on Color Features." International Journal of Digital Content Technology and its Applications 7, no. 6 (March 31, 2013): 488–96. http://dx.doi.org/10.4156/jdcta.vol7.issue6.55.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Wang, Yunchen. "Research on Dance Movement Recognition Based on Multi-Source Information." Mathematical Problems in Engineering 2022 (April 23, 2022): 1–10. http://dx.doi.org/10.1155/2022/5257165.

Full text
Abstract:
A huge number of scientific research institutions and scholars are now researching this topic in depth, with promising results. Meanwhile, progress in dance video movement recognition is rather modest due to the high complexity of dance movement and the challenge of human body self-occlusion in dance performance. Aiming at the problem of combining motion recognition and dance video, the feature extraction, representation, and motion recognition methods based on dance video are studied. This paper develops an effective feature extraction method according to the characteristics of dance movements. Firstly, each dance movement video in the dataset is separated into equal sections, and the edge characteristics of all video frames in each segment are gathered into one image, from which histogram of oriented gradients features are extracted. Secondly, a group of oriented gradient histogram feature vectors is used to represent the local appearance information and shape features of the dance moves in the video. In view of the problem of heterogeneous feature fusion, this paper chooses a multiple kernel learning method to fuse the three kinds of features for dance movement recognition. Finally, the effectiveness of the proposed dance movement recognition algorithm is tested using the Dance DB dataset from the University of Cyprus and the Folk Dance dataset from our laboratory. Experimental results show that the proposed algorithm maintains a reasonable recognition rate for relatively complex dance movements and still ensures a certain accuracy when the background and target are easily confused. This confirms the efficacy of the movement recognition system used in this paper for recognizing dance movements.
APA, Harvard, Vancouver, ISO, and other styles
25

Yu, Jia Xiang. "Information System Design and its Application of Badminton Video." Applied Mechanics and Materials 738-739 (March 2015): 635–38. http://dx.doi.org/10.4028/www.scientific.net/amm.738-739.635.

Full text
Abstract:
In this paper, we mainly study the relatively small moving target in badminton video and its use, as well as methods for mining the open space across the net in badminton sparring video. A moving object is accurately extracted based on the mining model, using modified geometric features as the basic motion-feature mining method; we then obtain video structural information based on statistics of players' habits using an improved tracking algorithm. The results show that the method can solve the problem of missed targets, and a badminton video teaching system is designed.
APA, Harvard, Vancouver, ISO, and other styles
26

Mu, Xiangming. "Semantic visual features in content-based video retrieval." Proceedings of the American Society for Information Science and Technology 43, no. 1 (October 10, 2007): 1–14. http://dx.doi.org/10.1002/meet.1450430153.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Livingston, Merlin L. M., and Agnel L. G. X. Livingston. "Processing of Images and Videos for Extracting Text Information from Clustered Features Using Graph Wavelet Transform." Journal of Computational and Theoretical Nanoscience 16, no. 2 (February 1, 2019): 557–61. http://dx.doi.org/10.1166/jctn.2019.7768.

Full text
Abstract:
Image processing is an interesting domain for extracting knowledge from real-time video and images for the surveillance, automation, robotics, medical, and entertainment industries. The data obtained from videos and images are continuous and hold a primary role in semantic-based video analysis, retrieval, and indexing. When images and videos are obtained from natural and random sources, they need to be processed for identifying text, tracking, binarization, and recognising meaningful information for succeeding actions. This proposal defines a solution based on the Spectral Graph Wavelet Transform (SGWT) technique for localizing and extracting text information from images and videos. A K-means clustering technique precedes the SGWT process to group features in an image, using a quantifying hill climbing algorithm. Precision, sensitivity, specificity, and accuracy are the four parameters that indicate the efficiency of the proposed technique. Experimentation is done on training sets from ICDAR and, for videos, from YVT.
APA, Harvard, Vancouver, ISO, and other styles
28

Ou, Hongshi, and Jifeng Sun. "The Multidimensional Motion Features of Spatial Depth Feature Maps: An Effective Motion Information Representation Method for Video-Based Action Recognition." Mathematical Problems in Engineering 2021 (January 28, 2021): 1–15. http://dx.doi.org/10.1155/2021/6670087.

Full text
Abstract:
In video action recognition based on deep learning, the design of the neural network is focused on how to acquire effective spatial information and motion information quickly. This paper proposes a kind of deep network that can obtain both spatial information and motion information in video classification. It is called MDFs (the multidimensional motion features of deep feature map net). This method can be used to obtain spatial information and motion information in videos only by importing image frame data into a neural network. MDFs originate from the definition of 3D convolution. Multiple 3D convolution kernels with different information focuses are used to act on depth feature maps so as to obtain effective motion information at both spatial and temporal. On the other hand, we split the 3D convolution at space dimension and time dimension, and the spatial network feature map has reduced the dimensions of the original frame image data, which realizes the mitigation of computing resources of the multichannel grouped 3D convolutional network. In order to realize the region weight differentiation of spatial features, a spatial feature weighted pooling layer based on the spatial-temporal motion information guide is introduced to realize the attention to high recognition information. By means of multilevel LSTM, we realize the fusion between global semantic information acquisition and depth features at different levels so that the fully connected layers with rich classification information can provide frame attention mechanism for the spatial information layer. MDFs need only to act on RGB images. Through experiments on three universal experimental datasets of action recognition, UCF10, UCF11, and HMDB51, it is concluded that the MDF network can achieve an accuracy comparable to two streams (RGB and optical flow) that requires the import of both frame data and optical flow data in video classification tasks.
APA, Harvard, Vancouver, ISO, and other styles
29

He, Changyang, Lu He, Tun Lu, and Bo Li. "Beyond Entertainment: Unpacking Danmaku and Comments' Role of Information Sharing and Sentiment Expression in Online Crisis Videos." Proceedings of the ACM on Human-Computer Interaction 5, CSCW2 (October 13, 2021): 1–27. http://dx.doi.org/10.1145/3479555.

Full text
Abstract:
Online videos are playing an increasingly important role in timely information dissemination especially during public crises. Video commentary, synchronous or asynchronous, is indispensable in viewers' engagement and participation, and may in turn contribute to video with additional information and emotions. Yet, the roles of video commentary in crisis communications are largely unexplored, which we believe that an investigation not only provides timely feedback but also offers concrete guidelines for better information dissemination. In this work, we study two distinct commentary features of online videos: traditional asynchronous comments and emerging synchronous danmaku. We investigate how users utilize these two features to express their emotions and share information during a public health crisis. Through qualitative analysis and applying machine learning techniques on a large-scale danmaku and comment dataset of Chinese COVID-19-related videos, we uncover the distinctive roles of danmaku and comments in crisis communication, and propose comprehensive taxonomies for information themes and emotion categories of commentary. We also discover the unique patterns of crisis communications presented by danmaku, such as collective emotional resonance and style-based highlighting for emphasizing critical information. Our study captures the unique values and salient features of the emerging commentary interfaces, in particular danmaku, in the context of crisis videos, and further provides several design implications to enable more effective communications through online videos to engage and empower users during crises.
APA, Harvard, Vancouver, ISO, and other styles
30

Zhang, Chen, Bin Hu, Yucong Suo, Zhiqiang Zou, and Yimu Ji. "Large-Scale Video Retrieval via Deep Local Convolutional Features." Advances in Multimedia 2020 (June 9, 2020): 1–8. http://dx.doi.org/10.1155/2020/7862894.

Full text
Abstract:
In this paper, we study the challenge of image-to-video retrieval, which uses the query image to search relevant frames from a large collection of videos. A novel framework based on convolutional neural networks (CNNs) is proposed to perform large-scale video retrieval with low storage cost and high search efficiency. Our framework consists of the key-frame extraction algorithm and the feature aggregation strategy. Specifically, the key-frame extraction algorithm takes advantage of the clustering idea so that redundant information is removed in video data and storage cost is greatly reduced. The feature aggregation strategy adopts average pooling to encode deep local convolutional features followed by coarse-to-fine retrieval, which allows rapid retrieval in the large-scale video database. The results from extensive experiments on two publicly available datasets demonstrate that the proposed method achieves superior efficiency as well as accuracy over other state-of-the-art visual search methods.
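A small sketch of the two ingredients named above, assuming scikit-learn and precomputed per-frame descriptors and convolutional feature maps: clustering-based key-frame selection and average pooling of deep local features into a global descriptor. Cluster counts and normalization details are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_key_frames(frame_descriptors, n_key=5, seed=0):
    """Pick one representative frame per cluster (the one closest to its centroid).
    frame_descriptors: (num_frames, dim) array."""
    km = KMeans(n_clusters=n_key, random_state=seed, n_init=10).fit(frame_descriptors)
    keys = []
    for c in range(n_key):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(frame_descriptors[members] - km.cluster_centers_[c], axis=1)
        keys.append(int(members[dists.argmin()]))
    return sorted(keys)

def global_descriptor(conv_features):
    """Average-pool a (channels, h, w) deep local feature map into one L2-normalized vector."""
    v = conv_features.mean(axis=(1, 2))
    return v / (np.linalg.norm(v) + 1e-8)
```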
APA, Harvard, Vancouver, ISO, and other styles
31

Varga, Domonkos. "No-Reference Video Quality Assessment Based on Benford’s Law and Perceptual Features." Electronics 10, no. 22 (November 12, 2021): 2768. http://dx.doi.org/10.3390/electronics10222768.

Full text
Abstract:
No-reference video quality assessment (NR-VQA) has piqued the scientific community’s interest throughout the last few decades, owing to its importance in human-centered interfaces. The goal of NR-VQA is to predict the perceptual quality of digital videos without any information about their distortion-free counterparts. Over the past few decades, NR-VQA has become a very popular research topic due to the spread of multimedia content and video databases. For successful video quality evaluation, creating an effective video representation from the original video is a crucial step. In this paper, we propose a powerful feature vector for NR-VQA inspired by Benford’s law. Specifically, it is demonstrated that first-digit distributions extracted from different transform domains of the video volume data are quality-aware features and can be effectively mapped onto perceptual quality scores. Extensive experiments were carried out on two large, authentically distorted VQA benchmark databases.
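Benford's law states that the leading digit d of many naturally occurring quantities appears with probability log10(1 + 1/d). A minimal sketch of a first-digit feature for one frame's DCT coefficients follows, assuming SciPy is available; the paper extracts such distributions from several transform domains of the whole video volume and maps them to quality scores with a trained regressor, none of which is shown here.

```python
import numpy as np
from scipy.fft import dctn  # SciPy assumed available

BENFORD = np.log10(1.0 + 1.0 / np.arange(1, 10))    # reference first-digit law

def first_digit_distribution(frame):
    """First-digit histogram of the frame's non-negligible 2D DCT coefficients."""
    coeffs = np.abs(dctn(frame.astype(float), norm="ortho")).ravel()
    coeffs = coeffs[coeffs > 1e-6]
    first_digits = (coeffs / 10 ** np.floor(np.log10(coeffs))).astype(int)
    hist = np.bincount(first_digits, minlength=10)[1:10].astype(float)
    return hist / hist.sum()

def benford_divergence(frame):
    """Simple quality-aware scalar: L1 gap between the observed
    first-digit distribution and Benford's law."""
    return float(np.abs(first_digit_distribution(frame) - BENFORD).sum())
```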
APA, Harvard, Vancouver, ISO, and other styles
32

Zhao, Hong, Lan Guo, ZhiWen Chen, and HouZe Zheng. "Research on Video Captioning Based on Multifeature Fusion." Computational Intelligence and Neuroscience 2022 (April 28, 2022): 1–14. http://dx.doi.org/10.1155/2022/1204909.

Full text
Abstract:
Aiming at the problems that the existing video captioning models pay attention to incomplete information and the generation of expression text is not accurate enough, a video captioning model that integrates image, audio, and motion optical flow is proposed. A variety of large-scale dataset pretraining models are used to extract video frame features, motion information, audio features, and video sequence features. An embedded layer structure based on self-attention mechanism is designed to embed single-mode features and learn single-mode feature parameters. Then, two schemes of joint representation and cooperative representation are used to fuse the multimodal features of the feature vectors output by the embedded layer, so that the model can pay attention to different targets in the video and their interactive relationships, which effectively improves the performance of the video captioning model. The experiment is carried out on large datasets MSR-VTT and LSMDC. Under the metrics BLEU4, METEOR, ROUGEL, and CIDEr, the MSR-VTT benchmark dataset obtained scores of 0.443, 0.327, 0.619, and 0.521, respectively. The result shows that the proposed method can effectively improve the performance of the video captioning model, and the evaluation indexes are improved compared with comparison models.
APA, Harvard, Vancouver, ISO, and other styles
33

He, Fei, Naiyu Gao, Qiaozhe Li, Senyao Du, Xin Zhao, and Kaiqi Huang. "Temporal Context Enhanced Feature Aggregation for Video Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 10941–48. http://dx.doi.org/10.1609/aaai.v34i07.6727.

Full text
Abstract:
Video object detection is a challenging task because of the presence of appearance deterioration in certain video frames. One typical solution is to aggregate neighboring features to enhance per-frame appearance features. However, such a method ignores the temporal relations between the aggregated frames, which is critical for improving video recognition accuracy. To handle the appearance deterioration problem, this paper proposes a temporal context enhanced network (TCENet) to exploit temporal context information by temporal aggregation for video object detection. To handle the displacement of the objects in videos, a novel DeformAlign module is proposed to align the spatial features from frame to frame. Instead of adopting a fixed-length window fusion strategy, a temporal stride predictor is proposed to adaptively select video frames for aggregation, which facilitates exploiting variable temporal information and requiring fewer video frames for aggregation to achieve better results. Our TCENet achieves state-of-the-art performance on the ImageNet VID dataset and has a faster runtime. Without bells-and-whistles, our TCENet achieves 80.3% mAP by only aggregating 3 frames.
APA, Harvard, Vancouver, ISO, and other styles
34

Wang, Yilin, and Baokuan Chang. "Extraction of Human Motion Information from Digital Video Based on 3D Poisson Equation." Advances in Mathematical Physics 2021 (December 28, 2021): 1–11. http://dx.doi.org/10.1155/2021/1268747.

Full text
Abstract:
Based on the 3D Poisson equation, this paper extracts the features of the digital video human body action sequence. By solving the Poisson equation on the silhouette sequence, the time and space features, time and space structure features, shape features, and orientation features can be obtained. First, we use the silhouette structure features in three-dimensional space-time and the orientation features of the silhouette in three-dimensional space-time to represent the local features of the silhouette sequence and use the 3D Zernike moment feature to represent the overall features of the silhouette sequence. Secondly, we combine the Bayesian classifier and AdaBoost classifier to learn and classify the features of human action sequences, conduct experiments on the Weizmann video database, and conduct multiple experiments using the method of classifying samples and selecting partial combinations for training. Then, using the recognition algorithm of motion capture, after the above process, the three-dimensional model is obtained and matched with the models in the three-dimensional model database, the sequence with the smallest distance is calculated, and the corresponding skeleton is output as the result of motion capture. During the experiments, a human motion tracking method based on the efficient match kernel (EMK) image kernel descriptor was used; that is, a scale-invariant operator was used to compute the characteristics of multiple training images, and finally, the high-dimensional feature space was mapped into a low-dimensional one to obtain a feature space approximating the Gaussian kernel. Based on the above analysis, the main user has prior knowledge of the network environment. The experimental results show that the method in this paper can effectively extract the characteristics of human body movements and has a good classification effect for bending, one-foot jumping, vertical jumping, waving, and other movements. Due to the linear separability of the data in the kernel space, fast linear interpolation regression is performed on the features in the feature space, which significantly improves the robustness and accuracy of the estimation of the human motion pose in the image sequence.
APA, Harvard, Vancouver, ISO, and other styles
35

Lu, Yujiang, Yaju Liu, Jianwei Fei, and Zhihua Xia. "Channel-Wise Spatiotemporal Aggregation Technology for Face Video Forensics." Security and Communication Networks 2021 (August 27, 2021): 1–13. http://dx.doi.org/10.1155/2021/5524930.

Full text
Abstract:
Recent progress in deep learning, in particular the generative models, makes it easier to synthesize sophisticated forged faces in videos, leading to severe threats on social media about personal privacy and reputation. It is therefore highly necessary to develop forensics approaches to distinguish those forged videos from the authentic. Existing works are absorbed in exploring frame-level cues but insufficient in leveraging affluent temporal information. Although some approaches identify forgeries from the perspective of motion inconsistency, there is so far not a promising spatiotemporal feature fusion strategy. Towards this end, we propose the Channel-Wise Spatiotemporal Aggregation (CWSA) module to fuse deep features of continuous video frames without any recurrent units. Our approach starts by cropping the face region with some background remained, which transforms the learning objective from manipulations to the difference between pristine and manipulated pixels. A deep convolutional neural network (CNN) with skip connections that are conducive to the preservation of detection-helpful low-level features is then utilized to extract frame-level features. The CWSA module finally makes the real or fake decision by aggregating deep features of the frame sequence. Evaluation against a list of large facial video manipulation benchmarks has illustrated its effectiveness. On all three datasets, FaceForensics++, Celeb-DF, and DeepFake Detection Challenge Preview, the proposed approach outperforms the state-of-the-art methods with significant advantages.
APA, Harvard, Vancouver, ISO, and other styles
36

Wang, Xiaohan, Yu Wu, Linchao Zhu, and Yi Yang. "Symbiotic Attention with Privileged Information for Egocentric Action Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12249–56. http://dx.doi.org/10.1609/aaai.v34i07.6907.

Full text
Abstract:
Egocentric video recognition is a natural testbed for diverse interaction reasoning. Due to the large action vocabulary in egocentric video datasets, recent studies usually utilize a two-branch structure for action recognition, i.e., one branch for verb classification and the other branch for noun classification. However, correlation study between the verb and the noun branches have been largely ignored. Besides, the two branches fail to exploit local features due to the absence of position-aware attention mechanism. In this paper, we propose a novel Symbiotic Attention framework leveraging Privileged information (SAP) for egocentric video recognition. Finer position-aware object detection features can facilitate the understanding of actor's interaction with the object. We introduce these features in action recognition and regard them as privileged information. Our framework enables mutual communication among the verb branch, the noun branch, and the privileged information. This communication process not only injects local details into global features, but also exploits implicit guidance about the spatio-temporal position of an on-going action. We introduce a novel symbiotic attention (SA) to enable effective communication. It first normalizes the detection guided features on one branch to underline the action-relevant information from the other branch. SA adaptively enhances the interactions among the three sources. To further catalyze this communication, spatial relations are uncovered for the selection of most action-relevant information. It identifies the most valuable and discriminative feature for classification. We validate the effectiveness of our SAP quantitatively and qualitatively. Notably, it achieves the state-of-the-art on two large-scale egocentric video datasets.
APA, Harvard, Vancouver, ISO, and other styles
37

Li, Jiayue, and Yan Piao. "Video Person Re-Identification with Frame Sampling–Random Erasure and Mutual Information–Temporal Weight Aggregation." Sensors 22, no. 8 (April 15, 2022): 3047. http://dx.doi.org/10.3390/s22083047.

Full text
Abstract:
Partial occlusion and background clutter in camera video surveillance affect the accuracy of video-based person re-identification (re-ID). To address these problems, we propose a person re-ID method based on random erasure of sampled frames and temporal weight aggregation of the mutual information of partial and global features. First, for cases in which the target person is interfered with or partially occluded, the frame sampling–random erasure (FSE) method is used for data augmentation, which effectively alleviates the occlusion problem, improves the generalization ability of the model, and allows persons to be matched more accurately. Second, to further improve video-based re-ID accuracy and learn more discriminative feature representations, we use a ResNet-50 network to extract global and partial features and fuse them to obtain frame-level features. In the temporal dimension, a mutual information–temporal weight aggregation (MI–TWA) module combines the partial features according to different weights and the global features according to equal weights, and concatenates them to produce sequence-level features. The proposed method is extensively evaluated on three public video datasets, MARS, DukeMTMC-VideoReID, and PRID-2011, achieving mean average precision (mAP) values of 82.4%, 94.1%, and 95.3% and Rank-1 values of 86.4%, 94.8%, and 95.2%, respectively.
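A minimal sketch of the frame sampling plus random erasure augmentation step described above; the function name, the uniform sampling strategy, and the erasure fraction are illustrative assumptions rather than the paper's exact FSE parameters.

```python
import torch

def frame_sample_random_erase(video: torch.Tensor, num_frames: int = 8,
                              erase_frac: float = 0.2) -> torch.Tensor:
    # video: (T, C, H, W). Uniformly sample num_frames, then zero out a random
    # rectangle in each sampled frame to mimic partial occlusion.
    T, C, H, W = video.shape
    idx = torch.linspace(0, T - 1, num_frames).long()
    clip = video[idx].clone()
    eh, ew = int(H * erase_frac), int(W * erase_frac)
    for frame in clip:
        top = torch.randint(0, H - eh + 1, (1,)).item()
        left = torch.randint(0, W - ew + 1, (1,)).item()
        frame[:, top:top + eh, left:left + ew] = 0.0
    return clip

clip = frame_sample_random_erase(torch.rand(120, 3, 256, 128))
print(clip.shape)   # torch.Size([8, 3, 256, 128])
```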
APA, Harvard, Vancouver, ISO, and other styles
38

Ekaterina K., Reva, and Mezhina Victoria A. "Features of Providing Content on YouTube about Extreme Situations: Genres and Formats." Humanitarian Vector 15, no. 5 (May 2020): 110–15. http://dx.doi.org/10.21209/1996-7853-2020-15-5-110-115.

Full text
Abstract:
The study of YouTube is relevant because the video hosting service today concentrates video materials from a wide range of thematic areas and content-producing channels. This article presents the results of a study of the genres and formats used on the YouTube media platform to convey information about extreme situations of natural and man-made origin. YouTube has become a platform not only for the distribution of user-generated content but also a modern communication channel for broadcasting the content of traditional media, which makes it possible to identify genre- and format-based ways of presenting information. To compile a list of extreme and emergency situations for 2015–2019, the authors studied the state reports on the protection of the population and territory of the Russian Federation from natural and man-made emergencies posted on the website of the Russian Emergencies Ministry, and then searched for the corresponding keywords on YouTube. Because the study is part of a major scientific project on the influence of psychotraumatizing factors on human speech in real extreme situations, one of the selection criteria was the presence of an emotional component in the video materials. The five-year chronological framework, the wide range of YouTube channels, the corpus of video materials collected by continuous sampling, and the use of typological and genre analysis made it possible to draw representative conclusions about the genre and format specifics of video materials on socially significant topics. The results show that the largest number of videos belong to the information group of genres, while the analytical group includes the genre of journalistic investigation. Amateur YouTube channels use selfie shooting, footage with voiceover commentary, conversations with eyewitnesses or victims, etc. Keywords: media, television, YouTube, video hosting, video content, extreme situations
APA, Harvard, Vancouver, ISO, and other styles
39

Liu, Ting, Chengqing Zhang, and Liming Wang. "Integrated Multiscale Appearance Features and Motion Information Prediction Network for Anomaly Detection." Computational Intelligence and Neuroscience 2021 (October 20, 2021): 1–13. http://dx.doi.org/10.1155/2021/6789956.

Full text
Abstract:
The rise of video-prediction algorithms has greatly promoted the development of anomaly detection in video surveillance for smart cities and public security. However, most current methods rely on single-scale information to extract appearance (spatial) features and lack motion (temporal) continuity between video frames. This can cause a loss of the spatiotemporal information that has great potential for predicting future frames, affecting the accuracy of anomaly detection. We therefore propose a novel prediction network to improve the performance of anomaly detection. Because objects appear at various scales in each video, we use different receptive fields to extract detailed appearance features with a hybrid dilated convolution (HDC) module. Meanwhile, a deeper bidirectional convolutional long short-term memory (DB-ConvLSTM) module remembers the motion information between consecutive frames. Furthermore, we use an RGB difference loss to replace the optical flow loss as the temporal constraint, which greatly reduces the time needed for optical flow extraction. Compared with state-of-the-art methods on the anomaly-detection task, experiments show that our method detects abnormalities in various video surveillance scenes more accurately.
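A minimal sketch of an RGB-difference temporal constraint of the kind described: instead of an optical-flow loss, penalize the gap between frame-to-frame differences of predicted and ground-truth frames. The function name and the choice of an L1 penalty are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def rgb_difference_loss(pred_frames: torch.Tensor, gt_frames: torch.Tensor) -> torch.Tensor:
    # pred_frames, gt_frames: (B, T, C, H, W)
    pred_diff = pred_frames[:, 1:] - pred_frames[:, :-1]  # temporal gradient of the prediction
    gt_diff = gt_frames[:, 1:] - gt_frames[:, :-1]        # temporal gradient of the ground truth
    return F.l1_loss(pred_diff, gt_diff)

loss = rgb_difference_loss(torch.rand(2, 5, 3, 64, 64), torch.rand(2, 5, 3, 64, 64))
print(loss.item())
```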
APA, Harvard, Vancouver, ISO, and other styles
40

Whiteway, Matthew R., Dan Biderman, Yoni Friedman, Mario Dipoppa, E. Kelly Buchanan, Anqi Wu, John Zhou, et al. "Partitioning variability in animal behavioral videos using semi-supervised variational autoencoders." PLOS Computational Biology 17, no. 9 (September 22, 2021): e1009439. http://dx.doi.org/10.1371/journal.pcbi.1009439.

Full text
Abstract:
Recent neuroscience studies demonstrate that a deeper understanding of brain function requires a deeper understanding of behavior. Detailed behavioral measurements are now often collected using video cameras, resulting in an increased need for computer vision algorithms that extract useful information from video data. Here we introduce a new video analysis tool that combines the output of supervised pose estimation algorithms (e.g. DeepLabCut) with unsupervised dimensionality reduction methods to produce interpretable, low-dimensional representations of behavioral videos that extract more information than pose estimates alone. We demonstrate this tool by extracting interpretable behavioral features from videos of three different head-fixed mouse preparations, as well as a freely moving mouse in an open field arena, and show how these interpretable features can facilitate downstream behavioral and neural analyses. We also show how the behavioral features produced by our model improve the precision and interpretation of these downstream analyses compared to using the outputs of either fully supervised or fully unsupervised methods alone.
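A heavily simplified PyTorch sketch of the partitioning idea: a VAE whose latent space is split into supervised dimensions (tied to pose-estimate labels) and unsupervised dimensions that absorb the remaining variability. The class PartitionedVAE, the linear encoder/decoder, the latent sizes, and the unit loss weights are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartitionedVAE(nn.Module):
    def __init__(self, n_pixels: int = 1024, n_sup: int = 4, n_unsup: int = 6):
        super().__init__()
        self.n_sup = n_sup
        self.encoder = nn.Linear(n_pixels, 2 * (n_sup + n_unsup))  # means and log-variances
        self.decoder = nn.Linear(n_sup + n_unsup, n_pixels)

    def forward(self, frames: torch.Tensor, pose_labels: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.encoder(frames).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterisation trick
        recon = self.decoder(z)
        recon_loss = F.mse_loss(recon, frames)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        # The first n_sup latents are tied to the pose estimates; the rest stay unsupervised.
        label_loss = F.mse_loss(mu[:, :self.n_sup], pose_labels)
        return recon_loss + kl + label_loss

model = PartitionedVAE()
loss = model(torch.rand(8, 1024), torch.rand(8, 4))
print(loss.item())
```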
APA, Harvard, Vancouver, ISO, and other styles
41

Abellan-Abenza, Javier, Alberto Garcia-Garcia, Sergiu Oprea, David Ivorra-Piqueres, and Jose Garcia-Rodriguez. "Classifying Behaviours in Videos with Recurrent Neural Networks." International Journal of Computer Vision and Image Processing 7, no. 4 (October 2017): 1–15. http://dx.doi.org/10.4018/ijcvip.2017100101.

Full text
Abstract:
Human activity recognition in videos is a very attractive topic among researchers due to its vast range of possible applications. This article considers the analysis of behaviours and activities in videos obtained with low-cost RGB cameras. To this end, a system is developed that takes a video as input and produces as output the possible activities occurring in the video. This information could be used in many applications, such as video surveillance, assistance for disabled persons, home assistants, employee monitoring, etc. The developed system makes use of successful deep learning techniques: convolutional neural networks are used to detect features in the video images, while recurrent neural networks are used to analyse these features and predict the possible activity in the video.
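A minimal sketch of the CNN-plus-RNN pipeline described above: a small 2D CNN produces one feature vector per frame, and an LSTM over those vectors predicts the activity. The tiny backbone, hidden sizes, and class count are illustrative choices, not the article's architecture.

```python
import torch
import torch.nn as nn

class CnnLstmClassifier(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(                       # tiny per-frame feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, T, C, H, W); run the CNN on every frame, then the LSTM over time.
        B, T, C, H, W = video.shape
        feats = self.cnn(video.view(B * T, C, H, W)).view(B, T, -1)
        _, (h_n, _) = self.lstm(feats)                  # last hidden state summarises the clip
        return self.head(h_n[-1])

logits = CnnLstmClassifier(num_classes=10)(torch.rand(2, 16, 3, 64, 64))
print(logits.shape)   # torch.Size([2, 10])
```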
APA, Harvard, Vancouver, ISO, and other styles
42

Wang, Zhiyuan, Chongyuan Bi, Songhui You, and Junjie Yao. "Hidden Markov Model-Based Video Recognition for Sports." Advances in Mathematical Physics 2021 (December 20, 2021): 1–12. http://dx.doi.org/10.1155/2021/5183088.

Full text
Abstract:
In this paper, we conduct an in-depth study and analysis of sports video recognition using an improved hidden Markov model. The feature module is a complex-gesture recognition module based on hidden Markov model gesture features: it applies these features to gesture recognition and recognizes complex gestures formed by combining simple gestures, building on simple gesture recognition. The combination of the two modules forms the overall approach of this paper, which can be applied to many scenarios, including high-security settings that require real-time feedback and public indoor settings, enabling different prevention measures and services for different age groups. Increasing the depth of the feature extraction network improves the experimental results; however, a two-dimensional convolutional neural network loses temporal information when extracting features, so a three-dimensional convolutional network is used to extract features from the video in both time and space. Multiple binary classifications of the extracted features are then performed to achieve multilabel classification. A multistream residual neural network extracts features from video data of three modalities, and the extracted feature vectors are fed into an attention network that selects the information most critical for video recognition from the large amount of spatiotemporal information and learns the temporal dependencies between consecutive video frames; the multistream network outputs are finally fused to obtain the predicted category. By training and optimizing the model end to end, recognition accuracies of 92.7% and 64.4% are achieved on the two datasets, respectively.
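A minimal sketch of the multilabel step described above: 3D convolutions keep temporal information, and each label gets an independent sigmoid (binary) output. The class name and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class C3dMultiLabel(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),  # spatiotemporal features
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, num_labels)   # one logit per label

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, C, T, H, W); the sigmoid turns each logit into an independent yes/no decision.
        return torch.sigmoid(self.head(self.features(clip)))

probs = C3dMultiLabel(num_labels=5)(torch.rand(2, 3, 8, 32, 32))
print(probs.shape)   # torch.Size([2, 5])
```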
APA, Harvard, Vancouver, ISO, and other styles
43

Phan, Dinh-Duy, Quang-Huy Nguyen, Thanh-Thien Nguyen, Hoang-Loc Tran, and Duc-Lung Vu. "Joint inter-intra representation learning for pornographic video classification." Indonesian Journal of Electrical Engineering and Computer Science 25, no. 3 (March 1, 2022): 1481. http://dx.doi.org/10.11591/ijeecs.v25.i3.pp1481-1488.

Full text
Abstract:
This paper addresses video inter-intra similarity retrieval for pornographic classification. The main approach is to obtain an internal representation of a single unlabeled video and its external similarity to batches of labeled videos, and then combine the two to determine its label. For the internal representation, we extract inner features within frames and cluster them to find the representative centroid as the intra-feature. For the external similarity, we use a video similarity learning network named ViSiL to calculate a distance score between two videos using chamfer similarity. From the distance scores between the input video and batches of pornographic/non-pornographic videos, the inter-feature of the input video is obtained. Finally, the inter-similarity vector and the intra-representation are concatenated and fed to a final classifier to identify whether the video is for adults or not. In experiments, our method achieves 96.88% accuracy on NPDI-2k, a result comparable to other state-of-the-art methods on the pornographic classification problem.
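A minimal sketch of chamfer similarity between two videos represented as sets of per-frame feature vectors, the kind of score used for the inter-video comparison; this is the generic formulation, not ViSiL's refined version, and the function name is an assumption.

```python
import torch
import torch.nn.functional as F

def chamfer_similarity(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    # feats_a: (Ta, D), feats_b: (Tb, D) per-frame feature vectors.
    a = F.normalize(feats_a, dim=-1)
    b = F.normalize(feats_b, dim=-1)
    sim = a @ b.t()                       # (Ta, Tb) pairwise cosine similarities
    # For each frame of A keep its best match in B, then average over A's frames.
    return sim.max(dim=1).values.mean()

score = chamfer_similarity(torch.rand(30, 512), torch.rand(45, 512))
print(score.item())
```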
APA, Harvard, Vancouver, ISO, and other styles
44

Li, Yachao, Mengfei Guan, Paige Hammond, and Lane E. Berrey. "Communicating COVID-19 information on TikTok: a content analysis of TikTok videos from official accounts featured in the COVID-19 information hub." Health Education Research 36, no. 3 (March 1, 2021): 261–71. http://dx.doi.org/10.1093/her/cyab010.

Full text
Abstract:
Amid the COVID-19 pandemic, TikTok, an emerging social media platform, has created an information hub to provide users with engaging and authoritative COVID-19 information. This study investigates the video format, type and content of the COVID-19 TikTok videos, and how those video attributes are related to quantitative indicators of user engagement, including numbers of views, likes, comments and shares. A content analysis examined 331 videos from official accounts featured in the COVID-19 information hub. As of 5 May 2020, the videos received 907 930 000 views, 29 640 000 likes, 168 880 comments and 781 862 shares. About one in three videos had subtitles, which were positively related to the number of shares. Almost every video included a hashtag, and a higher number of hashtags was related to more likes. Video types included acting, animated infographic, documentary, news, oral speech, pictorial slideshow and TikTok dance. Dance videos had the most shares. Video themes included anti-stigma/anti-rumor, disease knowledge, encouragement, personal precautions, recognition, societal crisis management and work report. Videos conveying alarm/concern emotions, COVID-19 susceptibility and severity, and precaution response efficacy had higher user engagement. Public health agencies should be aware of the opportunity of TikTok in health communication and create audience-centered risk communication to engage and inform community members.
APA, Harvard, Vancouver, ISO, and other styles
45

Lei, Zhou, and Yiyong Huang. "Video Captioning Based on Channel Soft Attention and Semantic Reconstructor." Future Internet 13, no. 2 (February 23, 2021): 55. http://dx.doi.org/10.3390/fi13020055.

Full text
Abstract:
Video captioning is a popular task that automatically generates a natural-language sentence to describe video content. Previous video captioning works mainly use the encoder–decoder framework and exploit techniques such as attention mechanisms to improve the quality of the generated sentences. In addition, most attention mechanisms focus on global features and spatial features; however, global features are usually fully connected features. Recurrent convolution networks (RCNs) receive 3-dimensional features as input at each time step, but the temporal structure of each channel across time steps, which provides temporal relation information for each channel, has been ignored. In this paper, a video captioning model based on channel soft attention and a semantic reconstructor is proposed, which considers the global information of each channel. In a video feature map sequence, the same channel at every time step is generated by the same convolutional kernel. We selectively collect the features generated by each convolutional kernel and then input the weighted sum of each channel to the RCN at each time step to encode the video representation. Furthermore, a semantic reconstructor is proposed to rebuild semantic vectors to ensure the integrity of semantic information during training, taking advantage of both forward (semantic-to-sentence) and backward (sentence-to-semantic) flows. Experimental results on the popular MSVD and MSR-VTT datasets demonstrate the effectiveness and feasibility of our model.
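A minimal sketch of channel-wise soft attention over a video feature map: each channel of a (C, H, W) map gets a soft weight before the map is passed on to the recurrent encoder. The class name, the pooling-plus-MLP scoring function, and the sizes are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ChannelSoftAttention(nn.Module):
    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(channels, hidden), nn.Tanh(),
                                   nn.Linear(hidden, channels))

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, C, H, W); summarise each channel, score it, re-weight it.
        channel_desc = feat_map.mean(dim=(2, 3))              # (B, C) global pooling per channel
        weights = torch.softmax(self.score(channel_desc), 1)  # (B, C) soft attention weights
        return feat_map * weights[:, :, None, None]           # re-weighted map for the recurrent encoder

attended = ChannelSoftAttention(256)(torch.rand(2, 256, 7, 7))
print(attended.shape)   # torch.Size([2, 256, 7, 7])
```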
APA, Harvard, Vancouver, ISO, and other styles
46

Han, Zhisong, Yaling Liang, Zengqun Chen, and Zhiheng Zhou. "A two-stream network with joint spatial-temporal distance for video-based person re-identification." Journal of Intelligent & Fuzzy Systems 39, no. 3 (October 7, 2020): 3769–81. http://dx.doi.org/10.3233/jifs-192067.

Full text
Abstract:
Video-based person re-identification aims to match videos of pedestrians captured by non-overlapping cameras. Video provides spatial information and temporal information. However, most existing methods do not combine these two types of information well and ignore that they are of different importance in most cases. To address the above issues, we propose a two-stream network with a joint distance metric for measuring the similarity of two videos. The proposed two-stream network has several appealing properties. First, the spatial stream focuses on multiple parts of a person and outputs robust local spatial features. Second, a lightweight and effective temporal information extraction block is introduced in video-based person re-identification. In the inference stage, the distance of two videos is measured by the weighted sum of spatial distance and temporal distance. We conduct extensive experiments on four public datasets, i.e., MARS, PRID2011, iLIDS-VID and DukeMTMC-VideoReID to show that our proposed approach outperforms existing methods in video-based person re-ID.
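A minimal sketch of the joint distance used at inference: the distance between two videos is a weighted sum of a spatial-feature distance and a temporal-feature distance. The function name, the weight alpha, and the choice of Euclidean distance are illustrative assumptions.

```python
import torch

def joint_distance(spatial_a: torch.Tensor, temporal_a: torch.Tensor,
                   spatial_b: torch.Tensor, temporal_b: torch.Tensor,
                   alpha: float = 0.7) -> torch.Tensor:
    # spatial_*: spatial descriptors; temporal_*: temporal descriptors of the two videos.
    d_spatial = torch.dist(spatial_a, spatial_b)      # Euclidean distance of spatial features
    d_temporal = torch.dist(temporal_a, temporal_b)   # Euclidean distance of temporal features
    return alpha * d_spatial + (1.0 - alpha) * d_temporal

d = joint_distance(torch.rand(512), torch.rand(128), torch.rand(512), torch.rand(128))
print(d.item())
```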
APA, Harvard, Vancouver, ISO, and other styles
47

Dang, Zijun, Shunshun Liu, Tong Li, and Liang Gao. "Analysis of Stadium Operation Risk Warning Model Based on Deep Confidence Neural Network Algorithm." Computational Intelligence and Neuroscience 2021 (July 5, 2021): 1–10. http://dx.doi.org/10.1155/2021/3715116.

Full text
Abstract:
In this paper, a deep confidence neural network algorithm is used to design and analyze a risk warning model for stadium operation. Many factors, such as the video shooting angle, background brightness, diversity of features, and the relationships between human behaviors, make behavior detection based on feature attributes a focus of researchers' attention. To address these factors, researchers have proposed extracting human skeleton and optical flow feature information from videos. The key to the recognition method based on the deep confidence neural network is the extraction of the human skeleton: the skeleton sequence of human behavior is extracted from a surveillance video, where each frame of the skeleton contains 18 joints and a confidence value estimated for that frame. A deep confidence neural network model then classifies dangerous behavior based on the obtained skeleton feature information combined with the time vector of the skeleton sequence, and determines the danger level of the behavior by setting corresponding threshold values. Compared with the spatiotemporal graph convolutional network, the deep confidence neural network uses different feature information: it builds its model on human optical flow information combined with temporal relational inference over video frames. The key to the recognition method based on the temporal relationship network is to extract frames from the video, either in order or at random, and feed them into the temporal relationship network. In this paper, several methods are compared experimentally, and the results show that the recognition method based on skeleton and optical flow features is significantly better than the algorithm based on manual feature extraction.
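A minimal sketch of the final decision step described above: classify a skeleton sequence, then map the predicted probability of a risky behavior class to a danger level via thresholds. The function name, threshold values, and class layout are illustrative assumptions, not the paper's settings.

```python
import torch

def danger_level(class_probs: torch.Tensor, risky_class: int,
                 warn_thresh: float = 0.5, alarm_thresh: float = 0.8) -> str:
    # class_probs: (num_classes,) softmax output for one skeleton sequence.
    p = class_probs[risky_class].item()
    if p >= alarm_thresh:
        return "alarm"
    if p >= warn_thresh:
        return "warning"
    return "normal"

probs = torch.softmax(torch.tensor([0.2, 2.1, 0.4]), dim=0)  # stand-in classifier output
print(danger_level(probs, risky_class=1))                    # "warning"
```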
APA, Harvard, Vancouver, ISO, and other styles
48

Munawaroh, Siti. "Teaching the Narrative Texts Using Animation Video: Raising Students’ Skills on Reading Comprehension." Utamax : Journal of Ultimate Research and Trends in Education 1, no. 1 (July 13, 2019): 18–22. http://dx.doi.org/10.31849/utamax.v1i1.2791.

Full text
Abstract:
This study aims to determine whether animated videos improve reading skills in understanding language features, finding factual and detailed information, and drawing moral values from narrative texts in the classroom. The author uses classroom action research; the tools used to collect data are observation sheets, field notes, and tests. From cycle one to cycle three, the author uses animated videos as a medium to improve reading comprehension in understanding language features, finding factual and detailed information, and drawing moral values from narrative texts. The author uses animated videos together with printed images from the animated video stories and lists of unknown words to improve students' ability to understand the language features of narrative texts; with prediction activities, confirmations, class discussions, and tests to improve students' ability to find factual and detailed information; and with reviews of each character to improve students' ability to draw moral values from narrative texts.
APA, Harvard, Vancouver, ISO, and other styles
49

Lyudmyla, Volontyr. "INFORMATION REPRODUCTION SYSTEMS IN INFORMATION BUSINESS LOGISTICS." ENGINEERING, ENERGY, TRANSPORT AIC, no. 4(115) (December 24, 2021): 56–65. http://dx.doi.org/10.37128/2520-6168-2021-4-6.

Full text
Abstract:
The article considers the fundamentals of building information reproduction systems on an optoelectronic element base for information logistics systems. The use of optoelectronic elements for information processing is considered, namely discrete optoelectronic digital systems, analog systems, optical memory systems, optical input-output systems for computers, and systems based on fiber devices of the neuristor type. It is emphasized that modern logistics is impossible without the active use of information technology. The functions of informational support for managerial decisions can be performed by the information technologies used in logistics today; to handle financial flow management tasks, these technologies can be supplemented by optical information-processing modules. Logic-clock quantron automatic devices based on optocouplers are suitable for creating parallel information operating environments, which are a universal means of converting and presenting information. This approach leads to matrix-type devices that are able not only to receive information but also to process it. One of the promising areas of use of optoelectronic matrix systems is the creation of flat operating screens for the parallel reception and display of information. The paper presents a classification of operating screens according to features such as the principle of displaying information, the type of input information, the type of output information, the method of image formation, and the number of consumers of the information. An analysis of the electric circuit diagrams of modern LED matrix video screens, in particular of typesetting-modular designs, is presented. The forms of organization of matrix video screens are compared, and it is shown that the most economical in terms of the number of memory trigger elements per LED of the display cell is a video information system based on the structure of the third group of video screens. The structure of the video information system is optimized according to the optimality criterion of maximum image quality on the matrix screen and minimum screen complexity, which is determined by the circuit features of the microelectronic circuits.
APA, Harvard, Vancouver, ISO, and other styles
50

Liu, Jie, and Haiping Lv. "Recommendation of Micro Teaching Video Resources Based on Topic Mining and Sentiment Analysis." International Journal of Emerging Technologies in Learning (iJET) 17, no. 06 (March 29, 2022): 243–56. http://dx.doi.org/10.3991/ijet.v17i06.30011.

Full text
Abstract:
Video learning resources are preferred by many students owing to their intuitiveness and attractiveness, so it is of practical significance to study recommendation methods for video learning resources. Most existing research methods treat the scoring matrix as the main element and fail to consider video content and learner interests; as a result, few of them can recommend videos precisely. To solve this problem, this paper explores the recommendation of micro teaching video resources based on topic mining and sentiment analysis. First, the dialogue text features of English dialogue videos and learner interest features were mined based on deep word vectors, and a topic mining model was established to achieve similarity-based resource recommendation. Next, the micro teaching videos with text information were subjected to sentiment analysis, improving the accuracy with which micro teaching videos are pushed to learners. Finally, the validity of the algorithm was demonstrated through experiments.
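A minimal sketch of the similarity-based recommendation step: score each candidate video by cosine similarity between its topic vector and the learner-interest vector, with a small bonus from the sentiment analysis. The function name, the combination weight beta, and the additive bonus are illustrative assumptions, not the paper's formula.

```python
import numpy as np

def recommend(learner_vec, topic_vecs, sentiment_scores, top_k=3, beta=0.2):
    # learner_vec: (D,); topic_vecs: (N, D); sentiment_scores: (N,) in [0, 1].
    t = topic_vecs / np.linalg.norm(topic_vecs, axis=1, keepdims=True)
    u = learner_vec / np.linalg.norm(learner_vec)
    scores = t @ u + beta * sentiment_scores        # content match plus sentiment bonus
    return np.argsort(scores)[::-1][:top_k]         # indices of the top-k videos

idx = recommend(np.random.rand(50), np.random.rand(100, 50), np.random.rand(100))
print(idx)
```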
APA, Harvard, Vancouver, ISO, and other styles
