Journal articles on the topic 'MULTI VIEW VIDEOS'

Consult the top 50 journal articles for your research on the topic 'MULTI VIEW VIDEOS.'

You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Luo, Lei, Rong Xin Jiang, Xiang Tian, and Yao Wu Chen. "Reference Viewpoints Selection for Multi-View Video Plus Depth Coding Based on the Network Bandwidth Constraint." Applied Mechanics and Materials 303-306 (February 2013): 2134–38. http://dx.doi.org/10.4028/www.scientific.net/amm.303-306.2134.

Full text
Abstract:
In multi-view video plus depth (MVD) coding based free viewpoint video applications, the texture and depth videos of a few reference viewpoints are compressed and transmitted at the server side. At the terminal side, the displayed views can be either the decoded reference views or virtual viewpoints synthesized by DIBR technology. The overall video quality of all display views is determined by the number of reference viewpoints and by the compression distortion of each reference viewpoint's texture and depth videos. This paper studies the impact of reference viewpoint selection on the overall video quality of all display views. The results show that, depending on the available network bandwidth, MVD coding requires different selections of reference viewpoints to maximize the overall video quality of all display views.
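As a loose illustration of the selection problem described above (not the authors' algorithm), the sketch below enumerates candidate reference-viewpoint subsets, splits a bandwidth budget evenly across the selected views, and keeps the subset with the highest estimated quality averaged over all display views. The rate-quality curve and the synthesis-distance penalty are invented placeholders, not models from the paper.

```python
from itertools import combinations
import math

def view_quality(bitrate_kbps):
    """Toy rate-quality curve (illustrative only): quality saturates with bitrate."""
    return 50.0 - 30.0 * math.exp(-bitrate_kbps / 1000.0)

def synthesis_quality(ref_quality, distance):
    """Toy model: synthesized views lose quality with distance to the nearest reference."""
    return ref_quality - 1.5 * distance

def select_reference_views(candidates, display_views, total_bandwidth_kbps):
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            q_ref = view_quality(total_bandwidth_kbps / k)   # equal rate allocation
            score = 0.0
            for v in display_views:
                d = min(abs(v - r) for r in subset)
                score += q_ref if d == 0 else synthesis_quality(q_ref, d)
            score /= len(display_views)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

candidates = [0, 2, 4, 6, 8]        # camera positions eligible as reference viewpoints
display_views = list(range(9))      # viewpoints the terminal may display
for bw in (1000, 4000, 16000):      # available network bandwidth in kbps
    subset, score = select_reference_views(candidates, display_views, bw)
    print(f"{bw:>6} kbps -> references {subset}, mean quality {score:.1f}")
```

With these toy numbers, low bandwidth favours fewer, better-coded references, while high bandwidth favours more references and fewer synthesized views, mirroring the trade-off the abstract describes.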
APA, Harvard, Vancouver, ISO, and other styles
2

Chen, Jiawei, Zhenshi Zhang, and Xupeng Wen. "Target Identification via Multi-View Multi-Task Joint Sparse Representation." Applied Sciences 12, no. 21 (October 28, 2022): 10955. http://dx.doi.org/10.3390/app122110955.

Full text
Abstract:
Recently, the monitoring efficiency and accuracy of visible and infrared video surveillance have been relatively low. In this paper, we propose an automatic target identification method using surveillance video, which provides an effective solution for surveillance video data. Specifically, a target identification method via multi-view and multi-task sparse learning is proposed, where the multiple views include various types of visual features such as textures, edges, and invariant features. Each view of a candidate is regarded as a template, and the potential relationship between different tasks and different views is considered. These multiple views are integrated into the multi-task sparse learning framework. The proposed MVMT method can be applied to ship identification. Extensive experiments are conducted on public datasets and custom sequence frames (i.e., six sequence frames from ship videos). The experimental results show that the proposed method is superior to other classical methods, both qualitatively and quantitatively.
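As a very rough illustration of sparse-representation-based identification (a simplification of the multi-view multi-task formulation above, using a single joint dictionary instead of the paper's per-task and per-view coupling), the sketch below stacks per-view features of class templates into a dictionary, codes a candidate sparsely, and picks the class with the smallest reconstruction residual. All names, dimensions, and data are made up.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def sparse_identify(templates, labels, candidate, n_nonzero=3):
    """templates: dict view -> (n_templates, d_view) array of per-view features.
       labels: class label of each template.
       candidate: dict view -> (d_view,) feature vector of the target to identify."""
    views = sorted(templates)
    D = np.hstack([templates[v] for v in views]).T          # (sum of d_view, n_templates)
    y = np.concatenate([candidate[v] for v in views])       # stacked multi-view features
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False).fit(D, y)
    x = omp.coef_
    # classify by the per-class reconstruction residual, as in classic SRC
    residuals = {}
    for c in set(labels):
        mask = np.array([lab == c for lab in labels], dtype=float)
        residuals[c] = np.linalg.norm(y - D @ (x * mask))
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(0)
templates = {"texture": rng.normal(size=(6, 32)), "edge": rng.normal(size=(6, 16))}
labels = ["ship", "ship", "ship", "clutter", "clutter", "clutter"]
candidate = {"texture": templates["texture"][1] + 0.05 * rng.normal(size=32),
             "edge": templates["edge"][1] + 0.05 * rng.normal(size=16)}
print(sparse_identify(templates, labels, candidate))  # expected: ship
```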
APA, Harvard, Vancouver, ISO, and other styles
3

Zhong, Chengzhang, Amy R. Reibman, Hansel A. Mina, and Amanda J. Deering. "Multi-View Hand-Hygiene Recognition for Food Safety." Journal of Imaging 6, no. 11 (November 7, 2020): 120. http://dx.doi.org/10.3390/jimaging6110120.

Full text
Abstract:
A majority of foodborne illnesses result from inappropriate food handling practices. One proven practice to reduce pathogens is to perform effective hand-hygiene before all stages of food handling. In this paper, we design a multi-camera system that uses video analytics to recognize hand-hygiene actions, with the goal of improving hand-hygiene effectiveness. Our proposed two-stage system processes untrimmed video from both egocentric and third-person cameras. In the first stage, a low-cost coarse classifier efficiently localizes the hand-hygiene period; in the second stage, more complex refinement classifiers recognize seven specific actions within the hand-hygiene period. We demonstrate that our two-stage system has significantly lower computational requirements without a loss of recognition accuracy. Specifically, the computationally complex refinement classifiers process less than 68% of the untrimmed videos, and we anticipate further computational gains in videos that contain a larger fraction of non-hygiene actions. Our results demonstrate that a carefully designed video action recognition system can play an important role in improving hand hygiene for food safety.
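The two-stage structure described above (a cheap coarse pass that localizes the hand-hygiene period, then heavier refinement classifiers applied only inside that span) can be sketched as a generic pipeline; the per-frame scoring and action classifiers below are placeholders for whatever models are actually trained, and the threshold is an assumption.

```python
def localize_period(frames, coarse_score, threshold=0.5):
    """Stage 1: cheap per-frame scores; return the longest span scoring above threshold."""
    flags = [coarse_score(f) >= threshold for f in frames]
    best_span, start = None, None
    for i, on in enumerate(flags + [False]):          # sentinel closes a trailing span
        if on and start is None:
            start = i
        elif not on and start is not None:
            if best_span is None or i - start > best_span[1] - best_span[0]:
                best_span = (start, i)
            start = None
    return best_span

def recognize_hand_hygiene(frames, coarse_score, refine_classify):
    """Stage 2: run the expensive classifier only on the localized period."""
    span = localize_period(frames, coarse_score)
    if span is None:
        return []
    return [refine_classify(f) for f in frames[span[0]:span[1]]]

# toy run: frames are numbers, and "hygiene" frames are the ones scoring above 0.5
frames = [0.1, 0.2, 0.8, 0.9, 0.7, 0.2, 0.1]
print(recognize_hand_hygiene(frames, coarse_score=lambda f: f,
                             refine_classify=lambda f: "rub" if f > 0.8 else "rinse"))
```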
APA, Harvard, Vancouver, ISO, and other styles
4

Kumar, Yaman, Rohit Jain, Khwaja Mohd Salik, Rajiv Ratn Shah, Yifang Yin, and Roger Zimmermann. "Lipper: Synthesizing Thy Speech Using Multi-View Lipreading." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 2588–95. http://dx.doi.org/10.1609/aaai.v33i01.33012588.

Full text
Abstract:
Lipreading has a lot of potential applications, such as in the domains of surveillance and video conferencing. Despite this, most of the work in building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems associated with making lipreading a text-based classification task, such as its dependence on a particular language and vocabulary mapping. Thus, in this paper we propose a multi-view lipreading to audio system, namely Lipper, which models it as a regression task. The model takes silent videos as input and produces speech as the output. With multi-view silent videos, we observe an improvement over single-view speech reconstruction results. We show this by presenting an exhaustive set of experiments for speaker-dependent, out-of-vocabulary and speaker-independent settings. Further, we compare the delay values of Lipper with other speechreading systems in order to show the real-time nature of the audio produced. We also perform a user study on the audio produced by Lipper in order to understand its level of comprehensibility.
APA, Harvard, Vancouver, ISO, and other styles
5

Ata, Sezin Kircali, Yuan Fang, Min Wu, Jiaqi Shi, Chee Keong Kwoh, and Xiaoli Li. "Multi-View Collaborative Network Embedding." ACM Transactions on Knowledge Discovery from Data 15, no. 3 (April 12, 2021): 1–18. http://dx.doi.org/10.1145/3441450.

Full text
Abstract:
Real-world networks often exist with multiple views, where each view describes one type of interaction among a common set of nodes. For example, on a video-sharing network, two user nodes can be linked in one view if they have common favorite videos, and linked in another view if they share common subscribers. Unlike traditional single-view networks, multiple views maintain different semantics to complement each other. In this article, we propose Multi-view collAborative Network Embedding (MANE), a multi-view network embedding approach to learn low-dimensional representations. Similar to existing studies, MANE hinges on diversity and collaboration: while diversity enables views to maintain their individual semantics, collaboration enables views to work together. However, we also discover a novel form of second-order collaboration that has not been explored previously, and further unify it into our framework to attain superior node representations. Furthermore, as each view often has varying importance w.r.t. different nodes, we propose an attention-based extension of MANE to model node-wise view importance. Finally, we conduct comprehensive experiments on three public, real-world multi-view networks, and the results demonstrate that our models consistently outperform state-of-the-art approaches.
APA, Harvard, Vancouver, ISO, and other styles
6

Pan, Yingwei, Yue Chen, Qian Bao, Ning Zhang, Ting Yao, Jingen Liu, and Tao Mei. "Smart Director: An Event-Driven Directing System for Live Broadcasting." ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 4 (November 30, 2021): 1–18. http://dx.doi.org/10.1145/3448981.

Full text
Abstract:
Live video broadcasting normally requires a multitude of skills and expertise with domain knowledge to enable multi-camera productions. As the number of cameras keeps increasing, directing a live sports broadcast has now become more complicated and challenging than ever before. The broadcast directors need to be much more concentrated, responsive, and knowledgeable during the production. To relieve the directors from their intensive efforts, we develop an innovative automated sports broadcast directing system, called Smart Director, which aims at mimicking the typical human-in-the-loop broadcasting process to automatically create near-professional broadcasting programs in real-time by using a set of advanced multi-view video analysis algorithms. Inspired by the so-called "three-event" construction of sports broadcasts [14], we build our system with an event-driven pipeline consisting of three consecutive novel components: (1) the Multi-View Event Localization to detect events by modeling multi-view correlations, (2) the Multi-View Highlight Detection to rank camera views by visual importance for view selection, and (3) the Auto-Broadcasting Scheduler to control the production of broadcasting videos. To the best of our knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting, completely driven by the semantic understanding of sports events. It is also the first system to solve the novel problem of multi-view joint event detection by cross-view relation modeling. We conduct both objective and subjective evaluations on a real-world multi-camera soccer dataset, which demonstrate that the quality of our auto-generated videos is comparable to that of human-directed videos. Thanks to its faster response, our system is able to capture more fast-passing and short-duration events which are usually missed by human directors.
APA, Harvard, Vancouver, ISO, and other styles
7

Salik, Khwaja Mohd, Swati Aggarwal, Yaman Kumar, Rajiv Ratn Shah, Rohit Jain, and Roger Zimmermann. "Lipper: Speaker Independent Speech Synthesis Using Multi-View Lipreading." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 10023–24. http://dx.doi.org/10.1609/aaai.v33i01.330110023.

Full text
Abstract:
Lipreading is the process of understanding and interpreting speech by observing a speaker's lip movements. In the past, most of the work in lipreading has been limited to classifying silent videos into a fixed number of text classes. However, this limits the applications of lipreading, since human language cannot be bound to a fixed set of words or languages. The aim of this work is to reconstruct intelligible acoustic speech signals from silent videos, captured from various poses, of a person whom Lipper has never seen before. Lipper, therefore, is a vocabulary- and language-agnostic, speaker-independent, near real-time model that deals with a variety of poses of a speaker. The model leverages silent video feeds from multiple cameras recording a subject to generate intelligible speech of a speaker. It uses a deep learning based STCNN+BiGRU architecture to achieve this goal. We evaluate speech reconstruction for speaker-independent scenarios and demonstrate the speech output by overlaying the audio reconstructed by Lipper on the corresponding videos.
APA, Harvard, Vancouver, ISO, and other styles
8

Obayashi, Mizuki, Shohei Mori, Hideo Saito, Hiroki Kajita, and Yoshifumi Takatsume. "Multi-View Surgical Camera Calibration with None-Feature-Rich Video Frames: Toward 3D Surgery Playback." Applied Sciences 13, no. 4 (February 14, 2023): 2447. http://dx.doi.org/10.3390/app13042447.

Full text
Abstract:
Mounting multi-view cameras within a surgical light is a practical choice since some cameras are expected to observe surgery with few occlusions. Such multi-view videos must be reassembled for easy reference. A typical way is to reconstruct the surgery in 3D. However, the geometrical relationship among cameras is changed because each camera independently moves every time the lighting is reconfigured (i.e., every time surgeons touch the surgical light). Moreover, feature matching between surgical images is potentially challenging because of missing rich features. To address the challenge, we propose a feature-matching strategy that enables robust calibration of the multi-view camera system by collecting a set of a small number of matches over time while the cameras stay stationary. Our approach would enable conversion from multi-view videos to a 3D video. However, surgical videos are long and, thus, the cost of the conversion rapidly grows. Therefore, we implement a video player where only selected frames are converted to minimize time and data until playbacks. We demonstrate that sufficient calibration quality with real surgical videos can lead to a promising 3D mesh and a recently emerged 3D multi-layer representation. We reviewed comments from surgeons to discuss the differences between those 3D representations on an autostereoscopic display with respect to medical usage.
APA, Harvard, Vancouver, ISO, and other styles
9

Du, Ming, Aswin C. Sankaranarayanan, and Rama Chellappa. "Robust Face Recognition From Multi-View Videos." IEEE Transactions on Image Processing 23, no. 3 (March 2014): 1105–17. http://dx.doi.org/10.1109/tip.2014.2300812.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Mallik, Bruhanth, Akbar Sheikh-Akbari, Pooneh Bagheri Zadeh, and Salah Al-Majeed. "HEVC Based Frame Interleaved Coding Technique for Stereo and Multi-View Videos." Information 13, no. 12 (November 25, 2022): 554. http://dx.doi.org/10.3390/info13120554.

Full text
Abstract:
The standard HEVC codec and its extension for coding multiview videos, known as MV-HEVC, have proven to deliver improved visual quality compared to its predecessor, H.264/MPEG-4 AVC's multiview extension H.264-MVC, for the same frame resolution, with up to 50% bitrate savings. MV-HEVC's framework is similar to that of H.264-MVC, which uses a multi-layer coding approach. Hence, MV-HEVC requires all frames from other reference layers to be decoded prior to decoding a new layer. Thus, the multi-layer coding architecture becomes a bottleneck when it comes to quicker frame streaming across different views. In this paper, an HEVC-based Frame Interleaved Stereo/Multiview Video Codec (HEVC-FISMVC) that uses a single-layer encoding approach to encode stereo and multiview video sequences is presented. The frames of stereo or multiview video sequences are interleaved in such a way that encoding the resulting monoscopic video stream maximizes the exploitation of temporal, inter-view, and cross-view correlations, thus improving the overall coding efficiency. The coding performance of the proposed HEVC-FISMVC codec is assessed and compared with that of the standard MV-HEVC for three standard multi-view video sequences, namely "Poznan_Street", "Kendo" and "Newspaper1". Experimental results show that the proposed codec provides more substantial coding gains than the anchor MV-HEVC for coding both stereo and multi-view video sequences.
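The frame-interleaving idea is essentially a reordering of the input views into one monoscopic sequence before a standard single-layer encoder sees it. The sketch below shows one plausible interleaving pattern (all views at time t, then all views at t+1); the paper's exact scan order may differ, so treat this as an assumption-laden illustration.

```python
def interleave_views(view_sequences):
    """view_sequences: one list of frames per view, all of equal length.
       Returns a single monoscopic frame list: view 0..N-1 at t=0, then at t=1, ..."""
    n_frames = min(len(seq) for seq in view_sequences)
    stream = []
    for t in range(n_frames):
        for view in view_sequences:
            stream.append(view[t])
    return stream

def deinterleave(stream, n_views):
    """Inverse reordering on the decoder side."""
    return [stream[v::n_views] for v in range(n_views)]

left, right = ["L0", "L1", "L2"], ["R0", "R1", "R2"]
mono = interleave_views([left, right])
print(mono)                  # ['L0', 'R0', 'L1', 'R1', 'L2', 'R2']
print(deinterleave(mono, 2)) # [['L0', 'L1', 'L2'], ['R0', 'R1', 'R2']]
```

Adjacent frames in the interleaved stream then alternate between temporal and inter-view neighbours, which is what lets an ordinary motion-compensated encoder exploit inter-view redundancy.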
APA, Harvard, Vancouver, ISO, and other styles
11

Lu, Guoyu, Yan Yan, Nicu Sebe, and Chandra Kambhamettu. "Indoor localization via multi-view images and videos." Computer Vision and Image Understanding 161 (August 2017): 145–60. http://dx.doi.org/10.1016/j.cviu.2017.05.003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Das, Mithun, Rohit Raj, Punyajoy Saha, Binny Mathew, Manish Gupta, and Animesh Mukherjee. "HateMM: A Multi-Modal Dataset for Hate Video Classification." Proceedings of the International AAAI Conference on Web and Social Media 17 (June 2, 2023): 1014–23. http://dx.doi.org/10.1609/icwsm.v17i1.22209.

Full text
Abstract:
Hate speech has become one of the most significant issues in modern society, having implications in both the online and the offline world. Due to this, hate speech research has recently gained a lot of traction. However, most of the work has primarily focused on text media, with relatively little work on images and even less on videos. Thus, early-stage automated video moderation techniques are needed to handle the videos that are being uploaded and to keep the platform safe and healthy. With a view to detecting and removing hateful content from video sharing platforms, our work focuses on hate video detection using multi-modalities. To this end, we curate ~43 hours of videos from BitChute and manually annotate them as hate or non-hate, along with the frame spans which could explain the labelling decision. To collect the relevant videos we harnessed search keywords from hate lexicons. We observe various cues in images and audio of hateful videos. Further, we build deep learning multi-modal models to classify the hate videos and observe that using all the modalities of the videos improves the overall hate speech detection performance (accuracy=0.798, macro F1-score=0.790) by ~5.7% in terms of macro F1 score compared to the best uni-modal model. In summary, our work takes the first step toward understanding and modeling hateful videos on video hosting platforms such as BitChute.
APA, Harvard, Vancouver, ISO, and other styles
13

Kurutepe, Engin, M. Reha Civanlar, and A. Murat Tekalp. "Interactive transport of multi-view videos for 3DTV applications." Journal of Zhejiang University-SCIENCE A 7, no. 5 (May 2006): 830–36. http://dx.doi.org/10.1631/jzus.2006.a0830.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Fu, Xubo, Wenbin Huang, Yaoran Sun, Xinhua Zhu, Julian Evans, Xian Song, Tongyu Geng, and Sailing He. "A Novel Dataset for Multi-View Multi-Player Tracking in Soccer Scenarios." Applied Sciences 13, no. 9 (April 25, 2023): 5361. http://dx.doi.org/10.3390/app13095361.

Full text
Abstract:
Localization and tracking in multi-player sports present significant challenges, particularly in wide and crowded scenes where severe occlusions can occur. Traditional solutions relying on a single camera are limited in their ability to accurately identify players and may result in ambiguous detection. To overcome these challenges, we proposed fusing information from multiple cameras positioned around the field to improve positioning accuracy and eliminate occlusion effects. Specifically, we focused on soccer, a popular and representative multi-player sport, and developed a multi-view recording system based on a 1+N strategy. This system enabled us to construct a new benchmark dataset and continuously collect data from several sports fields. The dataset includes 17 sets of densely annotated multi-view videos, each lasting 2 min, as well as over 1100 minutes of multi-view video. It encompasses a wide range of game types and nearly all scenarios that could arise during real game tracking. Finally, we conducted a thorough assessment of four multi-view multi-object tracking (MVMOT) methods and gained valuable insights into the tracking process in actual games.
APA, Harvard, Vancouver, ISO, and other styles
15

Yao, Li, Yingdong Han, and Xiaomin Li. "Fast and high-quality virtual view synthesis from multi-view plus depth videos." Multimedia Tools and Applications 78, no. 14 (February 9, 2019): 19325–40. http://dx.doi.org/10.1007/s11042-019-7236-x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Chang, Shih Ming, Joseph C. Tsai, Shwu Huey Yen, and Timothy K. Shih. "Constructing interactive multi-view videos based on image-based rendering." International Journal of Computational Science and Engineering 10, no. 4 (2015): 402. http://dx.doi.org/10.1504/ijcse.2015.070996.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Khan, Shahid, Nazeer Muhammad, Shabieh Farwa, Tanzila Saba, and Zahid Mahmood. "Early CU Depth Decision and Reference Picture Selection for Low Complexity MV-HEVC." Symmetry 11, no. 4 (April 1, 2019): 454. http://dx.doi.org/10.3390/sym11040454.

Full text
Abstract:
The Multi-View extension of High Efficiency Video Coding (MV-HEVC) has improved the coding efficiency of multi-view videos, but this comes at the cost of the extra coding complexity of the MV-HEVC encoder. This coding complexity can be reduced by efficiently reducing time-consuming encoding operations. In this work, we propose two methods to reduce the encoder complexity. The first one is Early Coding unit Splitting (ECS), and the second is the Efficient Reference Picture Selection (ERPS) method. In the ECS method, the decision on Coding Unit (CU) splitting for dependent views is based on the CU splitting information obtained from the base view, while the ERPS method for dependent views selects reference pictures based on the temporal location of the picture being encoded. Simulation results reveal that our proposed methods reduce the encoding time by approximately 58% when compared with HTM (16.2), the reference encoder for MV-HEVC.
APA, Harvard, Vancouver, ISO, and other styles
18

Paramanantham, Vinsent, and SureshKumar S. "Multi View Video Summarization Using RNN and SURF Based High Level Moving Object Feature Frames." International Journal of Engineering Research in Computer Science and Engineering 9, no. 5 (May 14, 2022): 1–14. http://dx.doi.org/10.36647/ijercse/09.05.art001.

Full text
Abstract:
Multi-view video summarization is a process that eases storage consumption, facilitates organized storage, and supports other mainline video analytics tasks. This in turn helps to search, browse, and retrieve video data quickly and without losing crucial data. In static video summarization, there is less challenge in time and sequence issues when rearranging the video synopsis. Low-level features are easy to compute and retrieve, but high-level features such as event detection, emotion detection, object recognition, face detection, and gesture detection require comprehension of the video content. This research proposes an approach to overcome the difficulties in handling high-level features. The distinguishable contents in the videos are identified by object detection and a feature-based area strategy. A major aspect of the proposed solution is retrieving the attributes of a motion source from a video frame. Wavelet decomposition is achieved by dividing the details of the object available in the video frame. The motion frequency scoring method records the time of motions in the video. Using the frequency motion feature of video is a challenge given the continuous change of object shapes; therefore, the object position and corner points are spotted using Speeded Up Robust Features (SURF) feature points. Support vector machine clustering extracts keyframes. A memory-based recurrent neural network (RNN) recognizes the object in the video frame and remembers long sequences. An RNN is an artificial neural network whose nodes form a temporal relationship. The attention layer in the proposed RNN network extracts the details about the objects in motion. The motion objects identified using the three video clippings are finally summarized using a video summarization algorithm. The simulation was performed in MATLAB R2014b.
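A generic keyframe-selection sketch in the spirit of the feature-then-cluster step described above, but deliberately not the proposed method: it uses ORB keypoints (SURF is patented and often unavailable in stock OpenCV builds) and k-means in place of the paper's SVM clustering and RNN stages. File names and parameters are assumptions.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def select_keyframes(video_path, n_keyframes=5, sample_every=10):
    cap = cv2.VideoCapture(video_path)
    orb = cv2.ORB_create(nfeatures=200)
    descriptors, frame_ids, idx = [], [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            _, des = orb.detectAndCompute(gray, None)
            if des is not None:
                descriptors.append(des.mean(axis=0))   # one crude global vector per frame
                frame_ids.append(idx)
        idx += 1
    cap.release()
    X = np.asarray(descriptors)
    km = KMeans(n_clusters=min(n_keyframes, len(X)), n_init=10).fit(X)
    # keep the sampled frame closest to each cluster centre as a keyframe
    chosen = {frame_ids[int(np.argmin(np.linalg.norm(X - c, axis=1)))]
              for c in km.cluster_centers_}
    return sorted(chosen)

# keyframes = select_keyframes("camera1.mp4")  # hypothetical input file from one view
```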
APA, Harvard, Vancouver, ISO, and other styles
19

Liang, Qiaokang, Wanneng Wu, Yukun Yang, Ruiheng Zhang, Yu Peng, and Min Xu. "Multi-Player Tracking for Multi-View Sports Videos with Improved K-Shortest Path Algorithm." Applied Sciences 10, no. 3 (January 27, 2020): 864. http://dx.doi.org/10.3390/app10030864.

Full text
Abstract:
Sports analysis has recently attracted increasing research efforts in computer vision. Among them, basketball video analysis is very challenging due to severe occlusions and fast motions. As a typical tracking-by-detection method, the k-shortest paths (KSP) tracking framework has been widely used for multiple-person tracking. While effective and fast, it neglects the appearance model, which can easily lead to identity switches, especially when two or more players are intertwined with each other. This paper addresses this problem by taking appearance features into account within the KSP framework. Furthermore, we also introduce a similarity measurement method that can fuse multiple appearance features together. In this paper, we select jersey color and jersey number as two example features. Experiments indicate that about 70% of jersey color and 50% of jersey number over a whole sequence would ensure that our proposed method preserves player identity better than the existing KSP tracking method.
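How two appearance cues can be folded into one similarity score for re-association can be sketched as below; the histogram-intersection colour measure, the handling of missing jersey numbers, and the weights are illustrative assumptions rather than the paper's measurement.

```python
import numpy as np

def color_similarity(hist_a, hist_b):
    """Histogram intersection between normalized jersey-colour histograms (0..1)."""
    return float(np.minimum(hist_a, hist_b).sum())

def number_similarity(num_a, num_b):
    """1 if recognized jersey numbers agree, 0 if they differ, neutral 0.5 if unknown."""
    if num_a is None or num_b is None:
        return 0.5
    return 1.0 if num_a == num_b else 0.0

def appearance_similarity(det_a, det_b, w_color=0.6, w_number=0.4):
    return (w_color * color_similarity(det_a["hist"], det_b["hist"])
            + w_number * number_similarity(det_a["number"], det_b["number"]))

a = {"hist": np.array([0.7, 0.2, 0.1]), "number": 23}
b = {"hist": np.array([0.6, 0.3, 0.1]), "number": 23}
print(round(appearance_similarity(a, b), 2))   # 0.94: same colours and same number
```

A score like this can then be folded into the edge costs of the KSP graph so that paths which keep colour and number consistent are preferred.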
APA, Harvard, Vancouver, ISO, and other styles
20

Wang, Yuping. "Capture Surface Light Field for Gesture with Sparse Multi-view Videos." Journal of Information and Computational Science 11, no. 10 (July 1, 2014): 3271–80. http://dx.doi.org/10.12733/jics20104039.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Deshmukh, Amar B., and N. Usha Rani. "Optimization-Driven Kernel and Deep Convolutional Neural Network for Multi-View Face Video Super Resolution." International Journal of Digital Crime and Forensics 12, no. 3 (July 2020): 77–95. http://dx.doi.org/10.4018/ijdcf.2020070106.

Full text
Abstract:
One of the major challenges faced by video surveillance is person identification and recognition from low-resolution videos. Image enhancement methods play a significant role in enhancing the resolution of video. This article introduces a technique for face super resolution based on a deep convolutional neural network (Deep CNN). At first, the video frames are extracted from the input video and face detection is performed using the Viola-Jones algorithm. The detected face image and the scaling factors are fed into the Fractional-Grey Wolf Optimizer (FGWO)-based kernel weighted regression model and the proposed Deep CNN separately. Finally, the results obtained from both techniques are integrated using a fuzzy logic system, offering a face image with enhanced resolution. Experimentation is carried out using the UCSD face video dataset, and the effectiveness of the proposed Deep CNN is examined for different block sizes and upscaling factor values; it is evaluated to be the best when compared to other existing techniques, with an improved SDME value of 80.888.
APA, Harvard, Vancouver, ISO, and other styles
22

WANG, Xueting, Kensho HARA, Yu ENOKIBORI, Takatsugu HIRAYAMA, and Kenji MASE. "Personal Viewpoint Navigation Based on Object Trajectory Distribution for Multi-View Videos." IEICE Transactions on Information and Systems E101.D, no. 1 (2018): 193–204. http://dx.doi.org/10.1587/transinf.2017edp7122.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Gärtner, Erik, Aleksis Pirinen, and Cristian Sminchisescu. "Deep Reinforcement Learning for Active Human Pose Estimation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 10835–44. http://dx.doi.org/10.1609/aaai.v34i07.6714.

Full text
Abstract:
Most 3D human pose estimation methods assume that input – be it images of a scene collected from one or several viewpoints, or from a video – is given. Consequently, they focus on estimates leveraging prior knowledge and measurement by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with freedom to move and explore the scene spatially – in ‘time-freeze’ mode – and/or temporally, by selecting informative viewpoints that improve its estimation accuracy. Towards this end, we introduce Pose-DRL, a fully trainable deep reinforcement learning-based active pose estimation architecture which learns to select appropriate views, in space and time, to feed an underlying monocular pose estimator. We evaluate our model using single- and multi-target estimators with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos. In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines.
APA, Harvard, Vancouver, ISO, and other styles
24

Mei, Ling, Yizhuo He, Farnoosh Javadi Fishani, Yaowen Yu, Lijun Zhang, and Helge Rhodin. "Learning Domain-Adaptive Landmark Detection-Based Self-Supervised Video Synchronization for Remote Sensing Panorama." Remote Sensing 15, no. 4 (February 9, 2023): 953. http://dx.doi.org/10.3390/rs15040953.

Full text
Abstract:
The synchronization of videos is an essential pre-processing step for multi-view reconstruction such as the image mosaic by UAV remote sensing; it is often solved with hardware solutions in motion capture studios. However, traditional synchronization setups rely on manual interventions or software solutions and only fit for a particular domain of motions. In this paper, we propose a self-supervised video synchronization algorithm that attains high accuracy in diverse scenarios without cumbersome manual intervention. At the core is a motion-based video synchronization algorithm that infers temporal offsets from the trajectories of moving objects in the videos. It is complemented by a self-supervised scene decomposition algorithm that detects common parts and their motion tracks in two or more videos, without requiring any manual positional supervision. We evaluate our approach on three different datasets, including the motion of humans, animals, and simulated objects, and use it to build the view panorama of the remote sensing field. All experiments demonstrate that the proposed location-based synchronization is more effective compared to the state-of-the-art methods, and our self-supervised inference approaches the accuracy of supervised solutions, while being much easier to adapt to a new target domain.
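A minimal sketch of the trajectory-based offset estimation idea: reduce each video's tracked object trajectory to a 1-D speed signal and pick the frame lag that maximizes their correlation. It assumes equal frame rates and a single shared object, which is a simplification of the method described above.

```python
import numpy as np

def speed_signal(traj):
    """traj: (T, 2) array of per-frame (x, y) positions of one tracked object."""
    return np.linalg.norm(np.diff(traj, axis=0), axis=1)

def estimate_offset(traj_a, traj_b, max_lag=100):
    a, b = speed_signal(traj_a), speed_signal(traj_b)
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]
        else:
            x, y = a[:len(a) + lag], b[-lag:]
        n = min(len(x), len(y))
        if n < 10:
            continue
        corr = float(np.dot(x[:n], y[:n]) / n)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# toy check: the same motion, with the first video starting 7 frames later
t = np.linspace(0, 20, 400)
xy = np.stack([np.sin(t), np.cos(0.5 * t)], axis=1)
print(estimate_offset(xy[7:], xy[:-7]))   # -7: frame k of video A matches frame k+7 of video B
```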
APA, Harvard, Vancouver, ISO, and other styles
25

Lavrushkin, Sergey, Ivan Molodetskikh, Konstantin Kozhemyakov, and Dmitriy Vatolin. "Stereoscopic quality assessment of 1,000 VR180 videos using 8 metrics." Electronic Imaging 2021, no. 2 (January 18, 2021): 350–1. http://dx.doi.org/10.2352/issn.2470-1173.2021.2.sda-350.

Full text
Abstract:
In this work we present a large-scale analysis of stereoscopic quality for 1,000 VR180 YouTube videos. VR180 is a new S3D format for VR devices which stores the view for only a single hemisphere. Instead of a multi-camera rig, this format requires just two cameras with fisheye lenses, similar to conventional 3D shooting, resulting in cost reduction of the final device and simplification of the shooting process. But as in the conventional stereoscopic format, VR180 videos suffer from stereoscopy-related problems specific to 3D shooting. In this paper we analyze videos to detect the most common stereoscopic artifacts using objective quality metrics, including color, sharpness and geometry mismatch between views and more. Our study depicts the current state of S3D technical quality of VR180 videos and reveals its overall poor condition, as most of the analyzed videos exhibit at least one of the stereoscopic artifacts, which shows a necessity for stereoscopic quality control in modern VR180 shooting.
APA, Harvard, Vancouver, ISO, and other styles
26

Lee, Kyu-Yul, and Jae-Young Sim. "Stitching for Multi-View Videos With Large Parallax Based on Adaptive Pixel Warping." IEEE Access 6 (2018): 26904–17. http://dx.doi.org/10.1109/access.2018.2835659.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Zhang, Rong, Wei Li, Peng Wang, Chenye Guan, Jin Fang, Yuhang Song, Jinhui Yu, Baoquan Chen, Weiwei Xu, and Ruigang Yang. "AutoRemover: Automatic Object Removal for Autonomous Driving Videos." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12853–61. http://dx.doi.org/10.1609/aaai.v34i07.6982.

Full text
Abstract:
Motivated by the need for photo-realistic simulation in autonomous driving, in this paper we present a video inpainting algorithm AutoRemover, designed specifically for generating street-view videos without any moving objects. In our setup we have two challenges: the first is the shadow, shadows are usually unlabeled but tightly coupled with the moving objects. The second is the large ego-motion in the videos. To deal with shadows, we build up an autonomous driving shadow dataset and design a deep neural network to detect shadows automatically. To deal with large ego-motion, we take advantage of the multi-source data, in particular the 3D data, in autonomous driving. More specifically, the geometric relationship between frames is incorporated into an inpainting deep neural network to produce high-quality structurally consistent video output. Experiments show that our method outperforms other state-of-the-art (SOTA) object removal algorithms, reducing the RMSE by over 19%.
APA, Harvard, Vancouver, ISO, and other styles
28

Song, Ziwei, Xiaowen Cai, Xiaoying Zhang, Jiaxiang Zong, and Gangyong Jia. "Time-based Calibration: A Way to Ensure that Stitched Images are Captured Simultaneously." Journal of Internet Technology 23, no. 6 (November 2022): 1441–48. http://dx.doi.org/10.53106/160792642022112306025.

Full text
Abstract:
With the rapid development of modern science and technology, people's demand for information such as images and videos is also growing, and the requirements for image and video quality, clarity and view range are also increasing. Therefore, the construction of high-resolution, wide-view panoramic video has gradually become a hot research topic. Video stitching is an increasingly popular research direction in the field of graphics, and it addresses the problem of a limited range of views due to capture by a single device. Many researchers have proposed various algorithms for video stitching and achieved good stitching results, but research on time-synchronization calibration of different video sources is not yet well developed. This thesis proposes a multi-source video frame calibration technique based on external information sources to solve the problems of ghosting and cutting when stitching different video sources. The proposed method calibrates the video stitching by introducing an information source and calculating the time difference between different devices. The error of the calibrated video stitching is less than 33 ms, which can guarantee the quality of the stitched video.
APA, Harvard, Vancouver, ISO, and other styles
29

Chen, Xinqiang, Shengzheng Wang, Chaojian Shi, Huafeng Wu, Jiansen Zhao, and Junjie Fu. "Robust Ship Tracking via Multi-view Learning and Sparse Representation." Journal of Navigation 72, no. 1 (September 13, 2018): 176–92. http://dx.doi.org/10.1017/s0373463318000504.

Full text
Abstract:
Conventional visual ship tracking methods employ single and shallow features for the ship tracking task, which may fail when a ship presents a different appearance and shape in maritime surveillance videos. To overcome this difficulty, we propose to employ a multi-view learning algorithm to extract a highly coupled and robust ship descriptor from multiple distinct ship feature sets. First, we explore multiple distinct ship feature sets consisting of a Laplacian-of-Gaussian (LoG) descriptor, a Local Binary Patterns (LBP) descriptor, a Gabor filter, a Histogram of Oriented Gradients (HOG) descriptor and a Canny descriptor, which present geometry structure, texture and contour information, and more. Then, we propose a framework for integrating a multi-view learning algorithm and a sparse representation method to track ships efficiently and effectively. Finally, our framework is evaluated in four typical maritime surveillance scenarios. The experimental results show that the proposed framework outperforms the conventional and typical ship tracking methods.
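All five feature families named above are available in standard Python imaging libraries; a per-frame extraction sketch with scikit-image and SciPy follows (the specific parameters are illustrative, not the paper's settings).

```python
import numpy as np
from scipy.ndimage import gaussian_laplace
from skimage.feature import canny, hog, local_binary_pattern
from skimage.filters import gabor

def ship_features(gray):
    """gray: 2-D float image in [0, 1]. Returns one map/vector per feature family."""
    log_map = gaussian_laplace(gray, sigma=2.0)                 # Laplacian of Gaussian
    lbp_map = local_binary_pattern(gray, P=8, R=1.0, method="uniform")
    gabor_real, _ = gabor(gray, frequency=0.2)                  # a single Gabor band
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    edge_map = canny(gray, sigma=1.5)
    return {"log": log_map, "lbp": lbp_map, "gabor": gabor_real,
            "hog": hog_vec, "canny": edge_map}

frame = np.random.default_rng(0).random((64, 64))
print({name: np.asarray(f).shape for name, f in ship_features(frame).items()})
```

A multi-view learner, as described in the abstract, would then look for a coupled representation across these feature sets rather than simply concatenating them.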
APA, Harvard, Vancouver, ISO, and other styles
30

Ben-Dov, Omri, and Tsevi Beatus. "Model-Based Tracking of Fruit Flies in Free Flight." Insects 13, no. 11 (November 3, 2022): 1018. http://dx.doi.org/10.3390/insects13111018.

Full text
Abstract:
Insect flight is a complex interdisciplinary phenomenon. Understanding its multiple aspects, such as flight control, sensory integration, physiology and genetics, often requires the analysis of large amounts of free flight kinematic data. Yet, one of the main bottlenecks in this field is automatically and accurately extracting such data from multi-view videos. Here, we present a model-based method for the pose estimation of free-flying fruit flies from multi-view high-speed videos. To obtain a faithful representation of the fly with minimum free parameters, our method uses a 3D model that includes two new aspects of wing deformation: a non-fixed wing hinge and a twisting wing surface. The method is demonstrated for free and perturbed flight. Our method does not use prior assumptions on the kinematics apart from the continuity of the wing pitch angle. Hence, this method can be readily adjusted for other insect species.
APA, Harvard, Vancouver, ISO, and other styles
31

Ho, Ting-Yu, De-Nian Yang, and Wanjiun Liao. "Efficient Resource Allocation of Mobile Multi-View 3D Videos with Depth-Image-Based Rendering." IEEE Transactions on Mobile Computing 14, no. 2 (February 1, 2015): 344–57. http://dx.doi.org/10.1109/tmc.2014.2321401.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

S. Kumar, R. Mathusoothana. "Robust multi-view videos face recognition based on particle filter with immune genetic algorithm." IET Image Processing 13, no. 4 (March 28, 2019): 600–606. http://dx.doi.org/10.1049/iet-ipr.2018.5268.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Luo, Jiajia, Wei Wang, and Hairong Qi. "Feature Extraction and Representation for Distributed Multi-View Human Action Recognition." IEEE Journal on Emerging and Selected Topics in Circuits and Systems 3, no. 2 (June 2013): 145–54. http://dx.doi.org/10.1109/jetcas.2013.2256824.

Full text
Abstract:
Multi-view human action recognition has gained a lot of attention in recent years for its superior performance as compared to single view recognition. In this paper, we propose a new framework for the real-time realization of human action recognition in distributed camera networks (DCNs). We first present a new feature descriptor (Mltp-hist) that is tolerant to illumination change, robust in homogeneous regions and computationally efficient. Taking advantage of the proposed Mltp-hist, the noninformative 3-D patches generated from the background can be further removed automatically, which effectively highlights the foreground patches. Next, a new feature representation method based on sparse coding is presented to generate the histogram representation of local videos to be transmitted to the base station for classification. Due to the sparse representation of extracted features, the approximation error is reduced. Finally, at the base station, a probability model is produced to fuse the information from various views and a class label is assigned accordingly. Compared to the existing algorithms, the proposed framework has three advantages while placing fewer demands on memory and bandwidth consumption: 1) no preprocessing is required; 2) communication among cameras is unnecessary; and 3) positions and orientations of cameras do not need to be fixed. We further evaluate the proposed framework on the most popular multi-view action dataset, IXMAS. Experimental results indicate that our proposed framework repeatedly achieves state-of-the-art results when various numbers of views are tested. In addition, our approach is tolerant to various combinations of views and benefits from introducing more views at the testing stage. Our results are still satisfactory even when large misalignment exists between the training and testing samples.
APA, Harvard, Vancouver, ISO, and other styles
34

Kottler, Amanda, and Carol Long. "Discourses, Videos and Talk about Sexual Violence: A Multi-Disciplinary Enterprise." South African Journal of Psychology 27, no. 2 (June 1997): 64–74. http://dx.doi.org/10.1177/008124639702700202.

Full text
Abstract:
Over the past few years a number of initiatives against sexual harassment and violence have been launched by large corporations and South African universities, with mixed results. As an active member of one of the first projects of this kind, it became evident to Kottler how severely hampered policy making, education and prevention are by definitional problems and varied gendered and cultural constructions of the issues involved. With a view to addressing some of these an educational campaign at the University of Cape Town was proposed in 1993, part of which involved an attempt at an innovative multi-methodological approach free from the trappings of one particular discipline. Drawing on research using post-modern ideas and social constructionism looking at talk about sexual harassment, a post-graduate drama producer (Peter Hayes) was drawn in. He conducted workshop discussions on sexual harassment with men and women from a wide range of contexts (all audio or video taped), with a view to producing a dramatic piece of forum theatre. The outcome, entitled ONE MAN'S MEAT… IS A WOMAN'S POISON was performed by two women and two men, two white and two coloured at a number of university venues, on occasion to extremely large audiences (1000 at the University of the Western Cape). Requests for additional performances came from a number of unexpected places, for example, Rape Crisis, Cape Town used it as part of their counselling training course in 1994. A 28-minute video of a performance, incorporating audience participation in rescripting and replaying the scenarios was produced by an educational film-maker (Lindy Wilson). Following conference presentations of the video, a number of copies have been sold to other universities and NGOs both locally and overseas. In this article we describe the process leading up to the production of the play and offer an analysis and discussion of the play as it was finally constructed, linking this to some of the text produced at the rehearsals.
APA, Harvard, Vancouver, ISO, and other styles
35

Chen, Xinqiang, Huixing Chen, Huafeng Wu, Yanguo Huang, Yongsheng Yang, Wenhui Zhang, and Pengwen Xiong. "Robust Visual Ship Tracking with an Ensemble Framework via Multi-View Learning and Wavelet Filter." Sensors 20, no. 3 (February 10, 2020): 932. http://dx.doi.org/10.3390/s20030932.

Full text
Abstract:
Maritime surveillance videos provide crucial on-spot kinematic traffic information (traffic volume, ship speeds, headings, etc.) for varied traffic participants (maritime regulation departments, ship crew, ship owners, etc.) which greatly benefits automated maritime situational awareness and maritime safety improvement. Conventional models heavily rely on visual ship features for the purpose of tracking ships from maritime image sequences which may contain arbitrary tracking oscillations. To address this issue, we propose an ensemble ship tracking framework with a multi-view learning algorithm and wavelet filter model. First, the proposed model samples ship candidates with a particle filter following the sequential importance sampling rule. Second, we propose a multi-view learning algorithm to obtain raw ship tracking results in two steps: extracting a group of distinct ship contour relevant features (i.e., Laplacian of Gaussian, local binary pattern, Gabor filter, histogram of oriented gradient, and canny descriptors) and learning high-level intrinsic ship features by jointly exploiting underlying relationships shared by each type of ship contour features. Third, with the help of the wavelet filter, we performed a data quality control procedure to identify abnormal oscillations in the ship positions which were further corrected to generate the final ship tracking results. We demonstrate the proposed ship tracker’s performance on typical maritime traffic scenarios through four maritime surveillance videos.
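The wavelet-based quality-control step, smoothing abnormal oscillations in the raw per-frame ship positions, can be sketched with PyWavelets as classic soft-threshold denoising applied to each coordinate; the wavelet family, decomposition level, and threshold rule are assumptions, not the paper's configuration.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=3):
    """Soft-threshold the detail coefficients of a 1-D position signal."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745            # noise level from finest scale
    thresh = sigma * np.sqrt(2.0 * np.log(len(signal)))       # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

# toy ship x-coordinate: smooth drift plus oscillating measurement noise
rng = np.random.default_rng(1)
t = np.arange(256)
x_true = 0.5 * t
x_raw = x_true + rng.normal(0.0, 2.0, t.size)
x_smooth = wavelet_denoise(x_raw)
print(np.abs(x_smooth - x_true).mean() < np.abs(x_raw - x_true).mean())   # True
```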
APA, Harvard, Vancouver, ISO, and other styles
36

Wang, Wei, Yujia Xie, and Luliang Tang. "Hierarchical Clustering Algorithm for Multi-Camera Vehicle Trajectories Based on Spatio-Temporal Grouping under Intelligent Transportation and Smart City." Sensors 23, no. 15 (August 3, 2023): 6909. http://dx.doi.org/10.3390/s23156909.

Full text
Abstract:
With the emergence of intelligent transportation and smart city systems, the issue of how to perform efficient and reasonable clustering analysis of the massive vehicle trajectories in multi-camera monitoring videos through computer vision has become a significant area of research. Traditional trajectory clustering algorithms do not consider camera position and field of view and neglect the hierarchical relation of the video object motion between the camera and the scenario, leading to poor multi-camera video object trajectory clustering. To address this challenge, this paper proposes a hierarchical clustering algorithm for multi-camera vehicle trajectories based on spatio-temporal grouping. First, we perform supervised clustering of the vehicle trajectories within each camera group according to the optimal point correspondence rule for unequal-length trajectories. Then, we extract the starting and ending points of the video object under each group, organize the trajectories into hierarchies according to the number of cross-camera groups, and perform supervised clustering of the subsegment sets at different hierarchy levels. This method takes into account the spatial relationship between the camera and the video scenario, which is not considered by traditional algorithms. The effectiveness of this approach has been demonstrated through experiments comparing silhouette coefficients and CPU time.
APA, Harvard, Vancouver, ISO, and other styles
37

Holte, Michael B., Cuong Tran, Mohan M. Trivedi, and Thomas B. Moeslund. "Human Pose Estimation and Activity Recognition From Multi-View Videos: Comparative Explorations of Recent Developments." IEEE Journal of Selected Topics in Signal Processing 6, no. 5 (September 2012): 538–52. http://dx.doi.org/10.1109/jstsp.2012.2196975.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Bernad-Mechó, Edgar, and Carolina Girón-García. "multimodal analysis of humour as an engagement strategy in YouTube research dissemination videos." European Journal of Humour Research 11, no. 1 (March 28, 2023): 46–66. http://dx.doi.org/10.7592/ejhr.2023.11.1.760.

Full text
Abstract:
Science popularisation has received widespread interest in the last decade. With the rapid evolution from print to digital modes of information, science outreach has been seen to cross educational boundaries and become integrated into wider contexts such as YouTube. One of the main features of the success of research dissemination videos on YouTube is the ability to establish a meaningful connection with the audience. In this regard, humour may be used as a strategy for engagement. Most studies on humour, however, are conducted solely from a purely linguistic perspective, obviating the complex multimodal reality of communication in the digital era. Considering this background, we set out to explore how humour is used from a multimodal point of view as an engagement strategy in YouTube research dissemination. We selected three research dissemination videos from three distinct YouTube channels to fulfil this aim. After an initial viewing, 22 short humoristic fragments that were particularly engaging were selected. These fragments were further explored using Multimodal Analysis - Video (MAV)[1], a multi-layered annotation tool that allows for fine-grained multimodal analysis. Humoristic strategies and contextual features were explored, as well as two main types of modes: embodied and filmic. Results show the presence of 9 linguistic strategies to introduce humour in YouTube science dissemination videos which are always accompanied by heterogeneous combinations of embodied and filmic modes that contribute to fully achieving humoristic purposes. [1] Multi-layer annotation software used to describe the use of semiotic modes in video files. By using this software, researchers may analyse, for instance, how gestures, gaze, proxemics, head movements, facial expression, etc. are employed in a given file.
APA, Harvard, Vancouver, ISO, and other styles
39

Abarna, K. T. Meena, and T. Suresh. "Enrich multi-channel P2P VoD streaming based on dynamic replication strategy." International Journal of Advances in Applied Sciences 9, no. 2 (June 1, 2020): 110. http://dx.doi.org/10.11591/ijaas.v9.i2.pp110-116.

Full text
Abstract:
Peer-to-Peer Video-on-Demand (VoD) is a favorable solution which offers thousands of videos to millions of users with a complete interactive video watching experience. Most of the commercial P2P streaming systems, such as PPLive, PPStream and UUSee, have announced multi-channel P2P VoD systems that allow a user to view more than one channel at a time. The present multiple-channel P2P VoD systems deliver video at a low streaming rate due to channel resource imbalance and channel churn. In order to increase the streaming capacity, this paper presents a completely different and effective helper-based resource balancing scheme that actively recognizes the supply-and-demand imbalance across multiple channels. Moreover, peers in a surplus channel contribute their unused bandwidth resources to peers in a shortage channel, which minimizes the server bandwidth consumption. To provide a desired replication ratio for optimal caching, it develops a dynamic replication strategy that optimally tunes the number of replicas based on dynamic popularity in a distributed and dynamic routine. This work accurately forecasts the varying popularity over time using the Auto-Regressive Integrated Moving Average (ARIMA) model, an effective time-series forecasting technique that supports dynamic environments. Experimental assessment shows that the proposed dynamic replication strategy achieves high streaming capacity under reduced server workload when compared to existing replication algorithms.
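The ARIMA-based popularity forecasting step can be sketched with statsmodels: fit a model to a video's recent request counts, forecast the next interval, and size the replica count from the predicted share of demand. The order (1, 1, 1), the synthetic demand series, and the replica budget below are assumptions for illustration.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# hourly request counts for one video: rising popularity plus noise (synthetic stand-in)
history = 50 + 2.0 * np.arange(72) + rng.normal(0.0, 5.0, 72)

model = ARIMA(history, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=6)            # expected requests over the next 6 hours

replica_budget = 200                          # replicas the peers can cache in total
rest_of_catalogue = 3000.0                    # assumed demand from all other videos
share = forecast.sum() / (forecast.sum() + rest_of_catalogue)
print("forecast:", np.round(forecast, 1))
print("replicas for this video:", int(round(replica_budget * share)))
```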
APA, Harvard, Vancouver, ISO, and other styles
40

Zhang, Yingying, Junyu Gao, Xiaoshan Yang, Chang Liu, Yan Li, and Changsheng Xu. "Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12902–9. http://dx.doi.org/10.1609/aaai.v34i07.6988.

Full text
Abstract:
With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection, which discovers moments of a user's major or special interest in a video, has the potential to significantly ease this situation. Existing methods suffer from two problems. Firstly, most existing approaches only focus on learning holistic visual representations of videos but ignore object semantics for inferring video highlights. Secondly, current state-of-the-art approaches often adopt the pairwise ranking-based strategy, which cannot exploit global information to infer highlights. Therefore, we propose a novel video highlight framework, named VH-GNN, to construct an object-aware graph and model the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph to capture the complex interactions of objects within each frame, and a temporal graph to obtain an object-aware representation of each frame and capture the global information. In addition, we optimize the framework via a proposed multi-stage loss, where the first stage aims to determine the highlight probability and the second stage leverages the relationships between frames and focuses on hard examples from the former stage. Extensive experiments on two standard datasets strongly evidence that VH-GNN obtains significant performance gains compared with state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
41

Torresani, A., and F. Remondino. "VIDEOGRAMMETRY VS PHOTOGRAMMETRY FOR HERITAGE 3D RECONSTRUCTION." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W15 (August 26, 2019): 1157–62. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w15-1157-2019.

Full text
Abstract:
In recent years we have witnessed an increasing quality (and quantity) of video streams and a growing capability of SLAM-based methods to derive 3D data from video. Video sequences can be easily acquired by non-expert surveyors and possibly used for 3D documentation purposes. The aim of the paper is to evaluate the possibility of performing 3D reconstructions of heritage scenarios using videos ("videogrammetry"), e.g. acquired with smartphones. Video frames are extracted from the sequence using a fixed-time interval and two advanced methods. The frames are then processed applying automated image orientation / Structure from Motion (SfM) and dense image matching / Multi-View Stereo (MVS) methods. The obtained dense 3D point clouds are then visually validated as well as compared with a photogrammetric ground truth acquired with a reflex camera, or assessed by analysing the 3D data's noise on flat surfaces.
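The fixed-time-interval frame extraction mentioned above is straightforward with OpenCV; the sketch below writes one frame per chosen interval for later SfM/MVS processing. The interval and the file name are arbitrary examples, and the paper's two advanced frame-selection methods are not reproduced.

```python
import cv2

def extract_frames(video_path, every_seconds=1.0, out_pattern="frame_{:05d}.jpg"):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0            # fall back if FPS metadata is missing
    step = max(1, int(round(fps * every_seconds)))
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(out_pattern.format(saved), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved  # number of frames written, ready for image orientation / dense matching

# extract_frames("heritage_walkthrough.mp4", every_seconds=0.5)  # hypothetical smartphone video
```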
APA, Harvard, Vancouver, ISO, and other styles
42

Xu, Jianfeng, and Kazuyuki Tasaka. "[Papers] Keep Your Eye on the Ball: Detection of Kicking Motions in Multi-view 4K Soccer Videos." ITE Transactions on Media Technology and Applications 8, no. 2 (2020): 81–88. http://dx.doi.org/10.3169/mta.8.81.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Liu, Yuchi, Zhongdao Wang, Xiangxin Zhou, and Liang Zheng. "A Study of Using Synthetic Data for Effective Association Knowledge Learning." Machine Intelligence Research 20, no. 2 (March 8, 2023): 194–206. http://dx.doi.org/10.1007/s11633-022-1380-x.

Full text
Abstract:
Association, aiming to link bounding boxes of the same identity in a video sequence, is a central component in multi-object tracking (MOT). To train association modules, e.g., parametric networks, real video data are usually used. However, annotating person tracks in consecutive video frames is expensive, and such real data, due to their inflexibility, offer us limited opportunities to evaluate system performance w.r.t. changing tracking scenarios. In this paper, we study whether 3D synthetic data can replace real-world videos for association training. Specifically, we introduce a large-scale synthetic data engine named MOTX, where the motion characteristics of cameras and objects are manually configured to be similar to those of real-world datasets. We show that, compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaptation techniques. Our intriguing observation is credited to two factors. First and foremost, 3D engines can well simulate motion factors such as camera movement, camera view, and object movement, so that the simulated videos can provide association modules with effective motion features. Second, the experimental results show that the appearance domain gap hardly harms the learning of association knowledge. In addition, the strong customization ability of MOTX allows us to quantitatively assess the impact of motion factors on MOT, which brings new insights to the community.
APA, Harvard, Vancouver, ISO, and other styles
44

Mambou, Sebastien, Ondrej Krejcar, Kamil Kuca, and Ali Selamat. "Novel Cross-View Human Action Model Recognition Based on the Powerful View-Invariant Features Technique." Future Internet 10, no. 9 (September 13, 2018): 89. http://dx.doi.org/10.3390/fi10090089.

Full text
Abstract:
One of the most important research topics nowadays is human action recognition, which is of significant interest to the computer vision and machine learning communities. Some of the factors that hamper it include changes in postures and shapes and the memory space and time required to gather, store, label, and process the pictures. During our research, we noted a considerable complexity to recognize human actions from different viewpoints, and this can be explained by the position and orientation of the viewer related to the position of the subject. We attempted to address this issue in this paper by learning different special view-invariant facets that are robust to view variations. Moreover, we focused on providing a solution to this challenge by exploring view-specific as well as view-shared facets utilizing a novel deep model called the sample-affinity matrix (SAM). These models can accurately determine the similarities among samples of videos in diverse angles of the camera and enable us to precisely fine-tune transfer between various views and learn more detailed shared facets found in cross-view action identification. Additionally, we proposed a novel view-invariant facets algorithm that enabled us to better comprehend the internal processes of our project. Using a series of experiments applied on INRIA Xmas Motion Acquisition Sequences (IXMAS) and the Northwestern–UCLA Multi-view Action 3D (NUMA) datasets, we were able to show that our technique performs much better than state-of-the-art techniques.
APA, Harvard, Vancouver, ISO, and other styles
45

El Kaid, Amal, Denis Brazey, Vincent Barra, and Karim Baïna. "Top-Down System for Multi-Person 3D Absolute Pose Estimation from Monocular Videos." Sensors 22, no. 11 (May 28, 2022): 4109. http://dx.doi.org/10.3390/s22114109.

Full text
Abstract:
Two-dimensional (2D) multi-person pose estimation and three-dimensional (3D) root-relative pose estimation from a monocular RGB camera have made significant progress recently. Yet, real-world applications require depth estimations and the ability to determine the distances between people in a scene. Therefore, it is necessary to recover the 3D absolute poses of several people. However, this is still a challenge when using cameras from single points of view. Furthermore, the previously proposed systems typically required a significant amount of resources and memory. To overcome these restrictions, we herein propose a real-time framework for multi-person 3D absolute pose estimation from a monocular camera, which integrates a human detector, a 2D pose estimator, a 3D root-relative pose reconstructor, and a root depth estimator in a top-down manner. The proposed system, called Root-GAST-Net, is based on modified versions of GAST-Net and RootNet networks. The efficiency of the proposed Root-GAST-Net system is demonstrated through quantitative and qualitative evaluations on two benchmark datasets, Human3.6M and MuPoTS-3D. On all evaluated metrics, our experimental results on the MuPoTS-3D dataset outperform the current state-of-the-art by a significant margin, and can run in real-time at 15 fps on the Nvidia GeForce GTX 1080.
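The abstract describes a top-down composition: a detected person, its 2D pose, a root-relative 3D pose, and a separately estimated root depth are combined into an absolute 3D pose. The sketch below shows only that final composition step under an assumed pinhole camera model; the intrinsic matrix, joint layout, and function names are hypothetical and do not come from Root-GAST-Net.

```python
import numpy as np

def back_project_root(root_uv, root_depth, K):
    """Lift the 2D root joint (pixels) to camera coordinates given its depth."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (root_uv[0] - cx) * root_depth / fx
    y = (root_uv[1] - cy) * root_depth / fy
    return np.array([x, y, root_depth])

def absolute_pose(rel_pose_3d, root_uv, root_depth, K):
    """Top-down composition: root-relative 3D joints + estimated root depth
    -> absolute 3D joints in camera space. rel_pose_3d is (J, 3) with the
    root at the origin; root_uv is the root joint location in pixels."""
    root_cam = back_project_root(root_uv, root_depth, K)
    return rel_pose_3d + root_cam  # broadcast root translation to every joint

# Toy usage with a hypothetical intrinsic matrix and a 3-joint skeleton.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
rel = np.array([[0, 0, 0], [0.2, -0.5, 0.05], [-0.2, -0.5, 0.05]])  # metres
print(absolute_pose(rel, root_uv=(700, 400), root_depth=3.2, K=K))
```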
APA, Harvard, Vancouver, ISO, and other styles
46

Li, Jing, Shuo Chen, Fangbing Zhang, Erkang Li, Tao Yang, and Zhaoyang Lu. "An Adaptive Framework for Multi-Vehicle Ground Speed Estimation in Airborne Videos." Remote Sensing 11, no. 10 (May 24, 2019): 1241. http://dx.doi.org/10.3390/rs11101241.

Full text
Abstract:
With the rapid development of unmanned aerial vehicles (UAVs), UAV-based intelligent airborne surveillance systems represented by real-time ground vehicle speed estimation have attracted wide attention from researchers. However, there are still many challenges in extracting speed information from UAV videos, including the dynamic moving background, small target size, complicated environment, and diverse scenes. In this paper, we propose a novel adaptive framework for multi-vehicle ground speed estimation in airborne videos. First, we build a traffic dataset from UAV footage. Then, we use a deep learning detection algorithm to detect vehicles in the UAV field of view and obtain their trajectories in the image through a tracking-by-detection algorithm. Thereafter, we present a motion compensation method based on homography. This method obtains matching feature points by an optical flow method and eliminates the influence of the detected targets to accurately calculate the homography matrix, which determines the real motion trajectory in the current frame. Finally, vehicle speed is estimated based on the mapping relationship between the pixel distance and the actual distance. The method regards the actual size of the car as prior information and adaptively recovers the pixel scale by estimating the vehicle size in the image; it then calculates the vehicle speed. To evaluate the performance of the proposed system, we carry out a large number of experiments on the AirSim simulation platform as well as real UAV aerial surveillance experiments. Through quantitative and qualitative analysis of the simulation results and real experiments, we verify that the proposed system has a unique ability to detect, track, and estimate the speed of ground vehicles simultaneously, even with a single downward-looking camera. Additionally, the system can obtain effective and accurate speed estimation results, even in various complex scenes.
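Two steps in this pipeline lend themselves to a compact sketch: compensating the UAV's ego-motion with a background homography, and converting pixel displacement to metres using the vehicle's known physical length. The following minimal example assumes the homography has already been estimated (e.g., from optical-flow matches) and uses made-up numbers; it is not the authors' implementation.

```python
import numpy as np

def warp_point(H, pt):
    """Apply a 3x3 homography to a 2D pixel location."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

def vehicle_speed(prev_center, cur_center, H_prev_to_cur,
                  vehicle_len_px, vehicle_len_m, dt):
    """Ground speed estimate for one vehicle between two frames.

    H_prev_to_cur compensates the UAV's own motion (background homography);
    vehicle_len_px / vehicle_len_m recovers the metres-per-pixel scale from
    the vehicle's known physical length, as the abstract describes."""
    compensated_prev = warp_point(H_prev_to_cur, prev_center)
    pixel_dist = np.linalg.norm(np.asarray(cur_center, dtype=float) - compensated_prev)
    metres_per_pixel = vehicle_len_m / vehicle_len_px
    return pixel_dist * metres_per_pixel / dt  # m/s

# Toy usage: identity homography (hovering UAV), 30 fps, car ~4.5 m long.
H = np.eye(3)
print(vehicle_speed((500, 300), (506, 300), H,
                    vehicle_len_px=90, vehicle_len_m=4.5, dt=1 / 30))  # ~9 m/s
```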
APA, Harvard, Vancouver, ISO, and other styles
47

Shozu, Kanto, Masaaki Komatsu, Akira Sakai, Reina Komatsu, Ai Dozen, Hidenori Machino, Suguru Yasutomi, et al. "Model-Agnostic Method for Thoracic Wall Segmentation in Fetal Ultrasound Videos." Biomolecules 10, no. 12 (December 17, 2020): 1691. http://dx.doi.org/10.3390/biom10121691.

Full text
Abstract:
The application of segmentation methods to medical imaging has the potential to create novel diagnostic support models. With respect to fetal ultrasound, the thoracic wall is a key structure in the assessment of the chest region, as it allows examiners to recognize the relative orientation and size of structures inside the thorax, which are critical components in neonatal prognosis. In this study, to improve the segmentation performance of the thoracic wall in fetal ultrasound videos, we proposed a novel model-agnostic method using deep learning techniques: the Multi-Frame + Cylinder method (MFCY). The Multi-Frame method (MF) uses time-series information of ultrasound videos, and the Cylinder method (CY) utilizes the shape of the thoracic wall. To evaluate the achieved improvement, we performed segmentation using five-fold cross-validation on 538 ultrasound frames in the four-chamber view (4CV) of 256 normal cases using U-net and DeepLabv3+. MFCY increased the mean intersection over union (IoU) of thoracic wall segmentation from 0.448 to 0.493 for U-net and from 0.417 to 0.470 for DeepLabv3+. These results demonstrated that MFCY improved the segmentation performance of the thoracic wall in fetal ultrasound videos without altering the network structure. MFCY is expected to facilitate the development of diagnostic support models in fetal ultrasound by providing more accurate segmentation of the thoracic wall.
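The improvement is reported as mask IoU, so a small sketch of that metric may help; the MF and CY post-processing steps themselves are not reproduced here, and the toy masks below are invented.

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over union between two binary segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: count as perfect agreement
    return np.logical_and(pred, gt).sum() / union

# Toy usage: two overlapping rectangular "thoracic wall" masks on a 100x100 frame.
pred = np.zeros((100, 100), dtype=np.uint8); pred[20:60, 30:70] = 1
gt   = np.zeros((100, 100), dtype=np.uint8); gt[25:65, 30:70] = 1
print(round(mask_iou(pred, gt), 3))  # ~0.778
```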
APA, Harvard, Vancouver, ISO, and other styles
48

Wang, Shuhang, Jianfeng Li, Pengshuai Yang, Tianxiao Gao, Alex R. Bowers, and Gang Luo. "Towards Wide Range Tracking of Head Scanning Movement in Driving." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 13 (April 20, 2020): 2050033. http://dx.doi.org/10.1142/s0218001420500330.

Full text
Abstract:
Gaining environmental awareness through lateral head scanning (yaw rotations) is important for driving safety, especially when approaching intersections. Therefore, head scanning movements could be an important behavioral metric for driving safety research and driving risk mitigation systems. Tracking head scanning movements with a single in-car camera is preferred hardware-wise, but it is very challenging to track the head over almost a [Formula: see text] range. In this paper, we investigate two state-of-the-art methods: a multi-loss deep residual learning method with 50 layers (multi-loss ResNet-50) and an ORB-feature-based simultaneous localization and mapping method (ORB-SLAM). While deep learning methods have been extensively studied for head pose detection, this is the first study in which SLAM has been employed to track head scanning over such a wide range. Our laboratory experimental results showed that ORB-SLAM was more accurate than multi-loss ResNet-50, which often failed when many facial features were not in the view. In contrast, ORB-SLAM was able to continue tracking because it does not rely on particular facial features. Testing with real driving videos demonstrated the feasibility of using ORB-SLAM for tracking large lateral head scans in naturalistic video data.
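A SLAM system such as ORB-SLAM outputs a camera pose per frame; the lateral head-scan angle can then be read off as the yaw of the rotation matrix. The sketch below shows one common way to do this (Z-Y-X Euler convention); it is a generic illustration rather than the authors' pipeline, and the toy rotations are fabricated.

```python
import numpy as np

def yaw_from_rotation(R):
    """Yaw (rotation about the vertical axis, in degrees) from a 3x3 rotation
    matrix, using the common Z-Y-X Euler convention."""
    return np.degrees(np.arctan2(R[1, 0], R[0, 0]))

def head_scan_angles(poses):
    """Relative yaw of each frame with respect to the first frame."""
    yaws = np.array([yaw_from_rotation(R) for R in poses])
    return yaws - yaws[0]

# Toy usage: the head turns left in 30-degree steps.
def rot_z(deg):
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

print(head_scan_angles([rot_z(a) for a in (0, 30, 60, 90)]))  # [0. 30. 60. 90.]
```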
APA, Harvard, Vancouver, ISO, and other styles
49

Huang, Wenbin, Sailing He, Yaoran Sun, Julian Evans, Xian Song, Tongyu Geng, Guanrong Sun, and Xubo Fu. "Open Dataset Recorded by Single Cameras for Multi-Player Tracking in Soccer Scenarios." Applied Sciences 12, no. 15 (July 25, 2022): 7473. http://dx.doi.org/10.3390/app12157473.

Full text
Abstract:
Multi-player action recognition for automatic analysis in sports is the subject of increasing attention. Trajectory-tracking technology is key for accurate recognition, but little research has focused on this aspect, especially for non-professional matches. Here, we study multi-player tracking in the most popular and complex sport among non-professionals: soccer. In this non-professional soccer player tracking (NPSPT) challenge, single-view-based motion recording systems for continuous data collection were installed in several soccer fields, and a new benchmark dataset was collected. The dataset consists of 17 two-minute-long super-high-resolution videos with diverse game types, consistently labeled across time and covering almost all possible situations for multi-player detection and tracking in real games. A comprehensive evaluation was conducted on state-of-the-art multi-object tracking (MOT) systems, revealing insights into player tracking in real games. Our challenge introduces a new dimension for researchers in the player recognition field and will be beneficial to further studies.
APA, Harvard, Vancouver, ISO, and other styles
50

Tong, Kit-Lun, Kun-Ru Wu, and Yu-Chee Tseng. "The Device–Object Pairing Problem: Matching IoT Devices with Video Objects in a Multi-Camera Environment." Sensors 21, no. 16 (August 17, 2021): 5518. http://dx.doi.org/10.3390/s21165518.

Full text
Abstract:
IoT technologies enable millions of devices to transmit their sensor data to the external world. The device–object pairing problem arises when a group of Internet of Things devices is concurrently tracked by cameras and sensors. While cameras view these things as visual "objects", these things, which are equipped with "sensing devices", also continuously report their status. The challenge is that when visualizing these things on videos, their status needs to be placed properly on the screen. This requires correctly pairing visual objects with their sensing devices. There are many real-life examples. Recognizing a vehicle in videos does not imply that we can read its odometer and fuel meter inside. Recognizing a pet on screen does not mean that we can correctly read its necklace data. In more critical ICU environments, visualizing all patients and showing their physiological signals on screen would greatly relieve nurses' burdens. The barrier behind this is that the camera may see an object but not be able to see its carried device, not to mention its sensor readings. This paper addresses the device–object pairing problem and presents a multi-camera, multi-IoT-device system that enables visualizing a group of people together with their wearable devices' data and demonstrates the ability to recover missing bounding boxes.
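One straightforward way to attack device–object pairing, sketched below purely as an assumption rather than the paper's actual method, is to correlate each device's motion signal (e.g., accelerometer magnitude over time) with the image-plane motion of each visual track and then solve the resulting assignment problem.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def normalised_correlation(a, b):
    """Zero-mean, unit-variance correlation between two 1-D signals."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean(a * b))

def pair_devices_with_tracks(device_signals, track_signals):
    """Assign each IoT device to one visual track by maximising the total
    correlation between device motion signals and track motion signals."""
    cost = np.zeros((len(device_signals), len(track_signals)))
    for i, d in enumerate(device_signals):
        for j, t in enumerate(track_signals):
            cost[i, j] = -normalised_correlation(d, t)  # maximise correlation
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))

# Toy usage: two devices, two tracks; device 0 matches track 1 and vice versa.
t = np.linspace(0, 4 * np.pi, 200)
walk, wave = np.sin(t), np.sin(3 * t)
print(pair_devices_with_tracks([wave, walk], [walk, wave]))  # [(0, 1), (1, 0)]
```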
APA, Harvard, Vancouver, ISO, and other styles