Academic literature on the topic "Deep Video Representations"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles

Consult the thematic lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Deep Video Representations".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Deep Video Representations"

1

Feichtenhofer, Christoph, Axel Pinz, Richard P. Wildes, and Andrew Zisserman. "Deep Insights into Convolutional Networks for Video Recognition." International Journal of Computer Vision 128, no. 2 (2019): 420–37. http://dx.doi.org/10.1007/s11263-019-01225-w.

As the success of deep models has led to their deployment in all areas of computer vision, it is increasingly important to understand how these representations work and what they are capturing. In this paper, we shed light on deep spatiotemporal representations by visualizing the internal representation of models that have been trained to recognize actions in video. We visualize multiple two-stream architectures to show that local detectors for appearance and motion objects arise to form distributed representations for recognizing human actions. Key observations include the following…
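
To make the setup concrete: the "two-stream" models this paper visualizes pair an appearance network over RGB frames with a motion network over stacked optical flow. Below is a minimal, illustrative PyTorch sketch of that layout; the layer sizes and the late-fusion classifier are assumptions of this example, not the paper's architecture.

```python
import torch
import torch.nn as nn

def make_stream(in_channels: int) -> nn.Sequential:
    """A tiny convolutional stream; real models use much deeper backbones."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=7, stride=2, padding=3),
        nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )

class TwoStreamNet(nn.Module):
    def __init__(self, num_classes: int = 101, flow_stack: int = 10):
        super().__init__()
        self.appearance = make_stream(3)            # a single RGB frame
        self.motion = make_stream(2 * flow_stack)   # x/y flow for 10 frames
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, rgb, flow):
        # late fusion: concatenate stream features, then classify
        fused = torch.cat([self.appearance(rgb), self.motion(flow)], dim=1)
        return self.classifier(fused)

model = TwoStreamNet()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 20, 224, 224))
print(logits.shape)  # torch.Size([2, 101])
```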
2

Pandeya, Yagya Raj, Bhuwan Bhattarai, and Joonwhoan Lee. "Deep-Learning-Based Multimodal Emotion Classification for Music Videos." Sensors 21, no. 14 (2021): 4927. http://dx.doi.org/10.3390/s21144927.

Music videos contain a great deal of visual and acoustic information. Each information source within a music video influences the emotions conveyed through the audio and video, suggesting that only a multimodal approach is capable of achieving efficient affective computing. This paper presents an affective computing system that relies on music, video, and facial expression cues, making it useful for emotional analysis. We applied the audio–video information exchange and boosting methods to regularize the training process and reduced the computational costs by using a separable convolution stra…
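
The abstract credits separable convolutions for the reduced computational cost. As a reference point, here is a minimal depthwise-separable convolution block in PyTorch; it shows the generic technique only, not the paper's model.

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise conv (one filter per channel) followed by a 1x1 pointwise
    conv; far fewer multiply-adds than a dense k x k convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 64, 64)
print(SeparableConv2d(32, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
```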
3

Ljubešić, Nikola. "‟Deep lexicography” – Fad or Opportunity?" Rasprave Instituta za hrvatski jezik i jezikoslovlje 46, no. 2 (2020): 839–52. http://dx.doi.org/10.31724/rihjj.46.2.21.

In recent years, we are witnessing staggering improvements in various semantic data processing tasks due to the developments in the area of deep learning, ranging from image and video processing to speech processing, and natural language understanding. In this paper, we discuss the opportunities and challenges that these developments pose for the area of electronic lexicography. We primarily focus on the concept of representation learning of the basic elements of language, namely words, and the applicability of these word representations to lexicography. We first discuss well-known approaches
4

Kumar, Vidit, Vikas Tripathi, and Bhaskar Pant. "Learning Unsupervised Visual Representations using 3D Convolutional Autoencoder with Temporal Contrastive Modeling for Video Retrieval." International Journal of Mathematical, Engineering and Management Sciences 7, no. 2 (2022): 272–87. http://dx.doi.org/10.33889/ijmems.2022.7.2.018.

The rapid growth of tag-free user-generated videos (on the Internet), surgical recorded videos, and surveillance videos has necessitated the need for effective content-based video retrieval systems. Earlier methods for video representations are based on hand-crafted features, which hardly performed well on the video retrieval tasks. Subsequently, deep learning methods have successfully demonstrated their effectiveness in both image and video-related tasks, but at the cost of creating massively labeled datasets. Thus, the economic solution is to use freely available unlabeled web videos for representati…
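
For orientation, the following sketch combines the two training signals named in the title and abstract: pixel reconstruction through a 3D convolutional autoencoder, plus a contrastive term that pulls embeddings of two clips from the same video together. Shapes, layer sizes, and the InfoNCE formulation are assumptions of this example, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv3dAutoencoder(nn.Module):
    """Encodes a clip of shape (B, 3, 8, 112, 112) to a 128-d vector and
    decodes it back to pixels."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, 3, stride=2, padding=1), nn.ReLU(True),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(True),
        )
        self.to_vec = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                                    nn.Linear(64, dim))
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose3d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, clip):
        feat = self.conv(clip)                  # (B, 64, 2, 28, 28)
        return self.to_vec(feat), self.deconv(feat)

def temporal_contrastive_loss(z_a, z_b, tau=0.1):
    """InfoNCE: clip i in view A should match clip i in view B."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau
    target = torch.arange(z_a.size(0))
    return F.cross_entropy(logits, target)

model = Conv3dAutoencoder()
clip_a = torch.randn(4, 3, 8, 112, 112)   # two clips sampled from
clip_b = torch.randn(4, 3, 8, 112, 112)   # the same four videos
z_a, recon = model(clip_a)
z_b, _ = model(clip_b)
loss = F.mse_loss(recon, clip_a) + temporal_contrastive_loss(z_a, z_b)
```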
5

Vihlman, Mikko, and Arto Visala. "Optical Flow in Deep Visual Tracking." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 12112–19. http://dx.doi.org/10.1609/aaai.v34i07.6890.

Single-target tracking of generic objects is a difficult task since a trained tracker is given information present only in the first frame of a video. In recent years, increasingly many trackers have been based on deep neural networks that learn generic features relevant for tracking. This paper argues that deep architectures are often fit to learn implicit representations of optical flow. Optical flow is intuitively useful for tracking, but most deep trackers must learn it implicitly. This paper is among the first to study the role of optical flow in deep visual tracking. The architecture of…
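
Since the paper centers on optical flow as a tracking cue, the snippet below shows the classical way to obtain dense flow from a video with OpenCV's Farnebäck method. This is a baseline illustration, not the learned flow representations the paper studies, and "video.mp4" is a placeholder path.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("video.mp4")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # flow has shape (H, W, 2): per-pixel x/y displacement between frames
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    print("mean motion:", magnitude.mean())  # crude per-frame motion statistic
    prev_gray = gray
cap.release()
```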
6

Rouast, Philipp V., and Marc T. P. Adam. "Learning Deep Representations for Video-Based Intake Gesture Detection." IEEE Journal of Biomedical and Health Informatics 24, no. 6 (2020): 1727–37. http://dx.doi.org/10.1109/jbhi.2019.2942845.

7

Li, Jialu, Aishwarya Padmakumar, Gaurav Sukhatme, and Mohit Bansal. "VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (2024): 18517–26. http://dx.doi.org/10.1609/aaai.v38i17.29813.

Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions. The performance of existing VLN methods is limited by insufficient diversity in navigation environments and limited training data. To address these issues, we propose VLN-Video, which utilizes the diverse outdoor environments present in driving videos in multiple cities in the U.S. augmented with automatically generated navigation instructions and actions to improve outdoor VLN performance. VLN-Video combines the best of intuitive classica
8

Hu, Yueyue, Shiliang Sun, Xin Xu, and Jing Zhao. "Multi-View Deep Attention Network for Reinforcement Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (2020): 13811–12. http://dx.doi.org/10.1609/aaai.v34i10.7177.

The representation approximated by a single deep network is usually limited for reinforcement learning agents. We propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning task for the first time. The proposed model approximates a set of strategies from multiple representations and combines these strategies based on attention mechanisms to provide a comprehensive strategy for a single agent. Experimental results on eight Atari video games show that the MvDAN achieves competitive performance compared to single-vi…
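
As a rough illustration of the idea, the sketch below encodes several views, scores each with a learned attention weight, and combines per-view action values into one output. All dimensions and the single-layer encoders are assumptions of this example; the MvDAN itself is more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewAttention(nn.Module):
    def __init__(self, n_views=3, feat_dim=64, n_actions=6):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Linear(128, feat_dim) for _ in range(n_views))
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, n_actions) for _ in range(n_views))
        self.score = nn.Linear(feat_dim, 1)  # attention score per view

    def forward(self, views):                # views: (B, n_views, 128)
        feats = torch.stack([enc(views[:, i]) for i, enc in
                             enumerate(self.encoders)], dim=1)
        attn = F.softmax(self.score(feats), dim=1)       # (B, n_views, 1)
        q_per_view = torch.stack([head(feats[:, i]) for i, head in
                                  enumerate(self.heads)], dim=1)
        return (attn * q_per_view).sum(dim=1)            # (B, n_actions)

q = MultiViewAttention()(torch.randn(2, 3, 128))
print(q.shape)  # torch.Size([2, 6])
```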
9

Dong, Zhen, Chenchen Jing, Mingtao Pei, and Yunde Jia. "Deep CNN based binary hash video representations for face retrieval." Pattern Recognition 81 (September 2018): 357–69. http://dx.doi.org/10.1016/j.patcog.2018.04.014.

10

Psallidas, Theodoros, and Evaggelos Spyrou. "Video Summarization Based on Feature Fusion and Data Augmentation." Computers 12, no. 9 (2023): 186. http://dx.doi.org/10.3390/computers12090186.

During the last few years, several technological advances have led to an increase in the creation and consumption of audiovisual multimedia content. Users are overexposed to videos via several social media or video sharing websites and mobile phone applications. For efficient browsing, searching, and navigation across several multimedia collections and repositories, e.g., for finding videos that are relevant to a particular topic or interest, this ever-increasing content should be efficiently described by informative yet concise content representations. A common solution to this problem is the

Theses on the topic "Deep Video Representations"

1

Yang, Yang. "Learning Hierarchical Representations for Video Analysis Using Deep Learning." Doctoral diss., University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5892.

With the exponential growth of the digital data, video content analysis (e.g., action, event recognition) has been drawing increasing attention from computer vision researchers. Effective modeling of the objects, scenes, and motions is critical for visual understanding. Recently there has been a growing interest in the bio-inspired deep learning models, which has shown impressive results in speech and object recognition. The deep learning models are formed by the composition of multiple non-linear transformations of the data, with the goal of yielding more abstract and ultimately more useful r
2

Sudhakaran, Swathikiran. "Deep Neural Architectures for Video Representation Learning." Doctoral thesis, Università degli studi di Trento, 2019. https://hdl.handle.net/11572/369191.

Automated analysis of videos for content understanding is one of the most challenging and well researched areas in computer vision and multimedia. This thesis addresses the problem of video content understanding in the context of action recognition. The major challenge faced by this research problem is the variations of the spatio-temporal patterns that constitute each action category and the difficulty in generating a succinct representation encapsulating these patterns. This thesis considers two important aspects of videos for addressing this problem: (1) a video is a sequence of images with
3

Sudhakaran, Swathikiran. "Deep Neural Architectures for Video Representation Learning." Doctoral thesis, University of Trento, 2019. http://eprints-phd.biblio.unitn.it/3731/1/swathi_thesis_rev1.pdf.

Automated analysis of videos for content understanding is one of the most challenging and well researched areas in computer vision and multimedia. This thesis addresses the problem of video content understanding in the context of action recognition. The major challenge faced by this research problem is the variations of the spatio-temporal patterns that constitute each action category and the difficulty in generating a succinct representation encapsulating these patterns. This thesis considers two important aspects of videos for addressing this problem: (1) a video is a sequence of images with
4

Sun, Shuyang. "Designing Motion Representation in Videos." Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/19724.

Motion representation plays a vital role in the vision-based human action recognition in videos. Generally, the information of a video could be divided into spatial information and temporal information. While the spatial information could be easily described by the RGB images, the design of the motion representation is yet a challenging problem. In order to design a motion representation that is efficient and effective, we design the feature according to two principles. First, to guarantee the robustness, the temporal information should be highly related to the informative modalities, e.g., th
5

Mazari, Ahmed. "Apprentissage profond pour la reconnaissance d’actions en vidéos." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS171.

Nowadays, video content is ubiquitous thanks to the Internet and smartphones, as well as social media. Many everyday applications, such as video surveillance, video content description, and visual scene understanding, require sophisticated technologies to process video data. It is becoming necessary to develop automatic means of analyzing and interpreting the vast amount of available video data. In this thesis, we focus on action recognition in videos, i.e., on the pr…
6

"Video2Vec: Learning Semantic Spatio-Temporal Embedding for Video Representations." Master's thesis, 2016. http://hdl.handle.net/2286/R.I.40765.

High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos. Many video feature extraction algorithms have been proposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as "handcrafted" features as they were deliberately designed based on some reasonable considerations. However, these algorithms may fail…
7

Khanuja, Gagandeep Singh. "A Study of Real Time Search in Flood Scenes from UAV Videos Using Deep Learning Techniques." Thesis, 2019.

Following a natural disaster, one of the most important facets that influences a person's chances of survival or of being found is the time within which they are rescued. Traditional means of search operations involving dogs, ground robots, and humanitarian intervention are time-intensive and can be a major bottleneck in search operations. The main aim of these operations is to rescue victims without critical delay, in the shortest time possible, which can be realized in real time by using UAVs. With advancements in computational devices and the ability to learn from complex data, deep learning can b…
8

Souček, Tomáš. "Detekce střihů a vyhledávání známých scén ve videu s pomocí metod hlubokého učení." Master's thesis, 2020. http://www.nusl.cz/ntk/nusl-434967.

Video retrieval represents a challenging problem with many caveats and sub-problems. This thesis focuses on two of these sub-problems, namely shot transition detection and text-based search. In the case of shot detection, many solutions have been proposed over the last decades. Recently, deep learning-based approaches improved the accuracy of shot transition detection using 3D convolutional architectures and artificially created training data, but one hundred percent accuracy is still an unreachable ideal. In this thesis we present a deep network for shot transition detection TransNet V2 that

Books on the topic "Deep Video Representations"

1

Aguayo, Angela J. Documentary Resistance. Oxford University Press, 2019. http://dx.doi.org/10.1093/oso/9780190676216.001.0001.

The potential of documentary moving images to foster democratic exchange has been percolating within media production culture for the last century, and now, with mobile cameras at our fingertips and broadcasts circulating through unpredictable social networks, the documentary impulse is coming into its own as a political force of social change. The exploding reach and power of audio and video are multiplying documentary modes of communication. Once considered an outsider media practice, documentary is finding mass appeal in the allure of moving images, collecting participatory audiences that c
2

Anderson, Crystal S. Soul in Seoul. University Press of Mississippi, 2020. http://dx.doi.org/10.14325/mississippi/9781496830098.001.0001.

Soul in Seoul: African American Popular Music and K-pop examines how K-pop cites musical and performative elements of Black popular music culture as well as the ways that fans outside of Korea understand these citations. K-pop represents a hybridized mode of Korean popular music that emerged in the 1990s with global aspirations. Its hybridity combines musical elements from Korean and foreign cultures, particularly rhythm and blues-based genres (R&B) of African American popular music. Korean pop, R&B and hip-hop solo artists and groups engage in citational practices by simultaneously em…

Book chapters on the topic "Deep Video Representations"

1

Loban, Rhett. "Designing to produce deep representations." In Embedding Culture into Video Games and Game Design. Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003276289-10.

2

Yao, Yuan, Zhiyuan Liu, Yankai Lin, and Maosong Sun. "Cross-Modal Representation Learning." In Representation Learning for Natural Language Processing. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-1600-9_7.

Cross-modal representation learning is an essential part of representation learning, which aims to learn semantic representations for different modalities including text, audio, image and video, etc., and their connections. In this chapter, we introduce the development of cross-modal representation learning from shallow to deep, and from respective to unified in terms of model architectures and learning mechanisms for different modalities and tasks. After that, we review how cross-modal capabilities can contribute to complex real-world applications.
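
One representative mechanism such chapters cover is contrastive alignment of paired modalities in a shared embedding space. The sketch below implements a symmetric, CLIP-style objective for text-video pairs; the projection layers and feature dimensions are placeholders, not the book's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

text_proj = nn.Linear(768, 256)    # stands in for a text encoder's output
video_proj = nn.Linear(2048, 256)  # stands in for a video encoder's output

def clip_style_loss(text_feats, video_feats, tau=0.07):
    t = F.normalize(text_proj(text_feats), dim=1)
    v = F.normalize(video_proj(video_feats), dim=1)
    logits = t @ v.t() / tau              # (B, B) similarity matrix
    target = torch.arange(t.size(0))      # i-th text matches i-th video
    # symmetric cross-entropy over rows (text->video) and columns (video->text)
    return (F.cross_entropy(logits, target) +
            F.cross_entropy(logits.t(), target)) / 2

loss = clip_style_loss(torch.randn(8, 768), torch.randn(8, 2048))
print(float(loss))
```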
3

Mao, Feng, Xiang Wu, Hui Xue, and Rong Zhang. "Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network." In Lecture Notes in Computer Science. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-11018-5_24.

4

Becerra-Riera, Fabiola, Annette Morales-González, and Heydi Méndez-Vázquez. "Exploring Local Deep Representations for Facial Gender Classification in Videos." In Progress in Artificial Intelligence and Pattern Recognition. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01132-1_12.

5

Zhao, Kemeng, Liangrui Peng, Ning Ding, Gang Yao, Pei Tang, and Shengjin Wang. "Deep Representation Learning for License Plate Recognition in Low Quality Video Images." In Advances in Visual Computing. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-47966-3_16.

6

Chen, Yixiong, Chunhui Zhang, Li Liu, et al. "USCL: Pretraining Deep Ultrasound Image Diagnosis Model Through Video Contrastive Representation Learning." In Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87237-3_60.

7

Dhurgadevi, M., D. Vimal Kumar, R. Senthilkumar, and K. Gunasekaran. "Detection of Video Anomaly in Public With Deep Learning Algorithm." In Advances in Psychology, Mental Health, and Behavioral Studies. IGI Global, 2024. http://dx.doi.org/10.4018/979-8-3693-4143-8.ch004.

For traffic control and public safety, predicting the movement of people is crucial. The presented scheme entails the development of a wider network that can better satisfy created synthetic images by connecting spatial representations to temporal ones. The authors exclusively use the frames from those occurrences to create the dense optical flow for their corresponding normal events. In order to eliminate false-positive detection findings, they determine the local pixel reconstruction error. This particle prediction model and a likelihood model for giving these particles weights are both suggested. These models effectively use the variable-sized cell structure to produce sceneries with variable-sized sub-regions. It also successfully extracts and utilizes the video frame's size, motion, and position information. On the UCSD and LIVE datasets, the proposed framework is evaluated with the most recent algorithms reported in the literature. With a significantly shorter processing time, the suggested technique surpasses state-of-the-art techniques in relation to decreased equal error rate.
8

Asma, Stephen T. "Drama In The Diorama: The Confederation & Art and Science." In Stuffed Animals & Pickled Heads. Oxford University Press, New York, NY, 2001. http://dx.doi.org/10.1093/oso/9780195130508.003.0007.

The museums that we've studied throughout this journey reveal the tremendous diversity of goals and motives for collecting and displaying elements of the natural world. Yet underneath all these various constructions of nature, there has been a continuous dialogue between image-making activities and knowledge-producing activities. Unlike texts, natural history museums are inherently aesthetic representations of science in particular and conceptual ideas in general. The fact that a roulette wheel at the Field could touch the central nerves of our deep metaphysical convictions is an indication of a museum's epistemic potential. After spending long stretches in many natural history museums, one begins to see that a display's potential for education and transformation is largely a function of its artistic, nondiscursive character. Three-dimensional representations of nature (dioramas), two-dimensional and three-dimensional representations of concepts (such as the roulette wheel), and visual images generally are not just candy coatings on the real educational process of textual information transmission. This chapter explores how and why visual communication works on museum visitors. And this requires an examination of the more general issue of how images themselves can be pedagogical, an issue that extends from da Vinci's anatomy drawings to the latest video edutainment technology. These issues lead to a survey of some of the most recent trends in museology, followed by some reflections on the museum at the millennium.
9

Verma, Gyanendra K. "Emotions Modelling in 3D Space." In Multimodal Affective Computing: Affective Information Representation, Modelling, and Analysis. BENTHAM SCIENCE PUBLISHERS, 2023. http://dx.doi.org/10.2174/9789815124453123010013.

In this study, we have discussed emotion representation in two- and three-dimensional space. The three-dimensional space is based on the three emotion primitives, i.e., valence, arousal, and dominance. The multimodal cues used in this study are EEG, physiological signals, and video (under limitations). Due to the limited emotional content in videos from the DEAP database, we have considered only three classes of emotions, i.e., happy, sad, and terrible. The wavelet transform, a classical transform, was employed for multi-resolution analysis of signals to extract features. We have evaluated the proposed emotion model with the standard multimodal dataset DEAP. The experimental results show that SVM and MLP can predict emotions in single and multimodal cues.
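
A toy version of the described pipeline, wavelet sub-band features from a 1-D physiological signal followed by an SVM classifier, might look like the following. The synthetic signals stand in for DEAP recordings, and the specific feature (sub-band energy) is an assumption of this sketch.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def wavelet_features(signal, wavelet="db4", level=4):
    """Energy of each sub-band from a discrete wavelet decomposition."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

rng = np.random.default_rng(0)
X = np.stack([wavelet_features(rng.standard_normal(512)) for _ in range(90)])
y = np.repeat([0, 1, 2], 30)      # happy / sad / terrible (toy labels)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:5]))
```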
10

Nandal, Priyanka. "Motion Imitation for Monocular Videos." In Examining the Impact of Deep Learning and IoT on Multi-Industry Applications. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-7511-6.ch008.

This work presents a simple method for motion transfer (i.e., given a source video of a subject [person] performing some movements or in motion, that movement/motion is transferred to an amateur target in different motion). The pose is used as an intermediate representation to perform this translation. To transfer the motion of the source subject to the target subject, the pose is extracted from the source subject, and then the target subject is generated by applying the learned pose-to-appearance mapping. To perform this translation, the video is considered as a set of images consisting of all the frames. Generative adversarial networks (GANs) are used to transfer the motion from the source subject to the target subject. GANs are an evolving field of deep learning.

Conference papers on the topic "Deep Video Representations"

1

Morere, Olivier, Hanlin Goh, Antoine Veillard, Vijay Chandrasekhar, and Jie Lin. "Co-regularized deep representations for video summarization." In 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 2015. http://dx.doi.org/10.1109/icip.2015.7351387.

2

Yu, Feiwu, Xinxiao Wu, Yuchao Sun, and Lixin Duan. "Exploiting Images for Video Recognition with Hierarchical Generative Adversarial Networks." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/154.

Existing deep learning methods of video recognition usually require a large number of labeled videos for training. But for a new task, videos are often unlabeled and it is also time-consuming and labor-intensive to annotate them. Instead of human annotation, we try to make use of existing fully labeled images to help recognize those videos. However, due to the problem of domain shifts and heterogeneous feature representations, the performance of classifiers trained on images may be dramatically degraded for video recognition tasks. In this paper, we propose a novel method, called Hierarchical
3

Pernici, Federico, Federico Bartoli, Matteo Bruni, and Alberto Del Bimbo. "Memory Based Online Learning of Deep Representations from Video Streams." In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018. http://dx.doi.org/10.1109/cvpr.2018.00247.

4

Jung, Ilchae, Minji Kim, Eunhyeok Park, and Bohyung Han. "Online Hybrid Lightweight Representations Learning: Its Application to Visual Tracking." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/140.

This paper presents a novel hybrid representation learning framework for streaming data, where an image frame in a video is modeled by an ensemble of two distinct deep neural networks; one is a low-bit quantized network and the other is a lightweight full-precision network. The former learns coarse primary information with low cost while the latter conveys residual information for high fidelity to original representations. The proposed parallel architecture is effective to maintain complementary information since fixed-point arithmetic can be utilized in the quantized network and the lightweig…
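
To picture the ensemble, the sketch below pairs a coarse path with simulated low-bit activation quantization against a small full-precision residual path, and sums their outputs. The quantizer is a floating-point simulation for illustration only; the authors' network and training scheme are not reproduced here.

```python
import torch
import torch.nn as nn

class FakeQuantize(nn.Module):
    """Uniformly round activations to 2^bits levels in [-1, 1] (a simulation;
    real low-bit inference would use integer arithmetic)."""
    def __init__(self, bits=2):
        super().__init__()
        self.levels = 2 ** bits - 1

    def forward(self, x):
        x = torch.tanh(x)  # squash into [-1, 1]
        return torch.round((x + 1) / 2 * self.levels) / self.levels * 2 - 1

class HybridNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.coarse = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1),
                                    FakeQuantize(bits=2),
                                    nn.Conv2d(dim, dim, 3, padding=1))
        self.residual = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                                      nn.ReLU(True),
                                      nn.Conv2d(8, dim, 3, padding=1))

    def forward(self, x):
        # complementary streams: coarse quantized features + FP residual
        return self.coarse(x) + self.residual(x)

out = HybridNet()(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```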
5

Garcia-Gonzalez, Jorge, Rafael M. Luque-Baena, Juan M. Ortiz-de-Lazcano-Lobato, and Ezequiel Lopez-Rubio. "Moving Object Detection in Noisy Video Sequences Using Deep Convolutional Disentangled Representations." In 2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022. http://dx.doi.org/10.1109/icip46576.2022.9897305.

6

Parchami, Mostafa, Saman Bashbaghi, Eric Granger, and Saif Sayed. "Using deep autoencoders to learn robust domain-invariant representations for still-to-video face recognition." In 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017. http://dx.doi.org/10.1109/avss.2017.8078553.

7

Bueno-Benito, Elena, Biel Tura, and Mariella Dimiccoli. "Leveraging Triplet Loss for Unsupervised Action Segmentation." In LatinX in AI at Computer Vision and Pattern Recognition Conference 2023. Journal of LatinX in AI Research, 2023. http://dx.doi.org/10.52591/lxai202306185.

In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations…
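
The triplet objective at the core of such methods can be written in a few lines of PyTorch, as below; the frame encoder and the rule for choosing positives (e.g., temporally nearby frames) are placeholder assumptions of this sketch, not the paper's selection strategy.

```python
import torch
import torch.nn as nn

# toy frame encoder: flatten a small frame into a 64-d embedding
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
triplet = nn.TripletMarginLoss(margin=1.0)

anchor = embed(torch.randn(16, 3, 32, 32))    # frames from one action
positive = embed(torch.randn(16, 3, 32, 32))  # e.g., temporally close frames
negative = embed(torch.randn(16, 3, 32, 32))  # frames from elsewhere
loss = triplet(anchor, positive, negative)    # pull positives in, push negatives out
loss.backward()
print(float(loss))
```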
8

Kich, Victor Augusto, Junior Costa de Jesus, Ricardo Bedin Grando, Alisson Henrique Kolling, Gabriel Vinícius Heisler, and Rodrigo da Silva Guerra. "Deep Reinforcement Learning Using a Low-Dimensional Observation Filter for Visual Complex Video Game Playing." In Anais Estendidos do Simpósio Brasileiro de Games e Entretenimento Digital. Sociedade Brasileira de Computação, 2021. http://dx.doi.org/10.5753/sbgames_estendido.2021.19659.

Deep Reinforcement Learning (DRL) has produced great achievements since it was proposed, including the possibility of processing raw vision input data. However, training an agent to perform tasks based on image feedback remains a challenge. It requires the processing of large amounts of data from high-dimensional observation spaces, frame by frame, and the agent's actions are computed according to deep neural network policies, end-to-end. Image pre-processing is an effective way of reducing these high dimensional spaces, eliminating unnecessary information present in the scene, supporting the…
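
A typical observation filter of the kind motivated here reduces each frame to a small grayscale image and stacks recent frames into a compact state. The sketch below shows one such filter; the 84x84 size and four-frame history are common conventions assumed for illustration, not necessarily the paper's settings.

```python
from collections import deque
import cv2
import numpy as np

class ObservationFilter:
    def __init__(self, size=(84, 84), history=4):
        self.size = size
        self.frames = deque(maxlen=history)

    def __call__(self, frame_bgr):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, self.size, interpolation=cv2.INTER_AREA)
        self.frames.append(small.astype(np.float32) / 255.0)
        while len(self.frames) < self.frames.maxlen:  # pad at episode start
            self.frames.append(self.frames[-1])
        return np.stack(self.frames, axis=0)          # (4, 84, 84) state

obs = ObservationFilter()(np.zeros((210, 160, 3), dtype=np.uint8))
print(obs.shape)  # (4, 84, 84)
```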
9

Fan, Tingyu, Linyao Gao, Yiling Xu, Zhu Li, and Dong Wang. "D-DPCC: Deep Dynamic Point Cloud Compression via 3D Motion Prediction." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/126.

The non-uniformly distributed nature of the 3D Dynamic Point Cloud (DPC) brings significant challenges to its high-efficient inter-frame compression. This paper proposes a novel 3D sparse convolution-based Deep Dynamic Point Cloud Compression (D-DPCC) network to compensate and compress the DPC geometry with 3D motion estimation and motion compensation in the feature space. In the proposed D-DPCC network, we design a Multi-scale Motion Fusion (MMF) module to accurately estimate the 3D optical flow between the feature representations of adjacent point cloud frames. Specifically, we utilize a 3D
10

Li, Yang, Kan Li, and Xinxin Wang. "Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/112.

In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits powerful hierarchical features of CNNs. In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers. Moreover, we train this model in a deep supervision manner, which brings improvement in both performance and efficiency. Meanwhile, in order to capture the temporal structure as well as preserve more details about actions, we propose a trainable aggregation module. It models the temporal evolution of each spatial location…
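
Deep supervision itself is easy to state in code: auxiliary classifiers attached to intermediate layers receive the same label as the final head, and their losses are summed. The sketch below shows the generic pattern with an assumed toy CNN, not the paper's model or its trainable aggregation module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeplySupervisedCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(True))
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(True))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.aux_head = nn.Linear(16, num_classes)   # supervises stage 1
        self.head = nn.Linear(32, num_classes)       # final prediction

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        aux = self.aux_head(self.pool(f1).flatten(1))
        out = self.head(self.pool(f2).flatten(1))
        return out, aux

model = DeeplySupervisedCNN()
x, y = torch.randn(4, 3, 64, 64), torch.randint(0, 10, (4,))
out, aux = model(x)
# auxiliary loss (weighted down) trains intermediate features directly
loss = F.cross_entropy(out, y) + 0.3 * F.cross_entropy(aux, y)
loss.backward()
```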