Scientific literature on the topic "Video Vision Transformer"
Create a correct reference in APA, MLA, Chicago, Harvard, and several other styles.
Browse the thematic lists of journal articles, books, theses, conference reports, and other academic sources on the topic "Video Vision Transformer".
Next to each source in the list of references there is an "Add to bibliography" button. Click on it, and we will automatically generate the bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online when this information is included in the metadata.
Journal articles on the topic "Video Vision Transformer"
Naikwadi, Sanket Shashikant. "Video Summarization Using Vision and Language Transformer Models." International Journal of Research Publication and Reviews 6, no. 6 (January 2025): 5217–21. https://doi.org/10.55248/gengpi.6.0125.0654.
Moutik, Oumaima, Hiba Sekkat, Smail Tigani, Abdellah Chehri, Rachid Saadane, Taha Ait Tchakoucht, and Anand Paul. "Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?" Sensors 23, no. 2 (January 9, 2023): 734. http://dx.doi.org/10.3390/s23020734.
Yuan, Hongchun, Zhenyu Cai, Hui Zhou, Yue Wang, and Xiangzhi Chen. "TransAnomaly: Video Anomaly Detection Using Video Vision Transformer." IEEE Access 9 (2021): 123977–86. http://dx.doi.org/10.1109/access.2021.3109102.
Sarraf, Saman, and Milton Kabia. "Optimal Topology of Vision Transformer for Real-Time Video Action Recognition in an End-To-End Cloud Solution." Machine Learning and Knowledge Extraction 5, no. 4 (September 29, 2023): 1320–39. http://dx.doi.org/10.3390/make5040067.
Zhao, Hong, Zhiwen Chen, Lan Guo, and Zeyu Han. "Video captioning based on vision transformer and reinforcement learning." PeerJ Computer Science 8 (March 16, 2022): e916. http://dx.doi.org/10.7717/peerj-cs.916.
Im, Heeju, and Yong Suk Choi. "A Full Transformer Video Captioning Model via Vision Transformer." KIISE Transactions on Computing Practices 29, no. 8 (August 31, 2023): 378–83. http://dx.doi.org/10.5626/ktcp.2023.29.8.378.
Ugile, Tukaram, and Nilesh Uke. "TRANSFORMER ARCHITECTURES FOR COMPUTER VISION: A COMPREHENSIVE REVIEW AND FUTURE RESEARCH DIRECTIONS." Journal of Dynamics and Control 9, no. 3 (March 15, 2025): 70–79. https://doi.org/10.71058/jodac.v9i3005.
Wu, Pengfei, Le Wang, Sanping Zhou, Gang Hua, and Changyin Sun. "Temporal Correlation Vision Transformer for Video Person Re-Identification." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 6083–91. http://dx.doi.org/10.1609/aaai.v38i6.28424.
Jin, Yanxiu, and Rulin Ma. "Applications of transformers in computer vision." Applied and Computational Engineering 16, no. 1 (October 23, 2023): 234–41. http://dx.doi.org/10.54254/2755-2721/16/20230898.
Pei, Pengfei, Xianfeng Zhao, Jinchuan Li, Yun Cao, and Xuyuan Lai. "Vision Transformer-Based Video Hashing Retrieval for Tracing the Source of Fake Videos." Security and Communication Networks 2023 (June 28, 2023): 1–16. http://dx.doi.org/10.1155/2023/5349392.
Theses on the topic "Video Vision Transformer"
Zhang, Yujing. "Deep learning-assisted video list decoding in error-prone video transmission systems." Electronic Thesis or Diss., Valenciennes, Université Polytechnique Hauts-de-France, 2024. http://www.theses.fr/2024UPHF0028.
In recent years, video applications have developed rapidly. At the same time, the video quality experience has improved considerably with the advent of HD video and the emergence of 4K content. As a result, video streams tend to represent a larger amount of data. To reduce the size of these video streams, new video compression solutions such as HEVC have been developed. However, transmission errors that may occur over networks can cause unwanted visual artifacts that significantly degrade the user experience. Various approaches have been proposed in the literature to find efficient and low-complexity solutions to repair video packets containing binary errors, thus avoiding costly retransmission that is incompatible with the low-latency constraints of many emerging applications (immersive video, tele-operation). Error correction based on the cyclic redundancy check (CRC) is a promising approach that uses readily available information without throughput overhead. However, in practice it can only correct a limited number of errors. Depending on the generator polynomial used, the size of the packets, and the maximum number of errors considered, this method can lead not to a single corrected packet but rather to a list of possibly corrected packets. In this case, list decoding becomes relevant in combination with CRC-based error correction, as well as with methods exploiting information on the reliability of the received bits. However, this raises the question of how to select among the candidate videos. After the ranked candidates have been generated by the state-of-the-art list decoding process, the final selection often takes the first valid candidate in the final list as the reconstructed video. This simple selection is arbitrary and not optimal: the candidate video sequence at the top of the list is not necessarily the one with the best visual quality. It is therefore necessary to develop a new method to automatically select the video with the highest quality from the list of candidates. We propose to select the best candidate based on the visual quality determined by a deep learning (DL) system. Since distortions are assessed on each frame, we rely on image quality assessment rather than video quality assessment. More specifically, each candidate is processed by a deep learning-based no-reference image quality assessment (IQA) method to obtain a score, and the system then selects the candidate with the highest IQA score. To do this, our system evaluates the quality of videos subject to transmission errors without eliminating lost packets or concealing lost regions. Distortions caused by transmission errors differ from those accounted for by traditional visual quality measures, which typically deal with global, uniform image distortions. Thus, these metrics fail to distinguish the repaired version from different corrupted video versions when local, non-uniform errors occur. Our approach revisits and optimizes the classic list decoding technique by combining it first with a CNN architecture and then with a Transformer to evaluate the visual quality and identify the best candidate; this combination is novel and offers excellent performance. In particular, we show that when transmission errors occur within an intra frame, our CNN- and Transformer-based architectures achieve 100% decision accuracy. For errors in an inter frame, the accuracy is 93% and 95%, respectively.
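The selection stage described in this abstract reduces to scoring each candidate reconstruction with a no-reference IQA model and keeping the highest-scoring one. The sketch below illustrates that loop in PyTorch under stated assumptions: `TinyFrameIQA` is a hypothetical placeholder for the CNN- or Transformer-based scorer, and `select_best_candidate` is an illustrative helper, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn


class TinyFrameIQA(nn.Module):
    """Placeholder no-reference IQA network: one quality score per frame.

    Stands in for the CNN- or Transformer-based scorer described in the
    thesis; the layer sizes here are purely illustrative.
    """

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, 3, H, W) -> (T,), one score per frame
        return self.head(self.features(frames).flatten(1)).squeeze(1)


def select_best_candidate(candidates: list[torch.Tensor], iqa_model: nn.Module) -> int:
    """Return the index of the candidate video with the highest mean frame IQA score."""
    scores = []
    with torch.no_grad():
        for video in candidates:          # each video: (T, 3, H, W)
            scores.append(iqa_model(video).mean().item())
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    model = TinyFrameIQA().eval()
    # Three decoded candidate reconstructions of the same 8-frame sequence.
    candidates = [torch.rand(8, 3, 64, 64) for _ in range(3)]
    print("best candidate index:", select_best_candidate(candidates, model))
```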
Filali Razzouki, Anas. "Deep learning-based video face-based digital markers for early detection and analysis of Parkinson disease." Electronic Thesis or Diss., Institut polytechnique de Paris, 2025. http://www.theses.fr/2025IPPAS002.
This thesis aims to develop robust digital biomarkers for early detection of Parkinson's disease (PD) by analyzing facial videos to identify changes associated with hypomimia. In this context, we introduce new contributions to the state of the art: one based on shallow machine learning and the other on deep learning. The first method employs machine learning models that use manually extracted facial features, particularly derivatives of facial action units (AUs). These models incorporate interpretability mechanisms that explain their decision-making process for stakeholders, highlighting the most distinctive facial features for PD. We examine the influence of biological sex on these digital biomarkers, compare them against neuroimaging data and clinical scores, and use them to predict PD severity. The second method leverages deep learning to automatically extract features from raw facial videos and optical flow using foundation models based on Video Vision Transformers. To address the limited training data, we propose advanced adaptive transfer learning techniques, utilizing foundation models trained on large-scale video classification datasets. Additionally, we integrate interpretability mechanisms to clarify the relationship between the automatically extracted features and the manually extracted facial AUs, enhancing the comprehensibility of the model's decisions. Finally, our generated facial features are derived from both cross-sectional and longitudinal data, which provides a significant advantage over existing work. We use these recordings to analyze the progression of hypomimia over time with these digital markers, and its correlation with the progression of clinical scores. Combining these two approaches achieves a classification AUC (Area Under the Curve) of over 90%, demonstrating the efficacy of machine learning and deep learning models in detecting hypomimia in early-stage PD patients through facial videos. This research could enable continuous monitoring of hypomimia outside hospital settings via telemedicine.
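For readers unfamiliar with the Video Vision Transformer backbones mentioned in this abstract, the following minimal sketch shows the general shape of such a model (tubelet embedding, Transformer encoder, classification token) for a binary video classification task. `MiniViViT` and all of its dimensions are illustrative assumptions, not the architecture used in the thesis.

```python
import torch
import torch.nn as nn


class MiniViViT(nn.Module):
    """Greatly simplified Video Vision Transformer for binary video classification.

    Tubelet embedding (3-D convolution) followed by a standard Transformer
    encoder and a classification token, in the spirit of the ViViT family
    cited above; all dimensions are illustrative only.
    """

    def __init__(self, dim=128, depth=4, heads=4, num_classes=2,
                 frames=16, size=112, tubelet=(2, 16, 16)):
        super().__init__()
        self.embed = nn.Conv3d(3, dim, kernel_size=tubelet, stride=tubelet)
        n_tokens = (frames // tubelet[0]) * (size // tubelet[1]) * (size // tubelet[2])
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_tokens + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, 3, T, H, W) -> logits: (B, num_classes)
        x = self.embed(video).flatten(2).transpose(1, 2)   # (B, N, dim) tubelet tokens
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos
        x = self.encoder(x)
        return self.head(x[:, 0])                          # classify from the CLS token


if __name__ == "__main__":
    model = MiniViViT()
    clip = torch.rand(2, 3, 16, 112, 112)   # two 16-frame face clips
    print(model(clip).shape)                 # torch.Size([2, 2])
```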
Cedernaes, Erasmus. "Runway detection in LWIR video: Real time image processing and presentation of sensor data." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-300690.
Saravi, Sara. "Use of Coherent Point Drift in computer vision applications." Thesis, Loughborough University, 2013. https://dspace.lboro.ac.uk/2134/12548.
Leoputra, Wilson Suryajaya. "Video foreground extraction for mobile camera platforms." Thesis, Curtin University, 2009. http://hdl.handle.net/20.500.11937/1384.
Ali, Abid. "Analyse vidéo à l'aide de réseaux de neurones profonds : une application pour l'autisme." Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4066.
Understanding actions in videos is a crucial element of computer vision with significant implications across various fields. As our dependence on visual data grows, comprehending and interpreting human actions in videos becomes essential for advancing technologies in surveillance, healthcare, autonomous systems, and human-computer interaction. The accurate interpretation of actions in videos is fundamental for creating intelligent systems that can effectively navigate and respond to the complexities of the real world. In this context, advances in action understanding push the boundaries of computer vision and play a crucial role in shaping the landscape of cutting-edge applications that impact our daily lives. Computer vision has made significant progress with the rise of deep learning methods such as convolutional neural networks (CNNs), which have enabled the community to advance in many domains, including image segmentation, object detection, scene understanding, and more. However, video processing remains limited compared to static images. In this thesis, we focus on action understanding, dividing it into two main parts, action recognition and action detection, and their application in the medical domain for autism analysis. We explore the various aspects and challenges of video understanding from both a general and an application-specific perspective, and then present our contributions and solutions to address these challenges. In addition, we introduce the ACTIVIS dataset, designed to diagnose autism in young children. Our work is divided into two main parts: generic modeling and applied models. Initially, we focus on adapting image models for action recognition tasks by incorporating temporal modeling using parameter-efficient fine-tuning (PEFT) techniques. We also address real-time action detection and anticipation by proposing a new joint model for action anticipation and online action detection in real-life scenarios. Furthermore, we introduce a new task called 'loose interaction' in dyadic situations and its applications in autism analysis. Finally, we concentrate on the applied aspect of video understanding by proposing an action recognition model for repetitive behaviors in videos of autistic individuals. We conclude by proposing a weakly supervised method to estimate the severity score of autistic children in long videos.
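The first contribution mentioned in this abstract (adapting image models to video with parameter-efficient fine-tuning) can be illustrated by freezing a per-frame image encoder and training only a small temporal module and a classification head. The sketch below is a generic illustration of that idea under assumed names (`FrozenFrameEncoder`, `TemporalAdapterClassifier`) and with a GRU as the trainable temporal module; it does not reproduce the thesis's PEFT architecture.

```python
import torch
import torch.nn as nn


class FrozenFrameEncoder(nn.Module):
    """Stand-in for a pretrained image backbone whose weights stay frozen."""

    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        for p in self.parameters():
            p.requires_grad = False          # parameter-efficient: backbone frozen

    def forward(self, frames):               # (B*T, 3, H, W) -> (B*T, dim)
        return self.net(frames)


class TemporalAdapterClassifier(nn.Module):
    """Frozen image encoder + small trainable temporal module + linear head."""

    def __init__(self, dim=256, num_classes=10):
        super().__init__()
        self.encoder = FrozenFrameEncoder(dim)
        self.adapter = nn.GRU(dim, dim, batch_first=True)   # only trainable temporal part
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):                 # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.encoder(video.flatten(0, 1)).view(b, t, -1)
        _, h = self.adapter(feats)             # temporal aggregation over frames
        return self.head(h[-1])


if __name__ == "__main__":
    model = TemporalAdapterClassifier()
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable}/{total}")
    print(model(torch.rand(2, 8, 3, 64, 64)).shape)   # torch.Size([2, 10])
```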
Burger, Thomas. "Reconnaissance automatique des gestes de la langue française parlée complétée." PhD thesis, Grenoble INPG, 2007. http://tel.archives-ouvertes.fr/tel-00203360.
Books on the topic "Video Vision Transformer"
Korsgaard, Mathias Bonde. Music Video Transformed. Edited by John Richardson, Claudia Gorbman, and Carol Vernallis. Oxford University Press, 2013. http://dx.doi.org/10.1093/oxfordhb/9780199733866.013.015.
Book chapters on the topic "Video Vision Transformer"
Gabeur, Valentin, Chen Sun, Karteek Alahari, and Cordelia Schmid. "Multi-modal Transformer for Video Retrieval." In Computer Vision – ECCV 2020, 214–29. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58548-8_13.
Kim, Hannah Halin, Shuzhi Yu, Shuai Yuan, and Carlo Tomasi. "Cross-Attention Transformer for Video Interpolation." In Computer Vision – ACCV 2022 Workshops, 325–42. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-27066-6_23.
Kim, Tae Hyun, Mehdi S. M. Sajjadi, Michael Hirsch, and Bernhard Schölkopf. "Spatio-Temporal Transformer Network for Video Restoration." In Computer Vision – ECCV 2018, 111–27. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01219-9_7.
Xue, Tong, Qianrui Wang, Xinyi Huang, and Dengshi Li. "Self-guided Transformer for Video Super-Resolution." In Pattern Recognition and Computer Vision, 186–98. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8549-4_16.
Li, Zutong, and Lei Yang. "DCVQE: A Hierarchical Transformer for Video Quality Assessment." In Computer Vision – ACCV 2022, 398–416. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26316-3_24.
Courant, Robin, Maika Edberg, Nicolas Dufour, and Vicky Kalogeiton. "Transformers and Visual Transformers." In Machine Learning for Brain Disorders, 193–229. New York, NY: Springer US, 2012. http://dx.doi.org/10.1007/978-1-0716-3195-9_6.
Huo, Shuwei, Yuan Zhou, and Haiyang Wang. "YFormer: A New Transformer Architecture for Video-Query Based Video Moment Retrieval." In Pattern Recognition and Computer Vision, 638–50. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-18913-5_49.
Li, Li, Liansheng Zhuang, Shenghua Gao, and Shafei Wang. "HaViT: Hybrid-Attention Based Vision Transformer for Video Classification." In Computer Vision – ACCV 2022, 502–17. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26316-3_30.
Zhang, Hui, Jiewen Yang, Xingbo Dong, Xingguo Lv, Wei Jia, Zhe Jin, and Xuejun Li. "A Video Face Recognition Leveraging Temporal Information Based on Vision Transformer." In Pattern Recognition and Computer Vision, 29–43. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8469-5_3.
Wu, Jinlin, Lingxiao He, Wu Liu, Yang Yang, Zhen Lei, Tao Mei, and Stan Z. Li. "CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification." In Lecture Notes in Computer Science, 549–66. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19781-9_32.
Conference papers on the topic "Video Vision Transformer"
Kobayashi, Takumi, and Masataka Seo. "Efficient Compression Method in Video Reconstruction Using Video Vision Transformer." In 2024 IEEE 13th Global Conference on Consumer Electronics (GCCE), 724–25. IEEE, 2024. https://doi.org/10.1109/gcce62371.2024.10760444.
Yokota, Haruto, Mert Bozkurtlar, Benjamin Yen, Katsutoshi Itoyama, Kenji Nishida, and Kazuhiro Nakadai. "A Video Vision Transformer for Sound Source Localization." In 2024 32nd European Signal Processing Conference (EUSIPCO), 106–10. IEEE, 2024. http://dx.doi.org/10.23919/eusipco63174.2024.10715427.
Ojaswee, R. Sreemathy, Mousami Turuk, Jayashree Jagdale, and Mohammad Anish. "Indian Sign Language Recognition Using Video Vision Transformer." In 2024 3rd International Conference for Advancement in Technology (ICONAT), 1–7. IEEE, 2024. https://doi.org/10.1109/iconat61936.2024.10774678.
Thuan, Pham Minh, Bui Thu Lam, and Pham Duy Trung. "Spatial Vision Transformer: A Novel Approach to Deepfake Video Detection." In 2024 1st International Conference On Cryptography And Information Security (VCRIS), 1–6. IEEE, 2024. https://doi.org/10.1109/vcris63677.2024.10813391.
Kumari, Supriya, Prince Kumar, Pooja Verma, Rajitha B, and Sarsij Tripathi. "Hybrid Vision Transformer and Convolutional Neural Network for Sports Video Classification." In 2024 International Conference on Intelligent Computing and Emerging Communication Technologies (ICEC), 1–5. IEEE, 2024. https://doi.org/10.1109/icec59683.2024.10837289.
Isogawa, Junya, Fumihiko Sakaue, and Jun Sato. "Simultaneous Estimation of Driving Intentions for Multiple Vehicles Using Video Transformer." In 20th International Conference on Computer Vision Theory and Applications, 471–77. SCITEPRESS - Science and Technology Publications, 2025. https://doi.org/10.5220/0013232100003912.
Gupta, Anisha, and Vidit Kumar. "A Hybrid U-Net and Vision Transformer approach for Video Anomaly detection." In 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), 1–6. IEEE, 2024. http://dx.doi.org/10.1109/icccnt61001.2024.10725860.
Ansari, Khustar, and Priyanka Srivastava. "Hybrid Attention Vision Transformer-based Deep Learning Model for Video Caption Generation." In 2025 International Conference on Electronics and Renewable Systems (ICEARS), 1238–45. IEEE, 2025. https://doi.org/10.1109/icears64219.2025.10940922.
Zhou, Xingyu, Leheng Zhang, Xiaorui Zhao, Keze Wang, Leida Li, and Shuhang Gu. "Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 25399–408. IEEE, 2024. http://dx.doi.org/10.1109/cvpr52733.2024.02400.
Choi, Joonmyung, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, and Hyunwoo J. Kim. "vid-TLDR: Training Free Token merging for Light-Weight Video Transformer." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18771–81. IEEE, 2024. http://dx.doi.org/10.1109/cvpr52733.2024.01776.