Ready-made bibliography on the topic "Video Vision Transformer"
Create an accurate reference in APA, MLA, Chicago, Harvard, and many other citation styles
Browse lists of current articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Video Vision Transformer".
An "Add to bibliography" button is available next to every work in the list. Use it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a ".pdf" file and read its abstract online, whenever such details are available in the record's metadata.
Journal articles on the topic "Video Vision Transformer"
Naikwadi, Sanket Shashikant. "Video Summarization Using Vision and Language Transformer Models". International Journal of Research Publication and Reviews 6, no. 6 (January 2025): 5217–21. https://doi.org/10.55248/gengpi.6.0125.0654.
Moutik, Oumaima, Hiba Sekkat, Smail Tigani, Abdellah Chehri, Rachid Saadane, Taha Ait Tchakoucht, and Anand Paul. "Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?" Sensors 23, no. 2 (January 9, 2023): 734. http://dx.doi.org/10.3390/s23020734.
Yuan, Hongchun, Zhenyu Cai, Hui Zhou, Yue Wang, and Xiangzhi Chen. "TransAnomaly: Video Anomaly Detection Using Video Vision Transformer". IEEE Access 9 (2021): 123977–86. http://dx.doi.org/10.1109/access.2021.3109102.
Sarraf, Saman, and Milton Kabia. "Optimal Topology of Vision Transformer for Real-Time Video Action Recognition in an End-To-End Cloud Solution". Machine Learning and Knowledge Extraction 5, no. 4 (September 29, 2023): 1320–39. http://dx.doi.org/10.3390/make5040067.
Zhao, Hong, Zhiwen Chen, Lan Guo, and Zeyu Han. "Video captioning based on vision transformer and reinforcement learning". PeerJ Computer Science 8 (March 16, 2022): e916. http://dx.doi.org/10.7717/peerj-cs.916.
Im, Heeju, and Yong Suk Choi. "A Full Transformer Video Captioning Model via Vision Transformer". KIISE Transactions on Computing Practices 29, no. 8 (August 31, 2023): 378–83. http://dx.doi.org/10.5626/ktcp.2023.29.8.378.
Ugile, Tukaram, and Nilesh Uke. "Transformer Architectures for Computer Vision: A Comprehensive Review and Future Research Directions". Journal of Dynamics and Control 9, no. 3 (March 15, 2025): 70–79. https://doi.org/10.71058/jodac.v9i3005.
Wu, Pengfei, Le Wang, Sanping Zhou, Gang Hua, and Changyin Sun. "Temporal Correlation Vision Transformer for Video Person Re-Identification". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 6083–91. http://dx.doi.org/10.1609/aaai.v38i6.28424.
Jin, Yanxiu, and Rulin Ma. "Applications of transformers in computer vision". Applied and Computational Engineering 16, no. 1 (October 23, 2023): 234–41. http://dx.doi.org/10.54254/2755-2721/16/20230898.
Pei, Pengfei, Xianfeng Zhao, Jinchuan Li, Yun Cao, and Xuyuan Lai. "Vision Transformer-Based Video Hashing Retrieval for Tracing the Source of Fake Videos". Security and Communication Networks 2023 (June 28, 2023): 1–16. http://dx.doi.org/10.1155/2023/5349392.
Pełny tekst źródłaRozprawy doktorskie na temat "Video Vision Transformer"
Zhang, Yujing. "Deep learning-assisted video list decoding in error-prone video transmission systems". Electronic Thesis or Diss., Valenciennes, Université Polytechnique Hauts-de-France, 2024. http://www.theses.fr/2024UPHF0028.
In recent years, video applications have developed rapidly. At the same time, the video quality experience has improved considerably with the advent of HD video and the emergence of 4K content. As a result, video streams tend to represent a larger amount of data. To reduce the size of these video streams, new video compression solutions such as HEVC have been developed. However, transmission errors that may occur over networks can cause unwanted visual artifacts that significantly degrade the user experience. Various approaches have been proposed in the literature to find efficient and low-complexity solutions to repair video packets containing binary errors, thus avoiding costly retransmission that is incompatible with the low-latency constraints of many emerging applications (immersive video, tele-operation). Error correction based on the cyclic redundancy check (CRC) is a promising approach that uses readily available information without throughput overhead. However, in practice it can only correct a limited number of errors. Depending on the generating polynomial used, the packet size, and the maximum number of errors considered, this method can lead not to a single corrected packet but rather to a list of possibly corrected packets. In this case, list decoding becomes relevant in combination with CRC-based error correction, as well as with methods exploiting information on the reliability of the received bits. However, this raises the problem of selecting among the candidate videos. After ranked candidates have been generated by a state-of-the-art list decoding process, the final selection often takes the first valid candidate in the list as the reconstructed video. This simple selection is arbitrary and suboptimal: the candidate video sequence at the top of the list is not necessarily the one with the best visual quality. It is therefore necessary to develop a new method to automatically select the video with the highest quality from the list of candidates. We propose to select the best candidate based on the visual quality determined by a deep learning (DL) system. Since distortions are assessed on each frame, we rely on image quality assessment rather than video quality assessment. More specifically, each candidate is processed by a reference-free, deep-learning-based image quality assessment (IQA) method to obtain a score, and the system then selects the candidate with the highest IQA score. To do this, our system evaluates the quality of videos subject to transmission errors without eliminating lost packets or concealing lost regions. Distortions caused by transmission errors differ from those accounted for by traditional visual quality measures, which typically deal with global, uniform image distortions; such metrics therefore fail to distinguish the repaired version from the various corrupted versions when local, non-uniform errors occur. Our approach revisits and optimizes the classic list decoding technique by coupling it first with a CNN architecture and then with a Transformer to evaluate visual quality and identify the best candidate, which is unprecedented and offers excellent performance. In particular, we show that when transmission errors occur within an intra frame, our CNN- and Transformer-based architectures achieve 100% decision accuracy. For errors in an inter frame, the accuracy is 93% and 95%, respectively.
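For illustration only (not drawn from the thesis itself), the following is a minimal sketch of the selection step described in this abstract: each candidate produced by CRC-assisted list decoding is scored by a no-reference IQA model, and the highest-scoring one is kept. The `iqa_model` callable and the input format are assumptions.

```python
import torch

def select_best_candidate(candidate_frames, iqa_model):
    """Return the index of the candidate with the highest no-reference
    IQA score, plus all scores for inspection.

    candidate_frames: list of (C, H, W) float tensors, one per candidate
    produced by CRC-assisted list decoding (hypothetical input format).
    iqa_model: any callable mapping a (1, C, H, W) tensor to a scalar
    quality score (e.g. a CNN- or Transformer-based IQA network).
    """
    with torch.no_grad():  # scoring only, no gradients needed
        scores = [float(iqa_model(f.unsqueeze(0))) for f in candidate_frames]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores
```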
Filali Razzouki, Anas. "Deep learning-based video face-based digital markers for early detection and analysis of Parkinson disease". Electronic Thesis or Diss., Institut polytechnique de Paris, 2025. http://www.theses.fr/2025IPPAS002.
This thesis aims to develop robust digital biomarkers for early detection of Parkinson's disease (PD) by analyzing facial videos to identify changes associated with hypomimia. In this context, we introduce new contributions to the state of the art: one based on shallow machine learning and the other on deep learning. The first method employs machine learning models that use manually extracted facial features, particularly derivatives of facial action units (AUs). These models incorporate interpretability mechanisms that explain their decision-making process for stakeholders, highlighting the most distinctive facial features for PD. We examine the influence of biological sex on these digital biomarkers, compare them against neuroimaging data and clinical scores, and use them to predict PD severity. The second method leverages deep learning to automatically extract features from raw facial videos and optical flow using foundational models based on Video Vision Transformers. To address the limited training data, we propose advanced adaptive transfer learning techniques, utilizing foundational models trained on large-scale video classification datasets. Additionally, we integrate interpretability mechanisms to clarify the relationship between automatically extracted features and manually extracted facial AUs, enhancing the comprehensibility of the model's decisions. Finally, our generated facial features are derived from both cross-sectional and longitudinal data, which provides a significant advantage over existing work. We use these recordings to analyze the progression of hypomimia over time with these digital markers, and its correlation with the progression of clinical scores. Combining the two approaches yields a classification AUC (Area Under the Curve) of over 90%, demonstrating the efficacy of machine learning and deep learning models in detecting hypomimia in early-stage PD patients through facial videos. This research could enable continuous monitoring of hypomimia outside hospital settings via telemedicine.
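As background for the Video Vision Transformer family mentioned in this abstract, here is a minimal sketch of a ViViT-style factorised encoder (tubelet embedding, spatial attention within each tubelet, then temporal attention). The dimensions, depths, and the binary classification head are illustrative assumptions, not the architecture used in the thesis.

```python
import torch
import torch.nn as nn

class FactorisedViViT(nn.Module):
    """Minimal ViViT-style factorised encoder: 3D tubelet embedding,
    a spatial transformer over the patches of each tubelet, then a
    temporal transformer over tubelet representations."""
    def __init__(self, num_classes=2, dim=192, frames=16, image_size=224,
                 tubelet=(2, 16, 16), heads=4, depth=4):
        super().__init__()
        t, h, w = tubelet
        self.embed = nn.Conv3d(3, dim, kernel_size=tubelet, stride=tubelet)
        num_patches = (image_size // h) * (image_size // w)
        num_tubelets = frames // t
        self.spatial_pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.temporal_pos = nn.Parameter(torch.zeros(1, num_tubelets, dim))
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True)
        self.spatial = nn.TransformerEncoder(make_layer(), num_layers=depth)
        self.temporal = nn.TransformerEncoder(make_layer(), num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):                     # video: (B, 3, T, H, W)
        x = self.embed(video)                     # (B, dim, T', H', W')
        b, d, t, h, w = x.shape
        x = x.permute(0, 2, 3, 4, 1).reshape(b * t, h * w, d)
        x = self.spatial(x + self.spatial_pos)    # attention over spatial patches
        x = x.mean(dim=1).reshape(b, t, d)        # pool patches per tubelet
        x = self.temporal(x + self.temporal_pos)  # attention over time
        return self.head(x.mean(dim=1))           # clip-level logits

logits = FactorisedViViT()(torch.randn(2, 3, 16, 224, 224))  # shape (2, 2)
```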
Cedernaes, Erasmus. "Runway detection in LWIR video : Real time image processing and presentation of sensor data". Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-300690.
Saravi, Sara. "Use of Coherent Point Drift in computer vision applications". Thesis, Loughborough University, 2013. https://dspace.lboro.ac.uk/2134/12548.
Leoputra, Wilson Suryajaya. "Video foreground extraction for mobile camera platforms". Thesis, Curtin University, 2009. http://hdl.handle.net/20.500.11937/1384.
Pełny tekst źródłaAli, Abid. "Analyse vidéo à l'aide de réseaux de neurones profonds : une application pour l'autisme". Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4066.
Understanding actions in videos is a crucial element of computer vision with significant implications across various fields. As our dependence on visual data grows, comprehending and interpreting human actions in videos becomes essential for advancing technologies in surveillance, healthcare, autonomous systems, and human-computer interaction. The accurate interpretation of actions in videos is fundamental for creating intelligent systems that can effectively navigate and respond to the complexities of the real world. In this context, advances in action understanding push the boundaries of computer vision and play a crucial role in shaping the landscape of cutting-edge applications that impact our daily lives. Computer vision has made significant progress with the rise of deep learning methods such as convolutional neural networks (CNNs), which have enabled the community to advance in many domains, including image segmentation, object detection, scene understanding, and more. However, video processing remains limited compared to static images. In this thesis, we focus on action understanding, dividing it into two main parts, action recognition and action detection, and their application in the medical domain for autism analysis. We explore the various aspects and challenges of video understanding from a general and an application-specific perspective, and then present our contributions and solutions to address these challenges. In addition, we introduce the ACTIVIS dataset, designed to diagnose autism in young children. Our work is divided into two main parts: generic modeling and applied models. Initially, we focus on adapting image models for action recognition tasks by incorporating temporal modeling using parameter-efficient fine-tuning (PEFT) techniques. We also address real-time action detection and anticipation by proposing a new joint model for action anticipation and online action detection in real-life scenarios. Furthermore, we introduce a new task called 'loose-interaction' in dyadic situations and its applications in autism analysis. Finally, we concentrate on the applied aspect of video understanding by proposing an action recognition model for repetitive behaviors in videos of autistic individuals. We conclude by proposing a weakly supervised method to estimate the severity score of autistic children in long videos.
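To make the parameter-efficient adaptation idea mentioned in this abstract concrete, here is a small sketch of one common pattern, under assumed names: a pretrained per-frame image encoder is kept frozen and only a lightweight temporal transformer and classification head are trained. The `image_encoder` argument, feature size, and class count are placeholders, not the thesis's actual components.

```python
import torch
import torch.nn as nn

class FrozenBackboneActionModel(nn.Module):
    """Parameter-efficient sketch: a frozen pretrained image encoder
    extracts per-frame features; only a small temporal transformer
    and a linear head are updated during training."""
    def __init__(self, image_encoder, feat_dim=768, num_classes=10,
                 heads=8, depth=2):
        super().__init__()
        self.encoder = image_encoder              # maps (N, 3, H, W) -> (N, feat_dim)
        for p in self.encoder.parameters():
            p.requires_grad = False               # backbone stays frozen
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=heads, dim_feedforward=4 * feat_dim,
            batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, video):                     # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        with torch.no_grad():                     # no gradients through the backbone
            feats = self.encoder(video.flatten(0, 1))   # (B*T, feat_dim)
        feats = feats.reshape(b, t, -1)
        feats = self.temporal(feats)              # temporal attention over frames
        return self.head(feats.mean(dim=1))       # clip-level logits
```

Adapter or LoRA modules inserted into the frozen backbone serve the same goal of updating only a small fraction of the parameters; the sketch above simply shows the most compact variant of the idea.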
Burger, Thomas. "Reconnaissance automatique des gestes de la langue française parlée complétée". PhD thesis, Grenoble INPG, 2007. http://tel.archives-ouvertes.fr/tel-00203360.
Books on the topic "Video Vision Transformer"
Korsgaard, Mathias Bonde. Music Video Transformed. Edited by John Richardson, Claudia Gorbman, and Carol Vernallis. Oxford University Press, 2013. http://dx.doi.org/10.1093/oxfordhb/9780199733866.013.015.
Book chapters on the topic "Video Vision Transformer"
Gabeur, Valentin, Chen Sun, Karteek Alahari, and Cordelia Schmid. "Multi-modal Transformer for Video Retrieval". In Computer Vision – ECCV 2020, 214–29. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58548-8_13.
Kim, Hannah Halin, Shuzhi Yu, Shuai Yuan, and Carlo Tomasi. "Cross-Attention Transformer for Video Interpolation". In Computer Vision – ACCV 2022 Workshops, 325–42. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-27066-6_23.
Kim, Tae Hyun, Mehdi S. M. Sajjadi, Michael Hirsch, and Bernhard Schölkopf. "Spatio-Temporal Transformer Network for Video Restoration". In Computer Vision – ECCV 2018, 111–27. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01219-9_7.
Xue, Tong, Qianrui Wang, Xinyi Huang, and Dengshi Li. "Self-guided Transformer for Video Super-Resolution". In Pattern Recognition and Computer Vision, 186–98. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8549-4_16.
Li, Zutong, and Lei Yang. "DCVQE: A Hierarchical Transformer for Video Quality Assessment". In Computer Vision – ACCV 2022, 398–416. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26316-3_24.
Courant, Robin, Maika Edberg, Nicolas Dufour, and Vicky Kalogeiton. "Transformers and Visual Transformers". In Machine Learning for Brain Disorders, 193–229. New York, NY: Springer US, 2012. http://dx.doi.org/10.1007/978-1-0716-3195-9_6.
Huo, Shuwei, Yuan Zhou, and Haiyang Wang. "YFormer: A New Transformer Architecture for Video-Query Based Video Moment Retrieval". In Pattern Recognition and Computer Vision, 638–50. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-18913-5_49.
Li, Li, Liansheng Zhuang, Shenghua Gao, and Shafei Wang. "HaViT: Hybrid-Attention Based Vision Transformer for Video Classification". In Computer Vision – ACCV 2022, 502–17. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26316-3_30.
Zhang, Hui, Jiewen Yang, Xingbo Dong, Xingguo Lv, Wei Jia, Zhe Jin, and Xuejun Li. "A Video Face Recognition Leveraging Temporal Information Based on Vision Transformer". In Pattern Recognition and Computer Vision, 29–43. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8469-5_3.
Wu, Jinlin, Lingxiao He, Wu Liu, Yang Yang, Zhen Lei, Tao Mei, and Stan Z. Li. "CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification". In Lecture Notes in Computer Science, 549–66. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19781-9_32.
Conference abstracts on the topic "Video Vision Transformer"
Kobayashi, Takumi, and Masataka Seo. "Efficient Compression Method in Video Reconstruction Using Video Vision Transformer". In 2024 IEEE 13th Global Conference on Consumer Electronics (GCCE), 724–25. IEEE, 2024. https://doi.org/10.1109/gcce62371.2024.10760444.
Yokota, Haruto, Mert Bozkurtlar, Benjamin Yen, Katsutoshi Itoyama, Kenji Nishida, and Kazuhiro Nakadai. "A Video Vision Transformer for Sound Source Localization". In 2024 32nd European Signal Processing Conference (EUSIPCO), 106–10. IEEE, 2024. http://dx.doi.org/10.23919/eusipco63174.2024.10715427.
Ojaswee, R. Sreemathy, Mousami Turuk, Jayashree Jagdale, and Mohammad Anish. "Indian Sign Language Recognition Using Video Vision Transformer". In 2024 3rd International Conference for Advancement in Technology (ICONAT), 1–7. IEEE, 2024. https://doi.org/10.1109/iconat61936.2024.10774678.
Thuan, Pham Minh, Bui Thu Lam, and Pham Duy Trung. "Spatial Vision Transformer: A Novel Approach to Deepfake Video Detection". In 2024 1st International Conference On Cryptography And Information Security (VCRIS), 1–6. IEEE, 2024. https://doi.org/10.1109/vcris63677.2024.10813391.
Kumari, Supriya, Prince Kumar, Pooja Verma, Rajitha B, and Sarsij Tripathi. "Hybrid Vision Transformer and Convolutional Neural Network for Sports Video Classification". In 2024 International Conference on Intelligent Computing and Emerging Communication Technologies (ICEC), 1–5. IEEE, 2024. https://doi.org/10.1109/icec59683.2024.10837289.
Isogawa, Junya, Fumihiko Sakaue, and Jun Sato. "Simultaneous Estimation of Driving Intentions for Multiple Vehicles Using Video Transformer". In 20th International Conference on Computer Vision Theory and Applications, 471–77. SCITEPRESS - Science and Technology Publications, 2025. https://doi.org/10.5220/0013232100003912.
Gupta, Anisha, and Vidit Kumar. "A Hybrid U-Net and Vision Transformer approach for Video Anomaly detection". In 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), 1–6. IEEE, 2024. http://dx.doi.org/10.1109/icccnt61001.2024.10725860.
Ansari, Khustar, and Priyanka Srivastava. "Hybrid Attention Vision Transformer-based Deep Learning Model for Video Caption Generation". In 2025 International Conference on Electronics and Renewable Systems (ICEARS), 1238–45. IEEE, 2025. https://doi.org/10.1109/icears64219.2025.10940922.
Zhou, Xingyu, Leheng Zhang, Xiaorui Zhao, Keze Wang, Leida Li, and Shuhang Gu. "Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention". In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 25399–408. IEEE, 2024. http://dx.doi.org/10.1109/cvpr52733.2024.02400.
Choi, Joonmyung, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, and Hyunwoo J. Kim. "vid-TLDR: Training Free Token merging for Light-Weight Video Transformer". In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18771–81. IEEE, 2024. http://dx.doi.org/10.1109/cvpr52733.2024.01776.