Ready-made bibliography on the topic "Motion captioning"

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles

Choose a source type:

See the lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Motion captioning".

An "Add to bibliography" button is available next to every work in the bibliography. Use it, and we will automatically create a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a scholarly publication in .pdf format and read its abstract online, provided the relevant details are available in the metadata.

Journal articles on the topic "Motion captioning"

1

Iwamura, Kiyohiko, Jun Younes Louhi Kasahara, Alessandro Moro, Atsushi Yamashita, and Hajime Asama. "Image Captioning Using Motion-CNN with Object Detection." Sensors 21, no. 4 (February 10, 2021): 1270. http://dx.doi.org/10.3390/s21041270.

Abstract:
Automatic image captioning has many important applications, such as the depiction of visual contents for visually impaired people or the indexing of images on the internet. Recently, deep learning-based image captioning models have been researched extensively. For caption generation, they learn the relation between image features and words included in the captions. However, image features might not be relevant for certain words such as verbs. Therefore, our earlier reported method included the use of motion features along with image features for generating captions including verbs. However, all the motion features were used. Since not all motion features contributed positively to the captioning process, unnecessary motion features decreased the captioning accuracy. As described herein, we use experiments with motion features for thorough analysis of the reasons for the decline in accuracy. We propose a novel, end-to-end trainable method for image caption generation that alleviates the decreased accuracy of caption generation. Our proposed model was evaluated using three datasets: MSR-VTT2016-Image, MSCOCO, and several copyright-free images. Results demonstrate that our proposed method improves caption generation performance.
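
To make the fusion idea in this abstract more concrete, the following minimal sketch (our own, in PyTorch; not the authors' released code, and all feature dimensions and the gate design are placeholders) shows one way motion features can be gated against image features before caption decoding, so that unhelpful motion cues can be suppressed:

```python
# Illustrative sketch only: gating motion features against image features
# before an LSTM caption decoder. Dimensions and design are assumptions.
import torch
import torch.nn as nn

class GatedMotionFusion(nn.Module):
    def __init__(self, img_dim=2048, mot_dim=1024, hid=512, vocab=10000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hid)
        self.mot_proj = nn.Linear(mot_dim, hid)
        # Per-dimension gate deciding how much motion information to keep.
        self.gate = nn.Sequential(nn.Linear(2 * hid, hid), nn.Sigmoid())
        self.word_emb = nn.Embedding(vocab, hid)
        self.decoder = nn.LSTMCell(3 * hid, hid)   # [image ; gated motion ; word]
        self.out = nn.Linear(hid, vocab)

    def forward(self, img_feat, mot_feat, captions):
        v = self.img_proj(img_feat)                    # (B, hid)
        m = self.mot_proj(mot_feat)                    # (B, hid)
        g = self.gate(torch.cat([v, m], dim=-1))       # keep/suppress gate in [0, 1]
        h = torch.zeros_like(v)
        c = torch.zeros_like(v)
        logits = []
        for t in range(captions.size(1)):              # teacher forcing over words
            x = torch.cat([v, g * m, self.word_emb(captions[:, t])], dim=-1)
            h, c = self.decoder(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)              # (B, T, vocab)

# Example shapes: image features from a CNN, motion features from a motion-CNN.
model = GatedMotionFusion()
scores = model(torch.randn(2, 2048), torch.randn(2, 1024),
               torch.randint(0, 10000, (2, 12)))
```
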
2

Chen, Shaoxiang, and Yu-Gang Jiang. "Motion Guided Spatial Attention for Video Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8191–98. http://dx.doi.org/10.1609/aaai.v33i01.33018191.

Abstract:
Sequence-to-sequence models incorporated with attention mechanism have shown promising improvements on video captioning. While there is rich information both inside and between frames, spatial attention is rarely explored and motion information is usually handled by 3D-CNNs as just another modality for fusion. On the other hand, researches about human perception suggest that apparent motion can attract attention. Motivated by this, we aim to learn spatial attention on video frames under the guidance of motion information for caption generation. We present a novel video captioning framework by utilizing Motion Guided Spatial Attention (MGSA). The proposed MGSA exploits the motion between video frames by learning spatial attention from stacked optical flow images with a custom CNN. To further relate the spatial attention maps of video frames, we designed a Gated Attention Recurrent Unit (GARU) to adaptively incorporate previous attention maps. The whole framework can be trained in an end-to-end manner. We evaluate our approach on two benchmark datasets, MSVD and MSR-VTT. The experiments show that our designed model can generate better video representation and state of the art results are obtained under popular evaluation metrics such as BLEU@4, CIDEr, and METEOR.
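
As a rough illustration of the gated attention recurrence described above, here is a minimal PyTorch sketch of our own (not the authors' MGSA/GARU implementation; the flow CNN and gate are simplified placeholders) in which the attention map predicted from optical flow at the current frame is blended with the map carried over from the previous frame:

```python
# Illustrative sketch only: a gate blends the attention map predicted from
# stacked optical flow with the previous frame's attention map.
import torch
import torch.nn as nn

class GatedAttentionUnit(nn.Module):
    def __init__(self, flow_channels=2, hidden=64):
        super().__init__()
        # Small CNN mapping optical flow to an unnormalized attention map.
        self.flow_cnn = nn.Sequential(
            nn.Conv2d(flow_channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 3, padding=1))
        # Gate computed from current and previous maps, one value per location.
        self.gate = nn.Conv2d(2, 1, 3, padding=1)

    def forward(self, flow, prev_attn):
        cur = self.flow_cnn(flow)                               # (B, 1, H, W)
        z = torch.sigmoid(self.gate(torch.cat([cur, prev_attn], dim=1)))
        mixed = z * cur + (1 - z) * prev_attn                   # gated recurrence
        B, _, H, W = mixed.shape
        return torch.softmax(mixed.view(B, -1), dim=1).view(B, 1, H, W)

# Example: attend over 7x7 frame features; weights sum to 1 per frame.
unit = GatedAttentionUnit()
attn = unit(torch.randn(2, 2, 7, 7), torch.zeros(2, 1, 7, 7))
feats = torch.randn(2, 512, 7, 7)
attended = (feats * attn).sum(dim=(2, 3))                       # (2, 512) frame vector
```
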
3

Zhao, Hong, Lan Guo, ZhiWen Chen, and HouZe Zheng. "Research on Video Captioning Based on Multifeature Fusion." Computational Intelligence and Neuroscience 2022 (April 28, 2022): 1–14. http://dx.doi.org/10.1155/2022/1204909.

Abstract:
Aiming at the problems that the existing video captioning models pay attention to incomplete information and the generation of expression text is not accurate enough, a video captioning model that integrates image, audio, and motion optical flow is proposed. A variety of large-scale dataset pretraining models are used to extract video frame features, motion information, audio features, and video sequence features. An embedded layer structure based on self-attention mechanism is designed to embed single-mode features and learn single-mode feature parameters. Then, two schemes of joint representation and cooperative representation are used to fuse the multimodal features of the feature vectors output by the embedded layer, so that the model can pay attention to different targets in the video and their interactive relationships, which effectively improves the performance of the video captioning model. The experiment is carried out on large datasets MSR-VTT and LSMDC. Under the metrics BLEU4, METEOR, ROUGEL, and CIDEr, the MSR-VTT benchmark dataset obtained scores of 0.443, 0.327, 0.619, and 0.521, respectively. The result shows that the proposed method can effectively improve the performance of the video captioning model, and the evaluation indexes are improved compared with comparison models.
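
The "joint representation" scheme the abstract mentions can be pictured with a short sketch of our own (PyTorch assumed; the modality dimensions and the pooling choice are arbitrary, not the paper's configuration): each modality is embedded into a shared space, the modalities attend to one another, and the result is pooled into one video vector:

```python
# Illustrative sketch only: self-attention fusion of image, motion, audio and
# sequence features embedded into a shared space.
import torch
import torch.nn as nn

class JointFusion(nn.Module):
    def __init__(self, dims=(2048, 1024, 128, 512), d_model=512, heads=8):
        super().__init__()
        # One linear embedding per modality into a common dimension.
        self.embed = nn.ModuleList(nn.Linear(d, d_model) for d in dims)
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, feats):
        tokens = torch.stack([e(f) for e, f in zip(self.embed, feats)], dim=1)  # (B, 4, D)
        mixed, _ = self.attn(tokens, tokens, tokens)    # modalities attend to each other
        return self.norm(tokens + mixed).mean(dim=1)    # (B, D) joint video representation

# Example: image, motion (optical flow), audio and temporal-sequence features.
fusion = JointFusion()
video_vec = fusion([torch.randn(2, 2048), torch.randn(2, 1024),
                    torch.randn(2, 128), torch.randn(2, 512)])
```
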
4

Qi, Mengshi, Yunhong Wang, Annan Li, and Jiebo Luo. "Sports Video Captioning via Attentive Motion Representation and Group Relationship Modeling." IEEE Transactions on Circuits and Systems for Video Technology 30, no. 8 (August 2020): 2617–33. http://dx.doi.org/10.1109/tcsvt.2019.2921655.

5

Ahmed, Shakil, A. F. M. Saifuddin Saif, Md Imtiaz Hanif, Md Mostofa Nurannabi Shakil, Md Mostofa Jaman, Md Mazid Ul Haque, Siam Bin Shawkat, et al. "Att-BiL-SL: Attention-Based Bi-LSTM and Sequential LSTM for Describing Video in the Textual Formation." Applied Sciences 12, no. 1 (December 29, 2021): 317. http://dx.doi.org/10.3390/app12010317.

Abstract:
With the advancement of the technological field, day by day, people from around the world are having easier access to internet abled devices, and as a result, video data is growing rapidly. The increase of portable devices such as various action cameras, mobile cameras, motion cameras, etc., can also be considered for the faster growth of video data. Data from these multiple sources need more maintenance to process for various usages according to the needs. By considering these enormous amounts of video data, it cannot be navigated fully by the end-users. Throughout recent times, many research works have been done to generate descriptions from the images or visual scene recordings to address the mentioned issue. This description generation, also known as video captioning, is more complex than single image captioning. Various advanced neural networks have been used in various studies to perform video captioning. In this paper, we propose an attention-based Bi-LSTM and sequential LSTM (Att-BiL-SL) encoder-decoder model for describing the video in textual format. The model consists of two-layer attention-based bi-LSTM and one-layer sequential LSTM for video captioning. The model also extracts the universal and native temporal features from the video frames for smooth sentence generation from optical frames. This paper includes the word embedding with a soft attention mechanism and a beam search optimization algorithm to generate qualitative results. It is found that the architecture proposed in this paper performs better than various existing state of the art models.
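
For readers unfamiliar with this family of models, the sketch below (our own; hidden sizes, vocabulary and the attention form are assumptions, and beam search is omitted) shows the generic shape of a Bi-LSTM encoder with soft attention feeding a sequential LSTM decoder:

```python
# Illustrative sketch only: Bi-LSTM encoder over frame features plus a
# sequential LSTM decoder with soft attention. Not the Att-BiL-SL release.
import torch
import torch.nn as nn

class BiLSTMAttnCaptioner(nn.Module):
    def __init__(self, feat_dim=1024, hid=256, vocab=8000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hid, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.word_emb = nn.Embedding(vocab, hid)
        self.attn_score = nn.Linear(2 * hid + hid, 1)     # soft attention over frames
        self.decoder = nn.LSTMCell(2 * hid + hid, hid)
        self.out = nn.Linear(hid, vocab)

    def forward(self, frames, captions):
        enc, _ = self.encoder(frames)                     # (B, T, 2*hid)
        B, T, _ = enc.shape
        h = enc.new_zeros(B, self.decoder.hidden_size)
        c = enc.new_zeros(B, self.decoder.hidden_size)
        logits = []
        for t in range(captions.size(1)):
            # Score each frame against the current decoder state, then pool.
            scores = self.attn_score(
                torch.cat([enc, h.unsqueeze(1).expand(B, T, -1)], dim=-1)).squeeze(-1)
            alpha = torch.softmax(scores, dim=1)          # (B, T)
            ctx = (alpha.unsqueeze(-1) * enc).sum(dim=1)  # (B, 2*hid) context vector
            h, c = self.decoder(
                torch.cat([ctx, self.word_emb(captions[:, t])], dim=-1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)

captioner = BiLSTMAttnCaptioner()
out = captioner(torch.randn(2, 20, 1024), torch.randint(0, 8000, (2, 10)))
```
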
6

Jiang, Wenhui, Yibo Cheng, Linxin Liu, Yuming Fang, Yuxin Peng, and Yang Liu. "Comprehensive Visual Grounding for Video Description." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 2552–60. http://dx.doi.org/10.1609/aaai.v38i3.28032.

Abstract:
The grounding accuracy of existing video captioners is still behind the expectation. The majority of existing methods perform grounded video captioning on sparse entity annotations, whereas the captioning accuracy often suffers from degenerated object appearances on the annotated area such as motion blur and video defocus. Moreover, these methods seldom consider the complex interactions among entities. In this paper, we propose a comprehensive visual grounding network to improve video captioning, by explicitly linking the entities and actions to the visual clues across the video frames. Specifically, the network consists of spatial-temporal entity grounding and action grounding. The proposed entity grounding encourages the attention mechanism to focus on informative spatial areas across video frames, albeit the entity is annotated in only one frame of a video. The action grounding dynamically associates the verbs to related subjects and the corresponding context, which keeps fine-grained spatial and temporal details for action prediction. Both entity grounding and action grounding are formulated as a unified task guided by a soft grounding supervision, which brings architecture simplification and improves training efficiency as well. We conduct extensive experiments on two challenging datasets, and demonstrate significant performance improvements of +2.3 CIDEr on ActivityNet-Entities and +2.2 CIDEr on MSR-VTT compared to state-of-the-arts.
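
One simple way to picture the "soft grounding supervision" mentioned above is a loss that rewards attention mass falling inside an annotated entity region; the toy function below is our own illustration, not the loss actually used in the paper:

```python
# Illustrative sketch only: reward spatial attention that lands inside an
# annotated entity region.
import torch

def soft_grounding_loss(attn, region_mask, eps=1e-8):
    """attn: (B, H, W) attention weights summing to 1 per sample.
    region_mask: (B, H, W) binary mask of the annotated entity region."""
    inside = (attn * region_mask).sum(dim=(1, 2))   # attention mass on the entity
    return -(inside + eps).log().mean()             # maximize mass inside the region

attn = torch.softmax(torch.randn(2, 7 * 7), dim=1).view(2, 7, 7)
mask = torch.zeros(2, 7, 7)
mask[:, 2:5, 2:5] = 1.0
loss = soft_grounding_loss(attn, mask)
```
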
7

Kim, Heechan, and Soowon Lee. "A Video Captioning Method Based on Multi-Representation Switching for Sustainable Computing." Sustainability 13, no. 4 (February 19, 2021): 2250. http://dx.doi.org/10.3390/su13042250.

Abstract:
Video captioning is a problem that generates a natural language sentence as a video’s description. A video description includes not only words that express the objects in the video but also words that express the relationships between the objects, or grammatically necessary words. To reflect this characteristic explicitly using a deep learning model, we propose a multi-representation switching method. The proposed method consists of three components: entity extraction, motion extraction, and textual feature extraction. The proposed multi-representation switching method makes it possible for the three components to extract important information for a given video and description pair efficiently. In experiments conducted on the Microsoft Research Video Description dataset, the proposed method recorded scores that exceeded the performance of most existing video captioning methods. This result was achieved without any preprocessing based on computer vision and natural language processing, nor any additional loss function. Consequently, the proposed method has a high generality that can be extended to various domains in terms of sustainable computing.
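
The abstract does not spell out the switching mechanism, so the sketch below is only our reading of the idea (a softmax switch that blends entity, motion and textual representations at each decoding step); treat every name and dimension in it as hypothetical:

```python
# Illustrative sketch only: a softmax switch over three representations,
# a guess at the "multi-representation switching" idea, not the paper's code.
import torch
import torch.nn as nn

class RepresentationSwitch(nn.Module):
    def __init__(self, d=512):
        super().__init__()
        self.switch = nn.Linear(d, 3)   # scores for entity / motion / textual

    def forward(self, decoder_state, entity, motion, textual):
        w = torch.softmax(self.switch(decoder_state), dim=-1)   # (B, 3) mixture weights
        reps = torch.stack([entity, motion, textual], dim=1)    # (B, 3, d)
        return (w.unsqueeze(-1) * reps).sum(dim=1)              # (B, d) blended input

switch = RepresentationSwitch()
ctx = switch(torch.randn(2, 512), torch.randn(2, 512),
             torch.randn(2, 512), torch.randn(2, 512))
```
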
8

Charmatz, Marc. "Magistrate denies motion to dismiss in cases against Harvard and MIT on web content captioning." Disability Compliance for Higher Education 21, no. 10 (April 20, 2016): 1–3. http://dx.doi.org/10.1002/dhe.30174.

9

Chen, Jin, Xiaofeng Ji, and Xinxiao Wu. "Adaptive Image-to-Video Scene Graph Generation via Knowledge Reasoning and Adversarial Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 276–84. http://dx.doi.org/10.1609/aaai.v36i1.19903.

Abstract:
Scene graph in a video conveys a wealth of information about objects and their relationships in the scene, thus benefiting many downstream tasks such as video captioning and visual question answering. Existing methods of scene graph generation require large-scale training videos annotated with objects and relationships in each frame to learn a powerful model. However, such comprehensive annotation is time-consuming and labor-intensive. On the other hand, it is much easier and less cost to annotate images with scene graphs, so we investigate leveraging annotated images to facilitate training a scene graph generation model for unannotated videos, namely image-to-video scene graph generation. This task presents two challenges: 1) infer unseen dynamic relationships in videos from static relationships in images due to the absence of motion information in images; 2) adapt objects and static relationships from images to video frames due to the domain shift between them. To address the first challenge, we exploit external commonsense knowledge to infer the unseen dynamic relationship from the temporal evolution of static relationships. We tackle the second challenge by hierarchical adversarial learning to reduce the data distribution discrepancy between images and video frames. Extensive experiment results on two benchmark video datasets demonstrate the effectiveness of our method.
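
The second challenge is addressed with adversarial learning; the basic ingredient of such domain alignment, a gradient-reversal layer plus a domain discriminator that tells image features from video-frame features, can be sketched as follows (our generic example, not the paper's hierarchical design):

```python
# Illustrative sketch only: gradient reversal + domain discriminator, the
# standard adversarial alignment ingredient; the paper's scheme is hierarchical.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        # Flip gradients flowing back into the feature extractor.
        return -ctx.lam * grad_out, None

class DomainDiscriminator(nn.Module):
    def __init__(self, d=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, feat, lam=1.0):
        return self.net(GradReverse.apply(feat, lam))   # logit: image (0) vs. frame (1)

disc = DomainDiscriminator()
img_feat, frame_feat = torch.randn(4, 512), torch.randn(4, 512)
logits = disc(torch.cat([img_feat, frame_feat], dim=0))
labels = torch.cat([torch.zeros(4, 1), torch.ones(4, 1)], dim=0)
adv_loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
```
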
10

Yang, Jiaji, Esyin Chew, and Pengcheng Liu. "Service humanoid robotics: a novel interactive system based on bionic-companionship framework." PeerJ Computer Science 7 (August 13, 2021): e674. http://dx.doi.org/10.7717/peerj-cs.674.

Abstract:
At present, industrial robotics focuses more on motion control and vision, whereas humanoid service robotics (HSRs) are increasingly being investigated and researched in the field of speech interaction. The problem and quality of human-robot interaction (HRI) has become a widely debated topic in academia. Especially when HSRs are applied in the hospitality industry, some researchers believe that the current HRI model is not well adapted to the complex social environment. HSRs generally lack the ability to accurately recognize human intentions and understand social scenarios. This study proposes a novel interactive framework suitable for HSRs. The proposed framework is grounded on the novel integration of Trevarthen’s (2001) companionship theory and neural image captioning (NIC) generation algorithm. By integrating image-to-natural interactivity generation and communicating with the environment to better interact with the stakeholder, thereby changing from interaction to a bionic-companionship. Compared to previous research a novel interactive system is developed based on the bionic-companionship framework. The humanoid service robot was integrated with the system to conduct preliminary tests. The results show that the interactive system based on the bionic-companionship framework can help the service humanoid robot to effectively respond to changes in the interactive environment, for example give different responses to the same character in different scenes.

Doctoral dissertations on the topic "Motion captioning"

1

Radouane, Karim. "Mécanisme d’attention pour le sous-titrage du mouvement humain : Vers une segmentation sémantique et analyse du mouvement interprétables." Electronic Thesis or Diss., IMT Mines Alès, 2024. http://www.theses.fr/2024EMAL0002.

Abstract:
Captioning tasks mainly focus on images or videos, and seldom on human poses. Yet, poses concisely describe human activities. Beyond text generation quality, we consider the motion caption task as an intermediate step to solve other derived tasks. In this holistic approach, our experiments are centered on the unsupervised learning of semantic motion segmentation and interpretability. We first conduct an extensive literature review of recent methods for human pose estimation, as a central prerequisite for pose-based captioning. Then, we take an interest in pose-representation learning, with an emphasis on the use of spatiotemporal graph-based learning, which we apply and evaluate on a real-world application (protective behavior detection). As a result, we win the AffectMove challenge. Next, we delve into the core of our contributions in motion captioning, where: (i) We design local recurrent attention for synchronous text generation with motion. Each motion and its caption are decomposed into primitives and corresponding sub-captions. We also propose specific metrics to evaluate the synchronous mapping between motion and language segments. (ii) We initiate the construction of a motion-language dataset to enable supervised segmentation. (iii) We design an interpretable architecture with a transparent reasoning process through spatiotemporal attention, showing state-of-the-art results on the two reference datasets, KIT-ML and HumanML3D. Effective tools are proposed for interpretability evaluation and illustration. Finally, we conduct a thorough analysis of potential applications: unsupervised action segmentation, sign language translation, and impact in other scenarios.
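
As a small worked example of what "evaluating the synchronous mapping between motion and language segments" can look like, the snippet below computes a plain temporal IoU between paired segments; the thesis defines its own metrics, so this is only a generic illustration:

```python
# Illustrative sketch only: temporal IoU between a motion primitive's frame
# span and the span covered by its sub-caption.
def temporal_iou(seg_a, seg_b):
    """Segments are (start, end) frame indices."""
    inter = max(0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = max(seg_a[1], seg_b[1]) - min(seg_a[0], seg_b[0])
    return inter / union if union > 0 else 0.0

motion_segments = [(0, 40), (40, 90)]    # primitives detected in the motion
caption_segments = [(0, 35), (42, 88)]   # spans covered by each sub-caption
scores = [temporal_iou(m, c) for m, c in zip(motion_segments, caption_segments)]
print(scores)                            # [0.875, 0.92]
```
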

Books on the topic "Motion captioning"

1

Sahlin, Ingrid. Tal och undertexter i textade svenska TV-program: Probleminventering och förslag till en analysmodell. Göteborg: Acta Universitatis Gothoburgensis, 2001.

2

Robson, Gary D. Closed Captioning Handbook. Taylor & Francis Group, 2004.

3

Robson, Gary D. Closed Captioning Handbook. Taylor & Francis Group, 2016.

4

Robson, Gary D. The Closed Captioning Handbook. Focal Press, 2004.

5

Fox, Wendy. Can Integrated Titles Improve the Viewing Experience? Saint Philip Street Press, 2020.

6

Diaz-Cintas, Jorge, Pilar Orero, and Aline Remael, eds. Media for All: Subtitling for the Deaf, Audio Description, and Sign Language (Approaches to Translation Studies 30). Rodopi, 2007.

Book chapters on the topic "Motion captioning"

1

Hai-Jew, Shalin. "Image on the Street Is . . ." In Advances in Media, Entertainment, and the Arts, 1–45. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-5225-9821-3.ch001.

Abstract:
To capture what some of the “Global South”-tagged social messages are in early 2019, an image set of 1000+ images was scraped from Flickr and another 500+ images from Google Images and dozens of fairly recent (past few years) videos were identified on YouTube (with their available closed captioning transcripts captured). These mostly decontextualized digital visual contents (still and motion) were coded with bottom-up coding, based on grounded theory, and some initial insights were created about the multi-dimensional messaging. These contents were generated by conference organizers, alternate and foreign news sites, university lecturers, and the mass public, so the messaging is comprised of both formal and informal messaging, information from news channels, and responses to news channels. This work discusses some of the manual and computation-based coding techniques and some initial findings.

Conference papers on the topic "Motion captioning"

1

Iwamura, Kiyohiko, Jun Younes Louhi Kasahara, Alessandro Moro, Atsushi Yamashita, and Hajime Asama. "Potential of Incorporating Motion Estimation for Image Captioning." In 2021 IEEE/SICE International Symposium on System Integration (SII). IEEE, 2021. http://dx.doi.org/10.1109/ieeeconf49454.2021.9382725.

2

Chen, Shaoxiang, and Yu-Gang Jiang. "Motion Guided Region Message Passing for Video Captioning." In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021. http://dx.doi.org/10.1109/iccv48922.2021.00157.

3

Bosch Ruiz, Marc, Christopher M. Gifford, Agata Ciesielski, Scott Almes, Rachel Ellison, and Gordon Christie. "Captioning of full motion video from unmanned aerial platforms." In Geospatial Informatics IX, edited by Kannappan Palaniappan, Gunasekaran Seetharaman, and Peter J. Doucette. SPIE, 2019. http://dx.doi.org/10.1117/12.2518163.

4

Hu, Yimin, Guorui Yu, Yuejie Zhang, Rui Feng, Tao Zhang, Xuequan Lu, and Shang Gao. "Motion-Aware Video Paragraph Captioning via Exploring Object-Centered Internal Knowledge." In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023. http://dx.doi.org/10.1109/icassp49357.2023.10096625.

5

Qi, Mengshi, Yunhong Wang, Annan Li, and Jiebo Luo. "Sports Video Captioning by Attentive Motion Representation based Hierarchical Recurrent Neural Networks." In MM '18: ACM Multimedia Conference. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3265845.3265851.

6

Mori, Yuki, Tsubasa Hirakawa, Takayoshi Yamashita, and Hironobu Fujiyoshi. "Image Captioning for Near-Future Events from Vehicle Camera Images and Motion Information." In 2021 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2021. http://dx.doi.org/10.1109/iv48863.2021.9575562.

7

Kaushik, Prashant, Vikas Saxena, and Amarjeet Prajapati. "A Novel Method for Sequence Generation for Video Captioning by Estimating the Objects Motion in Temporal Domain." In 2024 2nd International Conference on Disruptive Technologies (ICDT). IEEE, 2024. http://dx.doi.org/10.1109/icdt61202.2024.10489570.
