
Journal articles on the topic "Video text"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic "Video text".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication in PDF format and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organize your bibliography correctly.

1

Huang, Bin, Xin Wang, Hong Chen, Houlun Chen, Yaofei Wu, and Wenwu Zhu. "Identity-Text Video Corpus Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 4 (2025): 3608–16. https://doi.org/10.1609/aaai.v39i4.32375.

Full text
Abstract
Video corpus grounding (VCG), which aims to retrieve relevant video moments from a video corpus, has attracted significant attention in the multimedia research community. However, the existing VCG setting primarily focuses on matching textual descriptions with videos and ignores the distinct visual identities in the videos, thus resulting in an inaccurate understanding of video content and deteriorated retrieval performance. To address this limitation, we introduce a novel task, Identity-Text Video Corpus Grounding (ITVCG), which simultaneously utilizes textual descriptions and visual identities
2

Avinash, N. Bhute, and Meshram B.B. "Text Based Approach For Indexing And Retrieval Of Image And Video: A Review." Advances in Vision Computing: An International Journal (AVC) 1, no. 1 (2014): 27–38. https://doi.org/10.5281/zenodo.3554868.

Full text
Abstract
Text data present in multimedia contain useful information for automatic annotation and indexing. The extracted information is used for recognition of overlay or scene text in a given video or image, and the extracted text can then be used for retrieving the videos and images. In this paper, we first discuss the different techniques for text extraction from images and videos; we then review the techniques for indexing and retrieval of images and videos using the extracted text.
3

Avinash, N. Bhute, and Meshram B.B. "Text Based Approach For Indexing And Retrieval Of Image And Video: A Review." Advances in Vision Computing: An International Journal (AVC) 1, no. 1 (2014): 27–38. https://doi.org/10.5281/zenodo.3357696.

Full text
Abstract
Text data present in multimedia contain useful information for automatic annotation and indexing. The extracted information is used for recognition of overlay or scene text in a given video or image, and the extracted text can then be used for retrieving the videos and images. In this paper, we first discuss the different techniques for text extraction from images and videos; we then review the techniques for indexing and retrieval of images and videos using the extracted text.
4

V, Divya, Prithica G, and Savija J. "Text Summarization for Education in Vernacular Languages." International Journal for Research in Applied Science and Engineering Technology 11, no. 7 (2023): 175–78. http://dx.doi.org/10.22214/ijraset.2023.54589.

Full text
Abstract
This project proposes a video summarizing system based on natural language processing (NLP) and machine learning to summarize YouTube video transcripts without losing the key elements. The quantity of videos available on web platforms is steadily expanding, and the content is made available globally, primarily for educational purposes. Educational content is available on YouTube, Facebook, Google, and Instagram. A significant issue in extracting information from videos is that, unlike an image, where data can be collected from a single frame, a viewer must watch the enti
5

Namrata, Dave, and S. Holia Mehfuza. "News Story Retrieval Based on Textual Query." International Journal of Engineering and Advanced Technology (IJEAT) 9, no. 3 (2021): 2918–22. https://doi.org/10.5281/zenodo.5589205.

Full text
Abstract
This paper presents news video retrieval using text queries for Gujarati-language news videos. Because broadcast video in India lacks metadata such as closed captioning and transcriptions, retrieval of videos based on text data is a non-trivial task for most Indian-language videos. Retrieving a specific story based on a text query in a regional language is the key idea behind our approach. Broadcast video is segmented to get shots representing small news stories. To represent each shot efficiently, key frame extraction using singular value decomposition and rank o
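As a rough illustration of the key-frame idea mentioned above, the sketch below scores the frames of one shot by their energy in the top singular directions of the shot's frame matrix; it is a minimal stand-in under assumed inputs, not the authors' exact SVD-and-rank criterion.

```python
import numpy as np

def svd_keyframes(frames, k=3):
    """Pick k representative frames by projecting each flattened frame
    onto the top singular vectors of the shot's frame matrix.
    A rough sketch; the paper's exact SVD/rank criterion may differ."""
    # Rows = flattened grayscale frames of one shot.
    X = np.stack([f.ravel().astype(np.float64) for f in frames])
    X -= X.mean(axis=0)                      # center the data
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Score each frame by its energy in the top-k singular directions.
    scores = (U[:, :k] ** 2 * S[:k] ** 2).sum(axis=1)
    return np.argsort(scores)[::-1][:k]      # indices of top-k frames

# Example with synthetic frames standing in for a decoded shot:
frames = [np.random.rand(64, 64) for _ in range(30)]
print(svd_keyframes(frames))
```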
6

Doran, Michael, Adrian Barnett, Joan Leach, William Lott, Katie Page, and Will Grant. "Can video improve grant review quality and lead to more reliable ranking?" Research Ideas and Outcomes 3 (February 1, 2017): e11931. https://doi.org/10.3897/rio.3.e11931.

Full text
Abstract
Multimedia video is rapidly becoming mainstream, and many studies indicate that it is a more effective communication medium than text. In this project we AIM to test if videos can be used, in place of text-based grant proposals, to improve communication and increase the reliability of grant ranking. We will test if video improves reviewer comprehension (AIM 1), if external reviewer grant scores are more consistent with video (AIM 2), and if mock Australian Research Council (ARC) panels award more consistent scores when grants are presented as videos (AIM 3). This will be the first study to eva
7

Jiang, Ai Wen, and Gao Rong Zeng. "Multi-information Integrated Method for Text Extraction from Videos." Advanced Materials Research 225-226 (April 2011): 827–30. http://dx.doi.org/10.4028/www.scientific.net/amr.225-226.827.

Full text
Abstract
Video text provides important semantic information for video content analysis. However, video text with a complex background gives poor OCR recognition performance. Most previous approaches to extracting overlay text from videos are based on traditional binarization and pay little attention to multi-information integration, especially fusing background information. This paper presents an effective method to precisely extract characters from videos so that OCR achieves good recognition performance. The proposed method combines multiple kinds of information, including background
8

Ma, Fan, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, and Yi Yang. "Stitching Segments and Sentences towards Generalization in Video-Text Pre-training." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (2024): 4080–88. http://dx.doi.org/10.1609/aaai.v38i5.28202.

Full text
Abstract
Video-language pre-training models have recently achieved remarkable results on various multi-modal downstream tasks. However, most of these models rely on contrastive learning or masked modeling to align global features across modalities, neglecting the local associations between video frames and text tokens. This limits the model’s ability to perform fine-grained matching and generalization, especially for tasks that require selecting segments in long videos based on query texts. To address this issue, we propose a novel stitching and matching pre-text task for video-language pre-training that enco
9

Liu, Yang, Shudong Huang, Deng Xiong, and Jiancheng Lv. "Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 6 (2025): 5667–75. https://doi.org/10.1609/aaai.v39i6.32604.

Full text
Abstract
Text-video retrieval is a foundational task in multi-modal research which aims to align texts and videos in the embedding space. The key challenge is to learn the similarity between videos and texts. A conventional approach involves directly aligning video-text pairs using cosine similarity. However, due to the disparity in the information conveyed by videos and texts, i.e., a single video can be described from multiple perspectives, the retrieval accuracy is suboptimal. An alternative approach employs cross-modal interaction to enable videos to dynamically acquire distinct features from various
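The "conventional approach" this abstract refers to reduces to a matrix of cosine similarities between L2-normalized embeddings; a minimal sketch follows, with random arrays standing in for real text and video encoders.

```python
import numpy as np

def cosine_sim_matrix(text_emb, video_emb):
    """Pairwise cosine similarities between L2-normalized embeddings.
    Retrieval then ranks videos by each text query's row."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    return t @ v.T

texts = np.random.randn(4, 512)    # 4 query embeddings (stand-ins)
videos = np.random.randn(10, 512)  # 10 video embeddings (stand-ins)
sims = cosine_sim_matrix(texts, videos)
print(sims.shape, sims[0].argmax())  # best-matching video for query 0
```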
10

Sun, Shangkun, Xiaoyu Liang, Songlin Fan, Wenxu Gao, and Wei Gao. "VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 7 (2025): 7105–13. https://doi.org/10.1609/aaai.v39i7.32763.

Full text
Abstract
Text-driven video editing has recently experienced rapid development. Despite this, evaluating edited videos remains a considerable challenge. Current metrics tend to fail to align with human perceptions, and effective quantitative metrics for video editing are still notably absent. To address this, we introduce VE-Bench, a benchmark suite tailored to the assessment of text-driven video editing. This suite includes VE-Bench DB, a video quality assessment (VQA) database for video editing. VE-Bench DB encompasses a diverse set of source videos featuring various motions and subjects, along with m
11

Yariv, Guy, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, and Yossi Adi. "Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (2024): 6639–47. http://dx.doi.org/10.1609/aaai.v38i7.28486.

Full text
Abstract
We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresponding segment of that video. We utilize an existing text-conditioned video generation model and a pre-trained audio encoder model. The proposed method is based on a lightweight adaptor network, which
12

Rachidi, Youssef. "Text Detection in Video for Video Indexing." International Journal of Computer Trends and Technology 68, no. 4 (2020): 96–99. http://dx.doi.org/10.14445/22312803/ijctt-v68i4p117.

Full text
13

Cao, Shuqiang, Bairui Wang, Wei Zhang, and Lin Ma. "Visual Consensus Modeling for Video-Text Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (2022): 167–75. http://dx.doi.org/10.1609/aaai.v36i1.19891.

Full text
Abstract
In this paper, we propose a novel method to mine the commonsense knowledge shared between the video and text modalities for video-text retrieval, namely visual consensus modeling. Different from the existing works, which learn the video and text representations and their complicated relationships solely based on the pairwise video-text data, we make the first attempt to model the visual consensus by mining the visual concepts from videos and exploiting their co-occurrence patterns within the video and text modalities with no reliance on any additional concept annotations. Specifically, we buil
14

Liu, Yi, Yue Zhang, Haidong Hu, Xiaodong Liu, Lun Zhang, and Ruijun Liu. "An Extended Text Combination Classification Model for Short Video Based on Albert." Journal of Sensors 2021 (October 16, 2021): 1–7. http://dx.doi.org/10.1155/2021/8013337.

Full text
Abstract
With the rise and rapid development of short video sharing websites, the number of short videos on the Internet has been growing explosively. The organization and classification of short videos have become the basis for the effective use of short videos, which is also a problem faced by major short video platforms. Aiming at the characteristics of complex short video content categories and rich extended text information, this paper uses methods in the text classification field to solve the short video classification problem. Compared with the traditional way of classifying and understanding sh
15

Chiu, Chih-Yi, Po-Chih Lin, Sheng-Yang Li, Tsung-Han Tsai, and Yu-Lung Tsai. "Tagging Webcast Text in Baseball Videos by Video Segmentation and Text Alignment." IEEE Transactions on Circuits and Systems for Video Technology 22, no. 7 (2012): 999–1013. http://dx.doi.org/10.1109/tcsvt.2012.2189478.

Full text
16

Bodyanskaya, Alisa, and Kapitalina Sinegubova. "Music Video as a Poetic Interpretation." Virtual Communication and Social Networks 2023, no. 2 (2023): 47–55. http://dx.doi.org/10.21603/2782-4799-2023-2-2-47-55.

Full text
Abstract
This article introduces the phenomenon of videopoetry as a hybrid product of mass media whose popularity is based on intermediality, i.e., the cumulative effect on different perception channels. Videopoetry is a productive form of verbal creativity in the contemporary media culture with its active reception of art. The research featured poems by W. B. Yeats, T. S. Eliot, and W. H. Auden presented as videos and the way they respond to someone else's poetic word. The authors analyzed 15 videos by comparing the original text and the video sequence in line with the method developed by N. V. Barkov
17

Letroiwen, Kornelin, Aunurrahman, and Indri Astuti. "Pengembangan Video Animasi untuk Meningkatkan Kemampuan Reading Comprehension Factual Report Text." Jurnal Teknologi Pendidikan (JTP) 16, no. 1 (2023): 16. http://dx.doi.org/10.24114/jtp.v16i1.44842.

Full text
Abstract
This study aims to develop an animated-video design for learning English on factual report text material. The research method is Research and Development with the ASSURE development design model. A total of 42 grade XI students of SMKN 1 Ngabang took part in the study, and the data obtained were analyzed qualitatively and quantitatively. The animated video features 2D animated characters and consists of a cover, the developer profile, a greeting, basic competencies, learning objectives, definition, social function, text structure, unsur ke
18

Ghorpade, Jayshree, Raviraj Palvankar, Ajinkya Patankar, and Snehal Rathi. "Extracting Text from Video." Signal & Image Processing : An International Journal 2, no. 2 (2011): 103–12. http://dx.doi.org/10.5121/sipij.2011.2209.

Full text
19

Wadaskar, Ghanshyam, Sanghdip Udrake, Vipin Bopanwar, Shravani Upganlawar, and Minakshi Getkar. "Extract Text from Video." International Journal for Research in Applied Science and Engineering Technology 12, no. 5 (2024): 2881–83. http://dx.doi.org/10.22214/ijraset.2024.62287.

Full text
Abstract
The code imports the YoutubeTranscriptionApi from the youtube_transcription_api library, and the YouTube video ID is defined. The transcription data for the given video ID is fetched using the get_transcription method. The transcription text is extracted from the data and stored in the transcription variable. The transcription is split into lines and then joined back into a single string. Finally, the processed transcript is written to a text file named “Love.text” with UTF-8 encoding. The commented-out code block is an alternative way to write the transcript to a text file using the open function
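For context, the pipeline this abstract walks through corresponds closely to the publicly available youtube_transcript_api package (the abstract's identifier spellings differ slightly); a minimal sketch under that assumption follows, with the video ID and output file name as placeholders.

```python
# Sketch of the pipeline the abstract describes, using the public
# youtube_transcript_api package (classic pre-1.0 API shown; newer
# releases expose an instance-based fetch() instead).
from youtube_transcript_api import YouTubeTranscriptApi

video_id = "dQw4w9WgXcQ"  # placeholder YouTube video ID
entries = YouTubeTranscriptApi.get_transcript(video_id)

# Each entry is a dict with 'text', 'start', and 'duration' keys;
# join the caption lines into a single string.
transcript = " ".join(e["text"] for e in entries)
transcript = " ".join(transcript.splitlines())  # flatten line breaks

with open("Love.text", "w", encoding="utf-8") as f:  # placeholder name
    f.write(transcript)
```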
20

Vishwashanthi, M. "Text-To-Video Generator." International Scientific Journal of Engineering and Management 04, no. 05 (2025): 1–9. https://doi.org/10.55041/isjem03655.

Full text
Abstract
The integration of artificial intelligence in multimedia content creation has paved the way for innovative applications like text-to-video generation. This research presents an advanced Text-to-Video Generator capable of converting textual inputs into coherent video narratives. The system is further enhanced with multilingual support for Indian languages and the inclusion of subtitles, broadening its accessibility and user engagement. By leveraging natural language processing and machine learning techniques, the application ensures accurate interpretation and representation of divers
21

Luo, Dezhao, Shaogang Gong, Jiabo Huang, Hailin Jin, and Yang Liu. "Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 6 (2025): 5847–55. https://doi.org/10.1609/aaai.v39i6.32624.

Full text
Abstract
Video moment retrieval (VMR) aims to locate the most likely video moment(s) corresponding to a text query in untrimmed videos. Training of existing methods is limited by the lack of diverse and generalisable VMR datasets, hindering their ability to generalise moment-text associations to queries containing novel semantic concepts (unseen both visually and textually in a training source domain). For model generalisation to novel semantics, existing methods rely heavily on assuming to have access to both video and text sentence pairs from a target domain in addition to the source domain pair-wise
22

Godha, Ashima, and Puja Trivedi. "CNN Filter based Text Region Segmentation from Lecture Video and Extraction using NeuroOCR." SMART MOVES JOURNAL IJOSCIENCE 5, no. 7 (2019): 7. http://dx.doi.org/10.24113/ijoscience.v5i7.218.

Full text
Abstract
Lecture videos are rich in textual information, and being able to understand the text is quite useful for larger video understanding and analysis applications. Though text recognition from images has been an active research area in computer vision, text in lecture videos has mostly been overlooked. This paper focuses on text extraction from different types of lecture videos, such as slide, whiteboard, and paper lecture videos. For text extraction, the text regions are segmented in video frames and extracted using recurrent neural netw
23

Tahwiana, Zein, Regina Regina, Eka Fajar Rahmani, Yohanes Gatot Sutapa Yuliana, and Wardah Wardah. "Enhancing Narrative Writing Skills through Animation Videos in the EFL Classroom." Getsempena English Education Journal 12, no. 1 (2025): 1–13. https://doi.org/10.46244/geej.v12i1.2902.

Full text
Abstract
This study examined the use of animation videos to teach narrative text writing to SMP Negeri 21 Pontianak eighth-grade students. The study used the 8B class of SMP Negeri 21 Pontianak as the research sample, consisting of 35 students taken from cluster random sampling from a population of 209 students. This pre-experimental study also used a group pre-test and post-test design, consisting of three procedures: pre-test, treatment, and post-test. This study was conducted in two treatments for 120 minutes per meeting by using animation videos to teach narrative text. Two methods were used in the
24

Nazmun, Nessa Moon, Salehin Imrus, Parvin Masuma, et al. "Natural language processing based advanced method of unnecessary video detection." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 6 (2021): 5411–19. https://doi.org/10.11591/ijece.v11i6.pp5411-5419.

Full text
Abstract
In this study we describe the process of identifying unnecessary videos using an advanced combined method of natural language processing and machine learning. The system also includes a framework that contains analytics databases, which helps to find statistical accuracy and can detect, accept, or reject unnecessary and unethical video content. In our video detection system, we extract text data from video content in two steps: first from video to MPEG-1 audio layer 3 (MP3), and then from MP3 to WAV format. We have used the text part of natural language processing to analyze and prepare
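A bare-bones version of the two-step audio extraction described above can be assembled from the ffmpeg CLI plus the SpeechRecognition package; this is an assumed reconstruction for illustration, not the authors' code, and the file names are placeholders.

```python
# Rough sketch: video -> MP3 -> WAV, then speech-to-text. Assumes the
# ffmpeg binary and the SpeechRecognition package are installed.
import subprocess
import speech_recognition as sr

subprocess.run(["ffmpeg", "-y", "-i", "clip.mp4", "clip.mp3"], check=True)
subprocess.run(["ffmpeg", "-y", "-i", "clip.mp3", "clip.wav"], check=True)

recognizer = sr.Recognizer()
with sr.AudioFile("clip.wav") as source:
    audio = recognizer.record(source)

# The recognized text can then feed the NLP analysis stage described above
# (recognize_google requires network access).
text = recognizer.recognize_google(audio)
print(text)
```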
25

Alabsi, Thuraya. "Effects of Adding Subtitles to Video via Apps on Developing EFL Students’ Listening Comprehension." Theory and Practice in Language Studies 10, no. 10 (2020): 1191. http://dx.doi.org/10.17507/tpls.1010.02.

Full text
Abstract
It is unclear if using videos and education apps in learning adds additional value to students’ listening comprehension. This study assesses the impact of adding text to videos on English as a Foreign Language (EFL) learners’ listening comprehension. The participants were 76 prep college EFL students from Taibah University, divided into two groups. The semi-experimental measure was employed to compare the experimental group and the control group. The experimental group watched an English learning video and then wrote text subtitles relating to the video using apps, and later took a listening t
26

Wu, Peng, Wanshun Su, Xiangteng He, Peng Wang, and Yanning Zhang. "VarCMP: Adapting Cross-Modal Pre-Training Models for Video Anomaly Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 8 (2025): 8423–31. https://doi.org/10.1609/aaai.v39i8.32909.

Full text
Abstract
Video anomaly retrieval (VAR) aims to retrieve pertinent abnormal or normal videos from collections of untrimmed and long videos through cross-modal queries such as textual descriptions and synchronized audio. Cross-modal pre-training (CMP) models, by pre-training on large-scale cross-modal pairs, e.g., image and text, can learn the rich associations between different modalities, and this cross-modal association capability gives CMP an advantage in conventional retrieval tasks. Inspired by this, how to utilize the robust cross-modal association capabilities of CMP in VAR to search crucial vi
27

Bi, Xiuli, Jian Lu, Bo Liu, et al. "CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 2 (2025): 1871–79. https://doi.org/10.1609/aaai.v39i2.32182.

Full text
Abstract
Benefiting from large-scale pre-training on text-video pairs, current text-to-video (T2V) diffusion models can generate high-quality videos from a text description. In addition, given some reference images or videos, the parameter-efficient fine-tuning method, i.e. LoRA, can generate high-quality customized concepts, e.g., a specific subject or the motion from a reference video. However, combining multiple concepts trained from different references into a single network shows obvious artifacts. To this end, we propose CustomTTT, where we can jointly customize the appearance and the motion of t
28

Gawade, Shruti. "A Deep Learning Approach to Text-to-Video Generation." International Journal for Research in Applied Science and Engineering Technology 12, no. 6 (2024): 2489–93. http://dx.doi.org/10.22214/ijraset.2024.63513.

Full text
Abstract
In the ever-evolving landscape of multimedia content creation, there is a growing demand for automated tools that can seamlessly transform textual descriptions into engaging and realistic videos. This research paper introduces a state-of-the-art Text to Video Generation Model, a groundbreaking approach designed to bridge the gap between textual input and visually compelling video output. Leveraging advanced deep learning techniques, the proposed model not only captures the semantic nuances of the input text but also generates dynamic and contextually relevant video sequences. The mod
29

张, 宇. "Video Retrieval Model Based on Video Text Alignment." Journal of Image and Signal Processing 14, no. 03 (2025): 349–61. https://doi.org/10.12677/jisp.2025.143032.

Full text
30

P, Ilampiray, Naveen Raju D, Thilagavathy A, et al. "Video Transcript Summarizer." E3S Web of Conferences 399 (2023): 04015. http://dx.doi.org/10.1051/e3sconf/202339904015.

Full text
Abstract
In today’s world, a large number of videos are uploaded every day, each containing information about something. The major challenge is to find the right video and understand its content, because among the many videos available some contain useless content, and even when the perfect content exists it must still match what we require. If we do not find the right one, it wastes our full effort and time to extract the correct, useful information. We propose an innovative idea that uses NLP processing for text extraction and BERT summarization for text summarization. Th
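The extract-then-summarize idea described above can be sketched with the youtube_transcript_api and bert-extractive-summarizer packages; this is an illustrative approximation of the approach under those assumed dependencies, not the authors' implementation, and the video ID is a placeholder.

```python
# Minimal sketch: fetch a YouTube transcript, then summarize it with a
# BERT-based extractive summarizer.
from youtube_transcript_api import YouTubeTranscriptApi
from summarizer import Summarizer  # bert-extractive-summarizer package

entries = YouTubeTranscriptApi.get_transcript("VIDEO_ID_HERE")  # placeholder
transcript = " ".join(e["text"] for e in entries)

model = Summarizer()
summary = model(transcript, ratio=0.2)  # keep roughly 20% of the sentences
print(summary)
```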
31

Choudhary, Waffa. "Text Extraction from Videos Using the Combination of Edge-Based and Stroke Filter Techniques." Advanced Materials Research 403-408 (November 2011): 1068–74. http://dx.doi.org/10.4028/www.scientific.net/amr.403-408.1068.

Full text
Abstract
A novel method combining edge-based and stroke-filter-based text extraction in videos is presented. Several researchers have used edge-based and filter-based text extraction in video frames; however, these individual techniques have their own advantages and disadvantages for extracting text from video frames. Combining the two techniques yields better results than either technique alone. In this paper, Canny edge detection and a stroke filter for text extraction in video frames are amalgamated. The effectiveness of the proposed method is evaluated over the in
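To make the fusion idea concrete, the sketch below combines a Canny edge map with a crude morphological-gradient stand-in for the stroke filter; the paper's actual stroke filter and thresholds differ, and the input file name is a placeholder.

```python
# Illustrative fusion of two cues for text-candidate pixels; thresholds
# here are arbitrary and the stroke response is only a rough proxy.
import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder frame

edges = cv2.Canny(img, 100, 200)                      # edge evidence
kernel = np.ones((3, 3), np.uint8)
stroke = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)  # stroke-like response

# A pixel is kept as a text candidate only when both cues fire.
stroke_mask = cv2.threshold(stroke, 40, 255, cv2.THRESH_BINARY)[1]
candidates = cv2.bitwise_and(edges, stroke_mask)
cv2.imwrite("text_candidates.png", candidates)
```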
32

Ilaslan, Muhammet Furkan, Ali Köksal, Kevin Qinghong Lin, Burak Satar, Mike Zheng Shou, and Qianli Xu. "VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 4 (2025): 3886–94. https://doi.org/10.1609/aaai.v39i4.32406.

Full text
Abstract
Large Language Model (LLM)-based agents have shown promise in procedural tasks, but the potential of multimodal instructions augmented by texts and videos to assist users remains under-explored. To address this gap, we propose the Visually Grounded Text-Video Prompting (VG-TVP) method which is a novel LLM-empowered Multimodal Procedural Planning (MPP) framework. It generates cohesive text and video procedural plans given a specified high-level objective. The main challenges are achieving textual and visual informativeness, temporal coherence, and accuracy in procedural plans. VG-TVP leverages
33

Wu, Yihong, Mingli Lin, and Wenlong Yao. "The Influence of Titles on YouTube Trending Videos." Communications in Humanities Research 29, no. 1 (2024): 285–94. http://dx.doi.org/10.54254/2753-7064/29/20230835.

Full text
Abstract
The global video platform market has been growing remarkably in recent years. As part of a video, the title can compel people to view it. However, few scholars have studied the relationship between video trendiness and titles. This work studies the influence of the sentiment polarity of videos using the Valence Aware Dictionary Sentiment Reasoner (VADER) and investigates the feasibility of applying video title text to YouTube trending-video research using Doc2Vec. It is found that the text in YouTube trending video titles possesses predictive value for video trendiness, but it r
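Scoring title sentiment with VADER, as the study describes, takes only a few lines with the vaderSentiment package; the titles below are invented examples, not data from the paper.

```python
# Hedged sketch of VADER polarity scoring over video titles.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
titles = [
    "You WON'T BELIEVE this amazing trick!",
    "Quiet morning routine for focused work",
]
for title in titles:
    scores = analyzer.polarity_scores(title)
    print(f"{scores['compound']:+.3f}  {title}")  # compound score in [-1, 1]
```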
34

Rushikesh, Chandrakant Konapure, and L.M.R.J. Lobo. "Text Data Analysis for Advertisement Recommendation System Using Multi-label Classification of Machine Learning." Journal of Data Mining and Management 5, no. 1 (2020): 1–6. https://doi.org/10.5281/zenodo.3600112.

Full text
Abstract
Everyone today can access streaming content on their mobile phones and laptops very easily, and video has become a very important and popular content type on the internet. Nowadays, people make their own content and upload it to streaming platforms, so video datasets have become massive compared to text, audio, and image datasets. Providing advertisements related to the topic of a video can therefore help boost business. In the proposed system, the title and description of a video are taken as input to classify the video using a natural language processing text classif
35

Frobenius, Maximiliane. "Pointing gestures in video blogs." Text & Talk 33, no. 1 (2013): 1–23. http://dx.doi.org/10.1515/text-2013-0001.

Full text
Abstract
Video blogs are a form of CMC (computer-mediated communication) that feature speakers who talk into a camera and thereby produce a viewer-directed performance. Pointing gestures are part of the resources that the medium affords to design vlogs for absent recipients. Based on a corpus of 40 vlogs, this research categorizes different kinds of common pointing actions in vlogs. Close analysis reveals the role that multimodal factors such as gaze and body posture play, along with deictic gestures and verbal reference, in the production of a viewer-directed monologue. Those instances where vlo
36

Puspita, Widya, Teti Sobari, and Wikanengsih Wikanengsih. "Improving Students Writing Skills Explanation Text using Animated Video." JLER (Journal of Language Education Research) 6, no. 1 (2023): 35–60. http://dx.doi.org/10.22460/jler.v6i1.10198.

Full text
Abstract
This study focuses on the influence of an animated video on students' ability to write explanation texts, using a descriptive qualitative research method. The purpose is to find out whether the animated video can help students improve their explanation-text writing skills and to see the differences in students' abilities before and after using animated videos in Indonesian language learning. The subjects in this study were 20 students of class VII A at MTs Pasundan Cimahi, and the objects were obtained from the results of the pre-test and post-test
37

S, Ramacharan, Akshara Reddy P., Rukmini Reddy R, and Ch.Chathurya. "Script Abstract from Video Clip." Journal of Advancement in Software Engineering and Testing 5, no. 3 (2022): 1–4. https://doi.org/10.5281/zenodo.7321898.

Full text
Abstract
In a world where technology is developing at a tremendously fast pace, the educational field has witnessed various new technologies that help in better learning, teaching, and understanding. Video tutorials play a major role in helping students and learners understand new concepts at a much faster rate and at their own comfort level, but watching long tutorial or lecture videos can be time-consuming and tiring. The solution can be found in a video-to-text summarization application. With the help of advanced NLP and machine learning we can summarize a video tutorial; this summ
38

Sanjeeva, Polepaka, Vanipenta Balasri Nitin Reddy, Jagirdar Indraj Goud, Aavula Guru Prasad, and Ashish Pathani. "TEXT2AV – Automated Text to Audio and Video Conversion." E3S Web of Conferences 430 (2023): 01027. http://dx.doi.org/10.1051/e3sconf/202343001027.

Full text
Abstract
The paper aims to develop a machine learning-based system that can automatically convert text to audio and text to video as per the user’s request. Reading a large text is difficult for anyone, but this TTS model makes it easy by converting text into audio, producing the audio output through an avatar with lip sync to make it look more attractive and human-like in many languages. The TTS model is built on Waveform Recurrent Neural Networks (WaveRNN), a type of auto-regressive model that predicts future data based on the present. The system identifies the keywords i
39

Creamer, MeLisa, Heather R. Bowles, Belinda von Hofe, Kelley Pettee Gabriel, Harold W. Kohl, and Adrian Bauman. "Utility of Computer-Assisted Approaches for Population Surveillance of Physical Activity." Journal of Physical Activity and Health 11, no. 6 (2014): 1111–19. http://dx.doi.org/10.1123/jpah.2012-0266.

Full text
Abstract
Background: Computer-assisted techniques may be a useful way to enhance physical activity surveillance and increase the accuracy of reported behaviors. Purpose: Evaluate the reliability and validity of a physical activity (PA) self-report instrument administered by telephone and internet. Methods: The telephone-administered Active Australia Survey was adapted into 2 forms for internet self-administration: survey questions only (internet-text) and with videos demonstrating intensity (internet-video). Data were collected from 158 adults (20–69 years, 61% female) assigned to telephone (telephone-interview
40

Du, Wanru, Xiaochuan Jing, Quan Zhu, Xiaoyin Wang, and Xuan Liu. "A cross-modal conditional mechanism based on attention for text-video retrieval." Mathematical Biosciences and Engineering 20, no. 11 (2023): 20073–92. http://dx.doi.org/10.3934/mbe.2023889.

Full text
Abstract
Current research in cross-modal retrieval has primarily focused on aligning the global features of videos and sentences. However, video conveys a much more comprehensive range of information than text. Thus, text-video matching should focus on the similarities between frames containing critical information and text semantics. This paper proposes a cross-modal conditional feature aggregation model based on the attention mechanism. It includes two innovative modules: (1) A cross-modal attentional feature aggregation module, which uses the semantic text features as condit
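A toy version of the text-conditioned aggregation idea follows: the sentence embedding attends over frame embeddings so that query-relevant frames dominate the pooled video feature. Dimensions and inputs are arbitrary stand-ins, not the paper's model.

```python
# Text-conditioned attention pooling over frame embeddings (sketch).
import torch
import torch.nn.functional as F

d = 256
frames = torch.randn(12, d)   # 12 frame embeddings for one video (stand-ins)
text = torch.randn(d)         # one sentence embedding (stand-in)

attn = F.softmax(frames @ text / d ** 0.5, dim=0)  # (12,) attention weights
video_feat = attn @ frames                          # (d,) pooled video feature
similarity = F.cosine_similarity(video_feat, text, dim=0)
print(attn.shape, similarity.item())
```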
41

Hua, Hang, Yunlong Tang, Chenliang Xu, and Jiebo Luo. "V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 4 (2025): 3599–607. https://doi.org/10.1609/aaai.v39i4.32374.

Full text
Abstract
Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited amount of source videos, which hampers the effective training of advanced large vision-language models (VLMs). Additionally, most existing datasets are created for video-to-video summarization, overlooking the contemporary need for multimodal video content summarization. Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks ba
42

Adams, Aubrie, and Weimin Toh. "Student Emotion in Mediated Learning: Comparing a Text, Video, and Video Game." Electronic Journal of e-Learning 19, no. 6 (2021): 575–87. http://dx.doi.org/10.34190/ejel.19.6.2546.

Full text
Abstract
Although serious games are generally praised by scholars for their potential to enhance teaching and e-learning practices, more empirical evidence is needed to support these accolades. Existing research in this area tends to show that gamified teaching experiences do contribute to significant effects to improve student cognitive, motivational, and behavioural learning outcomes, but these effects are usually small. In addition, less research examines how different types of mediated learning tools compare to one another in influencing student outcomes associated with learning and motivation. As
43

Chen, Yizhen, Jie Wang, Lijian Lin, Zhongang Qi, Jin Ma, and Ying Shan. "Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (2023): 396–404. http://dx.doi.org/10.1609/aaai.v37i1.25113.

Full text
Abstract
Vision-language alignment learning for video-text retrieval arouses a lot of attention in recent years. Most of the existing methods either transfer the knowledge of image-text pretraining model to video-text retrieval task without fully exploring the multi-modal information of videos, or simply fuse multi-modal features in a brute force manner without explicit guidance. In this paper, we integrate multi-modal information in an explicit manner by tagging, and use the tags as the anchors for better video-text alignment. Various pretrained experts are utilized for extracting the information of m
44

Huang, Hong-Bo, Yao-Lin Zheng, and Zhi-Ying Hu. "Video Abnormal Action Recognition Based on Multimodal Heterogeneous Transfer Learning." Advances in Multimedia 2024 (January 19, 2024): 1–12. http://dx.doi.org/10.1155/2024/4187991.

Full text
Abstract
Human abnormal action recognition is crucial for video understanding and intelligent surveillance. However, the scarcity of labeled data for abnormal human actions often hinders the development of high-performance models. Inspired by the multimodal approach, this paper proposes a novel approach that leverages text descriptions associated with abnormal human action videos. Our method exploits the correlation between the text domain and the video domain in the semantic feature space and introduces a multimodal heterogeneous transfer learning framework from the text domain to the video domain. Th
45

Mochurad, Lesia. "A NEW APPROACH FOR TEXT RECOGNITION ON A VIDEO CARD." Computer systems and information technologies, no. 3 (September 28, 2022): 22–30. http://dx.doi.org/10.31891/csit-2022-3-3.

Full text
Abstract
An important task is to develop a computer system that can automatically read text content from images or videos with a complex background. Due to the large number of calculations involved, it is quite difficult to apply such systems in real time. Therefore, the use of parallel and distributed computing in the development of real-time or near-real-time systems is relevant, especially in areas such as automated video recording of traffic violations, text recognition, machine vision, fingerprint recognition, speech, and more. The paper proposes a new approach to text recognition on a vid
46

Lokkondra, Chaitra Yuvaraj, Dinesh Ramegowda, Gopalakrishna Madigondanahalli Thimmaiah, Ajay Prakash Bassappa Vijaya, and Manjula Hebbaka Shivananjappa. "ETDR: An Exploratory View of Text Detection and Recognition in Images and Videos." Revue d'Intelligence Artificielle 35, no. 5 (2021): 383–93. http://dx.doi.org/10.18280/ria.350504.

Full text
Abstract
Images and videos with text content are a direct source of information. Today, there is a high need for image and video data that can be intelligently analyzed, and a growing number of researchers are focusing on text identification, making it a hot issue in machine vision research. Several real-time applications such as text detection, localization, and tracking have consequently become more prevalent in text analysis systems. Our survey examines how text information may be extracted. This study presents a trustworthy dataset for text identificatio
47

Chen, Yupeng, Penglin Chen, Xiaoyu Zhang, Yixian Huang, and Qian Xie. "EditBoard: Towards a Comprehensive Evaluation Benchmark for Text-Based Video Editing Models." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 15 (2025): 15975–83. https://doi.org/10.1609/aaai.v39i15.33754.

Full text
Abstract
The rapid development of diffusion models has significantly advanced AI-generated content (AIGC), particularly in Text-to-Image (T2I) and Text-to-Video (T2V) generation. Text-based video editing, leveraging these generative capabilities, has emerged as a promising field, enabling precise modifications to videos based on text prompts. Despite the proliferation of innovative video editing models, there is a conspicuous lack of comprehensive evaluation benchmarks that holistically assess these models’ performance across various dimensions. Existing evaluations are limited and inconsistent, typica
48

Aljorani, Reem, and Boshra Zopon. "Encapsulation Video Classification and Retrieval Based on Arabic Text." Diyala Journal For Pure Science 17, no. 4 (2021): 20–36. http://dx.doi.org/10.24237/djps.17.04.558b.

Full text
Abstract
Arabic video classification is not a popular field, and there is not much research in this area, especially in education. A system was proposed to solve this problem and to make educational Arabic videos more available to students. A survey of several papers was carried out in order to design and implement a system that classifies videos in the Arabic language by extracting audio features using Azure Cognitive Services, which produces text transcripts. Several preprocessing operations are then applied to process the text transcript. A stochastic gra
49

Krishnamoorthy, Niveda, Girish Malkarnenkar, Raymond Mooney, Kate Saenko, and Sergio Guadarrama. "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (2013): 541–47. http://dx.doi.org/10.1609/aaai.v27i1.8679.

Full text
Abstract
We present a holistic data-driven technique that generates natural-language descriptions for videos. We combine the output of state-of-the-art object and activity detectors with "real-world" knowledge to select the most probable subject-verb-object triplet for describing a video. We show that this knowledge, automatically mined from web-scale text corpora, enhances the triplet selection algorithm by providing it contextual information and leads to a four-fold increase in activity identification. Unlike previous methods, our approach can annotate arbitrary videos without requiring the expensive
50

Chen, Datong, Jean-Marc Odobez, and Jean-Philippe Thiran. "Monte Carlo Video Text Segmentation." International Journal of Pattern Recognition and Artificial Intelligence 19, no. 05 (2005): 647–61. http://dx.doi.org/10.1142/s0218001405004216.

Full text
Abstract
This paper presents a probabilistic algorithm for segmenting and recognizing text embedded in video sequences based on adaptive thresholding using a Bayes filtering method. The algorithm approximates the posterior distribution of segmentation thresholds of video text by a set of weighted samples. The set of samples is initialized by applying a classical segmentation algorithm on the first video frame and further refined by random sampling under a temporal Bayesian framework. This framework allows us to evaluate a text image segmentor on the basis of recognition result instead of visual segment
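The weighted-sample idea can be illustrated with a tiny particle-filter-style loop over candidate binarization thresholds; the scoring function below is a synthetic stand-in for the recognition-based evaluation the paper uses, and all constants are arbitrary.

```python
# Bare-bones illustration: carry weighted threshold samples across frames,
# weight each by a fitness score, then resample with small jitter.
import numpy as np

rng = np.random.default_rng(0)
thresholds = rng.uniform(60, 200, size=50)       # initial threshold samples

def score(th, frame_mean):
    # Stand-in for OCR-based evaluation of a segmentation threshold.
    return np.exp(-((th - frame_mean) ** 2) / (2 * 25.0 ** 2))

for frame_mean in [120.0, 130.0, 140.0]:         # pretend frame sequence
    weights = score(thresholds, frame_mean)
    weights /= weights.sum()                     # normalize to a distribution
    idx = rng.choice(len(thresholds), size=len(thresholds), p=weights)
    thresholds = thresholds[idx] + rng.normal(0, 2.0, len(thresholds))

print(f"estimated threshold: {thresholds.mean():.1f}")
```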