Journal articles on the topic 'Video text'

Consult the top 50 journal articles for your research on the topic 'Video text.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and the bibliographic reference to the chosen work will be generated automatically in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

V, Divya, Prithica G, and Savija J. "Text Summarization for Education in Vernacular Languages." International Journal for Research in Applied Science and Engineering Technology 11, no. 7 (July 31, 2023): 175–78. http://dx.doi.org/10.22214/ijraset.2023.54589.

Full text
Abstract:
This project proposes a video summarizing system based on natural language processing (NLP) and machine learning to summarize YouTube video transcripts without losing the key elements. The quantity of videos available on web platforms is steadily expanding, and the content is made available globally, primarily for educational purposes; educational content is also available on YouTube, Facebook, Google, and Instagram. A significant issue in extracting information from videos is that, unlike an image, where data can be collected from a single frame, a viewer must watch the entire video to grasp its context. This study aims to shorten the length of the transcript of a given video. The suggested method retrieves the transcript from the video link provided by the user and then summarizes it using Hugging Face Transformers and pipelining. The built model accepts a video link and the required summary duration as input from the user and generates a summarized transcript as output. According to the results, the final summarized transcript was obtained in less time compared with other proposed techniques. Furthermore, the video's central concept is accurately present in the final summary without any deviations.
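
A minimal sketch of the kind of pipeline this abstract describes, assuming the youtube_transcript_api package and a Hugging Face summarization pipeline; the video ID, model name, and truncation length are illustrative assumptions rather than details taken from the paper.

```python
# Hedged sketch: fetch a YouTube transcript and summarize it abstractively.
# The video ID and model choice are placeholders, not the paper's settings.
from youtube_transcript_api import YouTubeTranscriptApi
from transformers import pipeline

def summarize_youtube_video(video_id: str, max_summary_tokens: int = 150) -> str:
    # Fetch the transcript as a list of {"text", "start", "duration"} segments.
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    transcript = " ".join(seg["text"] for seg in segments)

    # Abstractive summarization with a Hugging Face pipeline.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    result = summarizer(
        transcript[:3000],              # crude truncation to stay within model limits
        max_length=max_summary_tokens,
        min_length=30,
        do_sample=False,
    )
    return result[0]["summary_text"]

if __name__ == "__main__":
    print(summarize_youtube_video("dQw4w9WgXcQ"))   # placeholder video ID
```
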
APA, Harvard, Vancouver, ISO, and other styles
2

Rachidi, Youssef. "Text Detection in Video for Video Indexing." International Journal of Computer Trends and Technology 68, no. 4 (April 25, 2020): 96–99. http://dx.doi.org/10.14445/22312803/ijctt-v68i4p117.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Yariv, Guy, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, and Yossi Adi. "Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 6639–47. http://dx.doi.org/10.1609/aaai.v38i7.28486.

Full text
Abstract:
We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresponding segment of that video. We utilize an existing text-conditioned video generation model and a pre-trained audio encoder model. The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model. As such, it also enables video generation conditioned on text, audio, and, for the first time as far as we can ascertain, on both text and audio. We validate our method extensively on three datasets demonstrating significant semantic diversity of audio-video samples and further propose a novel evaluation metric (AV-Align) to assess the alignment of generated videos with input audio samples. AV-Align is based on the detection and comparison of energy peaks in both modalities. In comparison to recent state-of-the-art approaches, our method generates videos that are better aligned with the input sound, both with respect to content and temporal axis. We also show that videos produced by our method present higher visual quality and are more diverse. Code and samples are available at: https://pages.cs.huji.ac.il/adiyoss-lab/TempoTokens/.
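
A minimal PyTorch sketch of the "lightweight adaptor" idea the abstract outlines: a small trainable module that maps a pooled audio embedding into a short sequence of conditioning tokens for a frozen text-to-video generator. All dimensions, token counts, and layer choices are assumptions for illustration, not the authors' architecture.

```python
# Hedged sketch of an audio-to-conditioning adaptor; sizes are illustrative.
import torch
import torch.nn as nn

class AudioToVideoAdaptor(nn.Module):
    def __init__(self, audio_dim: int = 768, cond_dim: int = 1024, n_tokens: int = 8):
        super().__init__()
        self.n_tokens, self.cond_dim = n_tokens, cond_dim
        # Small MLP expanding one audio embedding into a sequence of pseudo-text tokens.
        self.mlp = nn.Sequential(
            nn.Linear(audio_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, n_tokens * cond_dim),
        )

    def forward(self, audio_emb: torch.Tensor) -> torch.Tensor:
        # audio_emb: (batch, audio_dim) pooled output of a pretrained audio encoder.
        tokens = self.mlp(audio_emb)
        return tokens.view(-1, self.n_tokens, self.cond_dim)

# Only the adaptor would be trained; audio encoder and video generator stay frozen.
adaptor = AudioToVideoAdaptor()
conditioning = adaptor(torch.randn(2, 768))   # (2, 8, 1024) conditioning sequence
```
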
APA, Harvard, Vancouver, ISO, and other styles
4

Chiu, Chih-Yi, Po-Chih Lin, Sheng-Yang Li, Tsung-Han Tsai, and Yu-Lung Tsai. "Tagging Webcast Text in Baseball Videos by Video Segmentation and Text Alignment." IEEE Transactions on Circuits and Systems for Video Technology 22, no. 7 (July 2012): 999–1013. http://dx.doi.org/10.1109/tcsvt.2012.2189478.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Jiang, Ai Wen, and Gao Rong Zeng. "Multi-information Integrated Method for Text Extraction from Videos." Advanced Materials Research 225-226 (April 2011): 827–30. http://dx.doi.org/10.4028/www.scientific.net/amr.225-226.827.

Full text
Abstract:
Video text provides important semantic information for video content analysis. However, video text with a complex background yields poor OCR recognition performance. Most previous approaches to extracting overlay text from videos are based on traditional binarization and pay little attention to integrating multiple sources of information, especially background information. This paper presents an effective method to precisely extract characters from videos so that OCR can achieve good recognition performance. The proposed method combines multiple kinds of information, including background information, edge information, and the characters' spatial information. Experimental results show that it is robust to complex backgrounds and various text appearances.
APA, Harvard, Vancouver, ISO, and other styles
6

Ma, Fan, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, and Yi Yang. "Stitching Segments and Sentences towards Generalization in Video-Text Pre-training." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (March 24, 2024): 4080–88. http://dx.doi.org/10.1609/aaai.v38i5.28202.

Full text
Abstract:
Video-language pre-training models have recently achieved remarkable results on various multi-modal downstream tasks. However, most of these models rely on contrastive learning or masked modeling to align global features across modalities, neglecting the local associations between video frames and text tokens. This limits the model's ability to perform fine-grained matching and generalization, especially for tasks that require selecting segments in long videos based on query texts. To address this issue, we propose a novel stitching and matching pretext task for video-language pre-training that encourages fine-grained interactions between modalities. Our task involves stitching video frames or sentences into longer sequences and predicting the positions of cross-modal queries in the stitched sequences. The individual frame and sentence representations are thus aligned via the stitching and matching strategy, encouraging fine-grained interactions between videos and texts. We conduct extensive experiments on various benchmarks covering text-to-video retrieval, video question answering, video captioning, and moment retrieval. Our results demonstrate that the proposed method significantly improves the generalization capacity of video-text pre-training models.
APA, Harvard, Vancouver, ISO, and other styles
7

Cao, Shuqiang, Bairui Wang, Wei Zhang, and Lin Ma. "Visual Consensus Modeling for Video-Text Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 167–75. http://dx.doi.org/10.1609/aaai.v36i1.19891.

Full text
Abstract:
In this paper, we propose a novel method to mine the commonsense knowledge shared between the video and text modalities for video-text retrieval, namely visual consensus modeling. Different from existing works, which learn video and text representations and their complicated relationships solely based on pairwise video-text data, we make the first attempt to model the visual consensus by mining visual concepts from videos and exploiting their co-occurrence patterns within the video and text modalities, with no reliance on any additional concept annotations. Specifically, we build a shareable and learnable graph as the visual consensus, where the nodes denote the mined visual concepts and the edges connecting the nodes represent the co-occurrence relationships between the visual concepts. Extensive experimental results on public benchmark datasets demonstrate that our proposed method, with its ability to effectively model the visual consensus, achieves state-of-the-art performance on the bidirectional video-text retrieval task. Our code is available at https://github.com/sqiangcao99/VCM.
APA, Harvard, Vancouver, ISO, and other styles
8

Bodyanskaya, Alisa, and Kapitalina Sinegubova. "Music Video as a Poetic Interpretation." Virtual Communication and Social Networks 2023, no. 2 (April 25, 2023): 47–55. http://dx.doi.org/10.21603/2782-4799-2023-2-2-47-55.

Full text
Abstract:
This article introduces the phenomenon of videopoetry as a hybrid product of mass media whose popularity is based on intermediality, i.e., the cumulative effect on different perception channels. Videopoetry is a productive form of verbal creativity in the contemporary media culture with its active reception of art. The research featured poems by W. B. Yeats, T. S. Eliot, and W. H. Auden presented as videos and the way they respond to someone else's poetic word. The authors analyzed 15 videos by comparing the original text and the video sequence in line with the method developed by N. V. Barkovskaya and A. A. Zhitenev. The analysis revealed several options for relaying a poetic work as a music video. Three videos provided a direct illustration of the source text, suggesting a complete or partial visual duplication of the original poetic imagery. Five videos offered an indirect illustration of the source text by using associative images in relation to the central images of the poem. Five videos gave a minimal illustration: the picture did not dominate the text of the poem, but its choice implied a certain interpretation. Two videos featured the video maker as a reciter. The video makers did not try to transform the poetic text but used the video sequence as a way to enter into a dialogue with the original poem or resorted to indirect illustration to generate occasional meanings. Thus, video makers keep the original text unchanged and see the video sequence and musical accompaniment as their responsibility but maintain a dialogue between the original text and its game reinterpretation.
APA, Harvard, Vancouver, ISO, and other styles
9

Ghorpade, Jayshree, Raviraj Palvankar, Ajinkya Patankar, and Snehal Rathi. "Extracting Text from Video." Signal & Image Processing : An International Journal 2, no. 2 (June 30, 2011): 103–12. http://dx.doi.org/10.5121/sipij.2011.2209.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Wadaskar, Ghanshyam, Sanghdip Udrake, Vipin Bopanwar, Shravani Upganlawar, and Prof Minakshi Getkar. "Extract Text from Video." International Journal for Research in Applied Science and Engineering Technology 12, no. 5 (May 31, 2024): 2881–83. http://dx.doi.org/10.22214/ijraset.2024.62287.

Full text
Abstract:
The code imports the YouTubeTranscriptApi class from the youtube_transcript_api library, and the YouTube video ID is defined. The transcript data for the given video ID is fetched using the get_transcript method. The transcript text is extracted from the data and stored in the transcript variable. The transcript is split into lines and then joined back into a single string. Finally, the processed transcript is written to a text file named "Love.text" with UTF-8 encoding. A commented-out code block shows an alternative way to write the transcript to a text file using the open function directly, which can be used if preferred.
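
A cleaned-up sketch of the steps this abstract describes. The actual PyPI package is youtube_transcript_api with a get_transcript method (the abstract's spelling differs slightly); the video ID is a placeholder, and the output file name is the one given in the abstract.

```python
# Hedged sketch of the described script: fetch a transcript and write it to a file.
from youtube_transcript_api import YouTubeTranscriptApi

video_id = "VIDEO_ID_HERE"                        # placeholder YouTube video ID

# Fetch the transcript segments and join their text into a single string.
segments = YouTubeTranscriptApi.get_transcript(video_id)
transcript = " ".join(segment["text"] for segment in segments)

# Split into lines and join back into a single string, as described.
processed = "\n".join(transcript.splitlines())

# Write the processed transcript with UTF-8 encoding.
with open("Love.text", "w", encoding="utf-8") as f:
    f.write(processed)
```
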
APA, Harvard, Vancouver, ISO, and other styles
11

Liu, Yi, Yue Zhang, Haidong Hu, Xiaodong Liu, Lun Zhang, and Ruijun Liu. "An Extended Text Combination Classification Model for Short Video Based on Albert." Journal of Sensors 2021 (October 16, 2021): 1–7. http://dx.doi.org/10.1155/2021/8013337.

Full text
Abstract:
With the rise and rapid development of short video sharing websites, the number of short videos on the Internet has been growing explosively. The organization and classification of short videos have become the basis for their effective use, and this is a problem faced by the major short video platforms. Given the complex categories of short video content and the rich extended text information that accompanies it, this paper uses text classification methods to solve the short video classification problem. Compared with the traditional approach of classifying and understanding short video key frames, this method has lower computational cost, more accurate classification results, and easier application. The paper proposes a text classification model based on an attention mechanism over the multiple embedded extension texts of a short video. The experiments first use the pre-trained language model ALBERT to extract sentence-level vectors and then use the attention mechanism to learn the weight of each kind of extended text information for short video classification. The research also applies Google's unsupervised data augmentation (UDA) method, creatively combining it with a Chinese knowledge graph to realize TF-IDF word replacement. During training, a large amount of unlabeled data was introduced, which significantly improved the classification accuracy of the model. A final series of experiments compares the proposed method with existing short video title classification methods, classification methods based on video key frames, and hybrid methods, and shows that the proposed method is more accurate and robust on the test set.
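
A minimal sketch of the first step the abstract names, extracting sentence-level vectors with ALBERT. The checkpoint and mean-pooling choice are assumptions; the paper works on Chinese short-video text, so a Chinese ALBERT checkpoint would be substituted in practice.

```python
# Hedged sketch: sentence-level vectors from ALBERT via mean pooling.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")   # placeholder checkpoint
model = AutoModel.from_pretrained("albert-base-v2")

def sentence_vector(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the token embeddings into one sentence-level vector.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

vec = sentence_vector("How to make hand-pulled noodles at home")
print(vec.shape)   # torch.Size([768]) for this base-size checkpoint
```
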
APA, Harvard, Vancouver, ISO, and other styles
12

Letroiwen, Kornelin, Aunurrahman ., and Indri Astuti. "PENGEMBANGAN VIDEO ANIMASI UNTUK MENINGKATKAN KEMAMPUAN READING COMPREHENSION FACTUAL REPORT TEXT." Jurnal Teknologi Pendidikan (JTP) 16, no. 1 (April 11, 2023): 16. http://dx.doi.org/10.24114/jtp.v16i1.44842.

Full text
Abstract:
This study aims to develop an animated video design for learning English factual report text material. The research method is Research and Development with the ASSURE development design model. A total of 42 class XI students of SMKN 1 Ngabang were involved in the study, and the data obtained were analyzed qualitatively and quantitatively. The animated video uses 2D animated characters and consists of a cover, developer profile, greeting, basic competencies, learning objectives, definition, social function, text structure, linguistic elements, learning materials, and examples of factual report text. The design expert validation score was 3.79, the material expert validation score was 3.57, and the media expert validation score was 3.55; all averages are above 3.0, which means the results are valid. This indicates that the factual report text animated video can be used in learning English. To strengthen its validity, a trial was carried out with SMK students, who are the direct users of the animated video. The one-to-one trial score was 95.31, the medium group trial score was 93.81, and the large group trial score was 94.75; all three are higher than the student response criterion of Rs ≥ 85. The average pretest score was 62.67 and the average posttest score was 81.3, showing an increase in student learning outcomes after using the factual report text animated video. Keywords: learning media, animated videos, reading comprehension, factual report text
APA, Harvard, Vancouver, ISO, and other styles
13

P, Ilampiray, Naveen Raju D, Thilagavathy A, Mohamed Tharik M, Madhan Kishore S, Nithin A.S, and Infant Raj I. "Video Transcript Summarizer." E3S Web of Conferences 399 (2023): 04015. http://dx.doi.org/10.1051/e3sconf/202339904015.

Full text
Abstract:
In today's world, a large number of videos are uploaded every day, each containing information about something. The major challenge is to find the right video and understand its content correctly: of the many videos available, some contain useless content, and even when a video with the right content exists, it still has to be located. Failing to find the right one wastes the full time and effort spent extracting the correct, useful information. We propose an approach that uses NLP for text extraction and BERT-based summarization for text summarization. It provides the main content of a video as a text description and an abstractive summary, enabling users to discriminate between relevant and irrelevant information according to their needs. Furthermore, our experiments show that the joint model can attain good results, producing informative, concise, and readable multi-line video descriptions and summaries in a human evaluation.
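
A hedged sketch of BERT-based extractive summarization of a transcript, using the bert-extractive-summarizer package as a stand-in for the summarizer described in the abstract; the transcript text and sentence count are placeholders.

```python
# Hedged sketch: extractive summarization with a BERT-based sentence scorer.
from summarizer import Summarizer   # pip install bert-extractive-summarizer

transcript = (
    "In this lecture we introduce gradient descent. "
    "Gradient descent iteratively updates parameters along the negative gradient. "
    "The learning rate controls the size of each step. "
    "A rate that is too large can cause divergence, while one that is too small slows convergence."
)

model = Summarizer()                           # loads a pretrained BERT under the hood
summary = model(transcript, num_sentences=2)   # keep the two most central sentences
print(summary)
```
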
APA, Harvard, Vancouver, ISO, and other styles
14

Godha, Ashima, and Puja Trivedi. "CNN Filter based Text Region Segmentation from Lecture Video and Extraction using NeuroOCR." SMART MOVES JOURNAL IJOSCIENCE 5, no. 7 (July 28, 2019): 7. http://dx.doi.org/10.24113/ijoscience.v5i7.218.

Full text
Abstract:
Lecture videos are rich in textual information, and being able to read this text is quite useful for larger video understanding and analysis applications. Although text recognition from images has been an active research area in computer vision, text in lecture videos has mostly been overlooked. This paper focuses on text extraction from different types of lecture videos, such as slide, whiteboard, and paper lecture videos. The text regions are segmented in video frames and the text is extracted using a recurrent-neural-network-based OCR. Finally, the extracted text is converted into audio for convenience. The designed algorithm is tested on videos from different lectures, and the experimental results show that the proposed methodology is more efficient than existing work.
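
A minimal sketch of the last two stages described above, recognizing text in a segmented frame region and converting it to audio. It uses pytesseract and gTTS as stand-ins for the paper's neural OCR and audio step; the frame path is a placeholder.

```python
# Hedged sketch: OCR on a lecture frame, then text-to-speech for listening.
import cv2
import pytesseract
from gtts import gTTS

frame = cv2.imread("lecture_frame.png")              # placeholder frame image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Recognize the text in the (already segmented) region.
text = pytesseract.image_to_string(gray)
print(text)

# Convert the recognized text to speech for convenient listening.
gTTS(text=text, lang="en").save("lecture_text.mp3")
```
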
APA, Harvard, Vancouver, ISO, and other styles
15

Frobenius, Maximiliane. "Pointing gestures in video blogs." Text & Talk 33, no. 1 (January 25, 2013): 1–23. http://dx.doi.org/10.1515/text-2013-0001.

Full text
Abstract:
Video blogs are a form of CMC (computer-mediated communication) that feature speakers who talk into a camera, and thereby produce a viewer-directed performance. Pointing gestures are part of the resources that the medium affords to design vlogs for the absent recipients. Based on a corpus of 40 vlogs, this research categorizes different kinds of common pointing actions in vlogs. Close analysis reveals the role multimodal factors such as gaze and body posture play along with deictic gestures and verbal reference in the production of a viewer-directed monologue. Those instances where vloggers point at referents outside the video frame, e.g., elements of the Web site that represent alternative modes of communication, such as written comments, receive particular attention in the present study, as they require mutual knowledge about the shared virtual context the vlog is situated in.
APA, Harvard, Vancouver, ISO, and other styles
16

Alabsi, Thuraya. "Effects of Adding Subtitles to Video via Apps on Developing EFL Students’ Listening Comprehension." Theory and Practice in Language Studies 10, no. 10 (October 1, 2020): 1191. http://dx.doi.org/10.17507/tpls.1010.02.

Full text
Abstract:
It is unclear whether using videos and education apps in learning adds value to students' listening comprehension. This study assesses the impact of adding text to videos on English as a Foreign Language (EFL) learners' listening comprehension. The participants were 76 prep college EFL students from Taibah University, divided into two groups. A semi-experimental design was employed to compare the experimental group and the control group. The experimental group watched an English learning video, wrote text subtitles for the video using apps, and later took a listening test to evaluate their ability to acquire information from the videos. The control group watched videos during live lectures but did not add subtitles to the content they viewed. A paired samples t-test was used to assess listening comprehension achievement, and posttest results were compared. Results revealed statistically significant increases in posttest listening comprehension scores, indicating superior performance and a significant positive impact of teaching and learning via video watching and subtitle-adding apps.
APA, Harvard, Vancouver, ISO, and other styles
17

Puspita, Widya, Teti Sobari, and Wikanengsih Wikanengsih. "Improving Students Writing Skills Explanation Text using Animated Video." JLER (Journal of Language Education Research) 6, no. 1 (February 26, 2023): 35–60. http://dx.doi.org/10.22460/jler.v6i1.10198.

Full text
Abstract:
This study focuses on the influence of an animated video on students' ability to write explanation text. The research uses a descriptive qualitative method. Its purpose is to find out whether the animated video can help students improve their explanation text writing skills and to see the differences in students' abilities before and after using animated videos in Indonesian language learning. The subjects were 20 students of class VII A at MTs Pasundan Cimahi, and the data were obtained from the students' pre-test and post-test results in writing explanation texts before and after using animated videos. The analysis shows that, based on Output Table 1, there is no decrease from the pre-test score to the post-test score. Between the pre-test and post-test writing results there were 20 positive cases (N), meaning that all 20 students improved their explanation text writing test results. Furthermore, Output Table 2 shows that there is a difference between the pre-test and post-test writing results, meaning that the use of animated videos influences the learning outcomes of writing explanation texts for class VII A students. Animated videos are one of the learning media that can be used to improve explanation text writing skills.
APA, Harvard, Vancouver, ISO, and other styles
18

Choudhary, Waffa. "Text Extraction from Videos Using the Combination of Edge-Based and Stroke Filter Techniques." Advanced Materials Research 403-408 (November 2011): 1068–74. http://dx.doi.org/10.4028/www.scientific.net/amr.403-408.1068.

Full text
Abstract:
A novel method combining edge-based and stroke-filter-based text extraction in videos is presented. Several researchers have used edge-based or filter-based text extraction in video frames; however, these individual techniques have their own advantages and disadvantages. Combining the two techniques yields better results than either technique alone. In this paper, Canny edge-based and stroke-filter-based text extraction in video frames are amalgamated. The effectiveness of the proposed method is evaluated against the individual edge-based and stroke-filter-based techniques, and the proposed method is found to significantly improve the text extraction rate in videos, with a precision of 91.99% and a recall of 87.18%.
APA, Harvard, Vancouver, ISO, and other styles
19

Adams, Aubrie, and Weimin Toh. "Student Emotion in Mediated Learning: Comparing a Text, Video, and Video Game." Electronic Journal of e-Learning 19, no. 6 (December 16, 2021): 575–87. http://dx.doi.org/10.34190/ejel.19.6.2546.

Full text
Abstract:
Although serious games are generally praised by scholars for their potential to enhance teaching and e-learning practices, more empirical evidence is needed to support these accolades. Existing research in this area tends to show that gamified teaching experiences do contribute to significant effects to improve student cognitive, motivational, and behavioural learning outcomes, but these effects are usually small. In addition, less research examines how different types of mediated learning tools compare to one another in influencing student outcomes associated with learning and motivation. As such, a question can be asked in this area: how do video games compare to other types of mediated tools, such as videos or texts, in influencing student emotion outcomes? This study used an experimental design (N = 153) to examine the influence of different types of mass media modalities (text, video, and video game) on college students’ emotions in a mediated learning context. Research examining the impact of video games on instruction has begun to grow, but few studies appropriately acknowledge the nuanced differences between media tools in comparison to one another. Using a media-attributes approach as a lens, this study first compared these mediated tools along the attributional dimensions of textuality, channel, interactivity, and control. This study next tested the impact of each media type on thirteen emotion outcomes. Results showed that six emotion outcomes did not indicate differences between groups (fear, guilt, sadness, shyness, serenity, and general negative emotions). However, six of the tested emotion outcomes did indicate differences between groups with students experiencing higher levels of emotional arousal in both the text and video game conditions (in comparison to the video condition) for the emotions of joviality, self-assurance, attentiveness, surprise, hostility, and general positive emotions. Lastly, students also felt less fatigue in the video game condition. Overall, implications for e-learning suggest that when a message’s content is held constant, both video games and texts may be better in inducing emotional intensity and reducing fatigue than videos alone, which could enhance motivation to learn when teaching is mediated by technology.
APA, Harvard, Vancouver, ISO, and other styles
20

Wu, Yihong, Mingli Lin, and Wenlong Yao. "The Influence of Titles on YouTube Trending Videos." Communications in Humanities Research 29, no. 1 (April 19, 2024): 285–94. http://dx.doi.org/10.54254/2753-7064/29/20230835.

Full text
Abstract:
The global video platform market has been growing remarkably in recent years. As part of a video, the title can compel people to view it; however, few scholars have so far studied the relationship between video trendiness and titles. This work studies the influence of the sentiment polarity of video titles using the Valence Aware Dictionary Sentiment Reasoner (VADER) and investigates the feasibility of applying video title text to research on YouTube trending videos using Doc2Vec. It is found that the text in YouTube trending video titles possesses predictive value for video trendiness, but it requires advanced techniques such as deep learning for full exploitation. The sentiment polarity of titles affects video views, and this impact varies across video categories.
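
A sketch of the two title-analysis steps the abstract mentions: VADER sentiment scoring and Doc2Vec embedding of title text. The example titles and hyperparameters are illustrative only.

```python
# Hedged sketch: sentiment polarity and Doc2Vec vectors for video titles.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

titles = [
    "10 Minute Full Body Workout You Can Do Anywhere",
    "We Tried the World's Spiciest Noodles",
    "Why the Housing Market Is Breaking",
]

# Sentiment polarity of each title.
analyzer = SentimentIntensityAnalyzer()
for t in titles:
    print(t, analyzer.polarity_scores(t)["compound"])

# Doc2Vec vectors for the titles, usable as features for trendiness prediction.
docs = [TaggedDocument(words=t.lower().split(), tags=[i]) for i, t in enumerate(titles)]
d2v = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)
title_vectors = [d2v.dv[i] for i in range(len(titles))]
```
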
APA, Harvard, Vancouver, ISO, and other styles
21

Sanjeeva, Polepaka, Vanipenta Balasri Nitin Reddy, Jagirdar Indraj Goud, Aavula Guru Prasad, and Ashish Pathani. "TEXT2AV – Automated Text to Audio and Video Conversion." E3S Web of Conferences 430 (2023): 01027. http://dx.doi.org/10.1051/e3sconf/202343001027.

Full text
Abstract:
The paper aims to develop a machine-learning-based system that can automatically convert text to audio and text to video at the user's request. Reading a large text is difficult for anyone, and this TTS model makes it easier by converting text into audio, with the output produced by an avatar with lip sync to make the interaction look more attractive and human-like in many languages. The TTS model is built on Waveform Recurrent Neural Networks (WaveRNN), a type of auto-regressive model that predicts future data based on the present. The system identifies the keywords in the input text and uses diffusion models to generate high-quality video content, and it uses a GAN (Generative Adversarial Network) to generate videos. Frame interpolation is used to combine adjacent frames to generate slow-motion video. The WebVid-20M, ImageNet, and Hugging Face datasets are used for text-to-video, and the LibriTTS corpus and a lip-sync dataset are used for text-to-audio. The system provides a user-friendly and automated platform that takes text as input and quickly and efficiently produces either high-quality audio or high-resolution video.
APA, Harvard, Vancouver, ISO, and other styles
22

Aljorani, Reem, and Boshra Zopon. "Encapsulation Video Classification and Retrieval Based on Arabic Text." Diyala Journal For Pure Science 17, no. 4 (October 1, 2021): 20–36. http://dx.doi.org/10.24237/djps.17.04.558b.

Full text
Abstract:
Arabic video classification is not a popular field, and there is little research in this area, especially for education. A system is proposed to address this problem and to make educational Arabic videos more accessible to students. A survey of several papers was carried out in order to design and implement a system that classifies videos in the Arabic language by extracting their audio features using Azure Cognitive Services, which produces text transcripts. Several preprocessing operations are then applied to the transcript. A stochastic gradient descent (SGD) algorithm is used to classify the transcripts and give a suitable label to each video. In addition, a search technique is applied to enable students to retrieve the videos they need. The results show that the SGD algorithm recorded the highest classification accuracy, 89.3%, when compared with other learning models. The section below introduces a survey consisting of the papers most relevant and recent to this work.
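
A hedged sketch of the transcript-classification step named in the abstract, TF-IDF features with an SGD classifier. The tiny English transcripts and labels are placeholders; the paper classifies Arabic transcripts produced by Azure Cognitive Services.

```python
# Hedged sketch: TF-IDF + SGDClassifier for labelling transcripts by subject.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

transcripts = [
    "today we solve quadratic equations step by step",
    "the cell membrane controls what enters and leaves the cell",
    "newton's second law relates force mass and acceleration",
    "mitosis produces two identical daughter cells",
]
labels = ["math", "biology", "physics", "biology"]

clf = make_pipeline(TfidfVectorizer(), SGDClassifier(loss="hinge", random_state=0))
clf.fit(transcripts, labels)

print(clf.predict(["forces acting on a moving object"]))   # likely 'physics' on this toy data
```
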
APA, Harvard, Vancouver, ISO, and other styles
23

Mochurad, Lesia. "A NEW APPROACH FOR TEXT RECOGNITION ON A VIDEO CARD." Computer systems and information technologies, no. 3 (September 28, 2022): 22–30. http://dx.doi.org/10.31891/csit-2022-3-3.

Full text
Abstract:
An important task is to develop a computer system that can automatically read text content from images or videos with a complex background. Because of the large number of calculations involved, it is quite difficult to apply such methods in real time, so the use of parallel and distributed computing in the development of real-time or near-real-time systems is relevant. This is especially true in areas such as automated video recording of traffic violations, text recognition, machine vision, fingerprint recognition, and speech. The paper proposes a new approach to text recognition on a video card. A parallel algorithm for processing a group of images and a video sequence has been developed and tested; parallelization on the video card is provided by the OpenCL framework and CUDA technology. Without loss of generality, the method was applied to images containing vehicles, which allowed text to be obtained from license plates. The developed system was tested for the processing speed of a group of images and videos, achieving an average processing speed of 207 frames per second. As for the execution time of the parallel algorithm, for 50 images and a 63-frame video, image preprocessing took 0.4 seconds, which is sufficient for real-time or near-real-time systems. The maximum acceleration obtained is up to 8 times for image processing and up to 12 times for the video sequence. The general tendency of acceleration to increase with the dimensionality of the processed image is preserved, which indicates the relevance of parallel calculations for this problem.
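
A hedged sketch of offloading per-frame preprocessing to the GPU via OpenCV's Transparent API (UMat runs through OpenCL where available), in the spirit of the parallel pipeline described above; the paper's own implementation uses OpenCL/CUDA kernels directly, and the video path here is a placeholder.

```python
# Hedged sketch: OpenCL-backed preprocessing of video frames with OpenCV UMat.
import time
import cv2

cap = cv2.VideoCapture("traffic.mp4")        # placeholder video file
frames, start = 0, time.time()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gpu_frame = cv2.UMat(frame)              # upload; later ops may run via OpenCL
    gray = cv2.cvtColor(gpu_frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 100, 200)
    _ = edges.get()                          # download result back to host memory
    frames += 1

cap.release()
elapsed = time.time() - start
print(f"processed {frames} frames at {frames / max(elapsed, 1e-6):.1f} fps")
```
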
APA, Harvard, Vancouver, ISO, and other styles
24

Krishnamoorthy, Niveda, Girish Malkarnenkar, Raymond Mooney, Kate Saenko, and Sergio Guadarrama. "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 30, 2013): 541–47. http://dx.doi.org/10.1609/aaai.v27i1.8679.

Full text
Abstract:
We present a holistic data-driven technique that generates natural-language descriptions for videos. We combine the output of state-of-the-art object and activity detectors with "real-world" knowledge to select the most probable subject-verb-object triplet for describing a video. We show that this knowledge, automatically mined from web-scale text corpora, enhances the triplet selection algorithm by providing it contextual information and leads to a four-fold increase in activity identification. Unlike previous methods, our approach can annotate arbitrary videos without requiring the expensive collection and annotation of a similar training video corpus. We evaluate our technique against a baseline that does not use text-mined knowledge and show that humans prefer our descriptions 61% of the time.
APA, Harvard, Vancouver, ISO, and other styles
25

Creamer, MeLisa, Heather R. Bowles, Belinda von Hofe, Kelley Pettee Gabriel, Harold W. Kohl, and Adrian Bauman. "Utility of Computer-Assisted Approaches for Population Surveillance of Physical Activity." Journal of Physical Activity and Health 11, no. 6 (August 2014): 1111–19. http://dx.doi.org/10.1123/jpah.2012-0266.

Full text
Abstract:
Background: Computer-assisted techniques may be a useful way to enhance physical activity surveillance and increase accuracy of reported behaviors. Purpose: Evaluate the reliability and validity of a physical activity (PA) self-report instrument administered by telephone and internet. Methods: The telephone-administered Active Australia Survey was adapted into 2 forms for internet self-administration: survey questions only (internet-text) and with videos demonstrating intensity (internet-video). Data were collected from 158 adults (20–69 years, 61% female) assigned to telephone (telephone-interview) (n = 56), internet-text (n = 51), or internet-video (n = 51). Participants wore an accelerometer and completed a logbook for 7 days. Test-retest reliability was assessed using intraclass correlation coefficients (ICC). Convergent validity was assessed using Spearman correlations. Results: Strong test-retest reliability was observed for PA variables in the internet-text (ICC = 0.69 to 0.88), internet-video (ICC = 0.66 to 0.79), and telephone-interview (ICC = 0.69 to 0.92) groups (P-values < 0.001). For total PA, correlations (ρ) between the survey and Actigraph+logbook were ρ = 0.47 for the internet-text group, ρ = 0.57 for the internet-video group, and ρ = 0.65 for the telephone-interview group. For vigorous-intensity activity, the correlations between the survey and Actigraph+logbook were 0.52 for internet-text, 0.57 for internet-video, and 0.65 for telephone-interview (P < .05). Conclusions: Internet-video of the survey had similar test-retest reliability and convergent validity when compared with the telephone-interview, and should continue to be developed.
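
A sketch of the two statistics reported above: a Spearman correlation for convergent validity and an intraclass correlation for test-retest reliability. The arrays are made-up illustrative data, and pingouin's intraclass_corr function is used as a stand-in for the paper's exact ICC computation.

```python
# Hedged sketch: Spearman correlation and ICC on simulated survey data.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
import pingouin as pg

rng = np.random.default_rng(0)
survey_minutes = rng.normal(300, 80, size=50)                       # self-reported PA
device_minutes = survey_minutes * 0.8 + rng.normal(0, 60, size=50)  # accelerometer+logbook

rho, p = spearmanr(survey_minutes, device_minutes)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# Test-retest reliability: the same survey administered twice to each person.
second_administration = survey_minutes + rng.normal(0, 40, size=50)
retest = pd.DataFrame({
    "subject": np.repeat(np.arange(50), 2),
    "occasion": np.tile(["t1", "t2"], 50),
    # Interleave t1/t2 scores so rows align with the subject/occasion columns.
    "minutes": np.ravel(np.column_stack([survey_minutes, second_administration])),
})
icc = pg.intraclass_corr(data=retest, targets="subject", raters="occasion", ratings="minutes")
print(icc[["Type", "ICC"]])
```
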
APA, Harvard, Vancouver, ISO, and other styles
26

Du, Wanru, Xiaochuan Jing, Quan Zhu, Xiaoyin Wang, and Xuan Liu. "A cross-modal conditional mechanism based on attention for text-video retrieval." Mathematical Biosciences and Engineering 20, no. 11 (2023): 20073–92. http://dx.doi.org/10.3934/mbe.2023889.

Full text
Abstract:
Current research in cross-modal retrieval has primarily focused on aligning the global features of videos and sentences. However, video conveys a much more comprehensive range of information than text. Thus, text-video matching should focus on the similarities between frames containing critical information and text semantics. This paper proposes a cross-modal conditional feature aggregation model based on the attention mechanism. It includes two innovative modules: (1) A cross-modal attentional feature aggregation module, which uses the semantic text features as conditional projections to extract the most relevant features from the video frames. It aggregates these frame features to form global video features. (2) A global-local similarity calculation module calculates similarities at two granularities (video-sentence and frame-word features) to consider both the topic and detail features in the text-video matching process. Our experiments on the four widely used MSR-VTT, LSMDC, MSVD and DiDeMo datasets demonstrate the effectiveness of our model and its superiority over state-of-the-art methods. The results show that the cross-modal attention aggregation approach can effectively capture the primary semantic information of the video. At the same time, the global-local similarity calculation model can accurately match text and video based on topic and detail features.
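
A minimal PyTorch sketch of the core aggregation idea described here: use the sentence feature as a query to attend over frame features and pool them into a single text-conditioned video vector. Dimensions are illustrative; the full model in the paper adds a global-local similarity module on top.

```python
# Hedged sketch: text-conditioned attention pooling over frame features.
import torch
import torch.nn.functional as F

def text_conditioned_video_feature(frame_feats: torch.Tensor,
                                   text_feat: torch.Tensor) -> torch.Tensor:
    # frame_feats: (batch, n_frames, dim); text_feat: (batch, dim)
    scale = frame_feats.size(-1) ** 0.5
    scores = torch.einsum("bd,bnd->bn", text_feat, frame_feats) / scale
    weights = F.softmax(scores, dim=-1)                    # attention over frames
    return torch.einsum("bn,bnd->bd", weights, frame_feats)

video = text_conditioned_video_feature(torch.randn(4, 12, 512), torch.randn(4, 512))
similarity = F.cosine_similarity(video, torch.randn(4, 512))   # video-sentence similarity
```
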
APA, Harvard, Vancouver, ISO, and other styles
27

CHEN, DATONG, JEAN-MARC ODOBEZ, and JEAN-PHILIPPE THIRAN. "MONTE CARLO VIDEO TEXT SEGMENTATION." International Journal of Pattern Recognition and Artificial Intelligence 19, no. 05 (August 2005): 647–61. http://dx.doi.org/10.1142/s0218001405004216.

Full text
Abstract:
This paper presents a probabilistic algorithm for segmenting and recognizing text embedded in video sequences based on adaptive thresholding using a Bayes filtering method. The algorithm approximates the posterior distribution of segmentation thresholds of video text by a set of weighted samples. The set of samples is initialized by applying a classical segmentation algorithm on the first video frame and further refined by random sampling under a temporal Bayesian framework. This framework allows us to evaluate a text image segmentor on the basis of recognition result instead of visual segmentation result, which is directly relevant to our character recognition task. Results on a database of 6944 images demonstrate the validity of the algorithm.
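
A hedged sketch of the sampling idea: approximate the posterior over a segmentation threshold with weighted samples that are diffused, reweighted, and resampled frame by frame. The scoring function below is a simple stand-in for the paper's recognition-based evaluation of each thresholded image.

```python
# Hedged sketch: particle-filter-style tracking of a segmentation threshold.
import numpy as np

def score_threshold(frame: np.ndarray, threshold: float) -> float:
    # Stand-in score: prefer thresholds whose foreground ratio looks text-like.
    # A real system would instead score the OCR result of the thresholded image.
    fg = (frame > threshold).mean()
    return float(np.exp(-((fg - 0.15) ** 2) / 0.01))

def track_threshold(frames, n_samples: int = 100, seed: int = 0) -> list:
    rng = np.random.default_rng(seed)
    samples = rng.uniform(50, 200, n_samples)             # initial threshold samples
    estimates = []
    for frame in frames:
        samples = samples + rng.normal(0, 5, n_samples)   # diffusion step
        weights = np.array([score_threshold(frame, t) for t in samples])
        weights /= weights.sum()
        estimates.append(float(np.sum(weights * samples)))              # posterior mean
        samples = samples[rng.choice(n_samples, n_samples, p=weights)]  # resampling
    return estimates

frames = [np.random.default_rng(i).integers(0, 256, (64, 256)) for i in range(5)]
print(track_threshold(frames))
```
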
APA, Harvard, Vancouver, ISO, and other styles
28

Welsh, Stephen, and Damian Conway. "Encoding Video Narration as Text." Real-Time Imaging 6, no. 5 (October 2000): 391–405. http://dx.doi.org/10.1006/rtim.1999.0189.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Han, Huasong, Ziqing Li, Fei Fang, Fei Luo, and Chunxia Xiao. "Text to video generation via knowledge distillation." Metaverse 5, no. 1 (March 19, 2024): 2425. http://dx.doi.org/10.54517/m.v5i1.2425.

Full text
Abstract:
Text-to-video generation (T2V) has recently attracted more attention due to the wide application scenarios of video media. However, compared with the substantial advances in text-to-image generation (T2I), the research on T2V remains in its early stage. The difficulty mainly lies in maintaining the text-visual semantic consistency and the video temporal coherence. In this paper, we propose a novel distillation and translation GAN (DTGAN) to address these problems. First, we leverage knowledge distillation to guarantee semantic consistency. We distill text-visual mappings from a well-performing T2I teacher model and transfer it to our DTGAN. This knowledge serves as shared abstract features and high-level constraints for each frame in the generated videos. Second, we propose a novel visual recurrent unit (VRU) to achieve video temporal coherence. The VRU can generate frame sequences as well as process the temporal information across frames. It enables our generator to act as a multi-modal variant of the language model in neural machine translation task, which iteratively predicts the next frame based on the input text and the previously generated frames. We conduct experiments on two synthetic datasets (SBMG and TBMG) and one real-world dataset (MSVD). Qualitative and quantitative comparisons with state-of-the-art methods demonstrate that our DTGAN can generate results with better text-visual semantic consistency and temporal coherence.
APA, Harvard, Vancouver, ISO, and other styles
30

Chen, Yizhen, Jie Wang, Lijian Lin, Zhongang Qi, Jin Ma, and Ying Shan. "Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 396–404. http://dx.doi.org/10.1609/aaai.v37i1.25113.

Full text
Abstract:
Vision-language alignment learning for video-text retrieval has attracted a lot of attention in recent years. Most existing methods either transfer the knowledge of an image-text pretraining model to the video-text retrieval task without fully exploring the multi-modal information of videos, or simply fuse multi-modal features in a brute-force manner without explicit guidance. In this paper, we integrate multi-modal information in an explicit manner by tagging, and use the tags as anchors for better video-text alignment. Various pretrained experts are utilized for extracting the information of multiple modalities, including object, person, motion, audio, etc. To take full advantage of this information, we propose the TABLE (TAgging Before aLignmEnt) network, which consists of a visual encoder, a tag encoder, a text encoder, and a tag-guiding cross-modal encoder for jointly encoding multi-frame visual features and multi-modal tag information. Furthermore, to strengthen the interaction between video and text, we build a joint cross-modal encoder with the triplet input of [vision, tag, text] and perform two additional supervised tasks, Video Text Matching (VTM) and Masked Language Modeling (MLM). Extensive experimental results demonstrate that the TABLE model is capable of achieving state-of-the-art (SOTA) performance on various video-text retrieval benchmarks, including MSR-VTT, MSVD, LSMDC and DiDeMo.
APA, Harvard, Vancouver, ISO, and other styles
31

Huang, Hong-Bo, Yao-Lin Zheng, and Zhi-Ying Hu. "Video Abnormal Action Recognition Based on Multimodal Heterogeneous Transfer Learning." Advances in Multimedia 2024 (January 19, 2024): 1–12. http://dx.doi.org/10.1155/2024/4187991.

Full text
Abstract:
Human abnormal action recognition is crucial for video understanding and intelligent surveillance. However, the scarcity of labeled data for abnormal human actions often hinders the development of high-performance models. Inspired by the multimodal approach, this paper proposes a novel approach that leverages text descriptions associated with abnormal human action videos. Our method exploits the correlation between the text domain and the video domain in the semantic feature space and introduces a multimodal heterogeneous transfer learning framework from the text domain to the video domain. The text of the videos is used for feature encoding and knowledge extraction, and knowledge transfer and sharing are realized in the feature space, which is used to assist in the training of the abnormal action recognition model. The proposed method reduces the reliance on labeled video data, improves the performance of the abnormal human action recognition algorithm, and outperforms the popular video-based models, particularly in scenarios with sparse data. Moreover, our framework contributes to the advancement of automatic video analysis and abnormal action recognition, providing insights for the application of multimodal methods in a broader context.
APA, Harvard, Vancouver, ISO, and other styles
32

Jiang, Lu, Shoou-I. Yu, Deyu Meng, Teruko Mitamura, and Alexander G. Hauptmann. "Text-to-video: a semantic search engine for internet videos." International Journal of Multimedia Information Retrieval 5, no. 1 (December 24, 2015): 3–18. http://dx.doi.org/10.1007/s13735-015-0093-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Lokkondra, Chaitra Yuvaraj, Dinesh Ramegowda, Gopalakrishna Madigondanahalli Thimmaiah, Ajay Prakash Bassappa Vijaya, and Manjula Hebbaka Shivananjappa. "ETDR: An Exploratory View of Text Detection and Recognition in Images and Videos." Revue d'Intelligence Artificielle 35, no. 5 (October 31, 2021): 383–93. http://dx.doi.org/10.18280/ria.350504.

Full text
Abstract:
Images and videos with text content are a direct source of information, and today there is a high need for image and video data that can be intelligently analyzed. A growing number of researchers are focusing on text identification, making it a hot issue in machine vision research. As a result, several real-time applications such as text detection, localization, and tracking have become more prevalent in text analysis systems. This survey examines how text information may be extracted. First, it presents trustworthy datasets for text identification in images and videos. Second, it details the numerous text formats found in both images and video. Third, it describes the process flow for extracting information from text and the existing machine learning and deep learning techniques used to train the models. Fourth, it explains the assessment measures used to validate the models. Finally, it covers the applications and difficulties of text extraction across a wide range of fields; the difficulties focus on the most frequent challenges faced in the real world, such as capture techniques, lighting, and environmental conditions. Images and videos have evolved into valuable sources of data, and the text inside them provides a massive quantity of facts and statistics, but such data is not easy to access. This exploratory view provides easier and more accurate mathematical modeling and evaluation techniques for retrieving the text in images and video into an accessible form.
APA, Harvard, Vancouver, ISO, and other styles
34

Reddy, Mr G. Sekhar, A. Sahithi, P. Harsha Vardhan, and P. Ushasri. "Conversion of Sign Language Video to Text and Speech." International Journal for Research in Applied Science and Engineering Technology 10, no. 5 (May 31, 2022): 159–64. http://dx.doi.org/10.22214/ijraset.2022.42078.

Full text
Abstract:
Sign language recognition (SLR) is a significant and promising technique to facilitate communication for hearing-impaired people. Here, we are dedicated to finding an efficient solution to the gesture recognition problem. This work develops a sign language (SL) recognition framework with deep neural networks that directly transcribes videos of SL signs to words. We propose a novel approach using video sequences that contain both temporal and spatial features, and we therefore use two different models to train on the temporal and spatial features respectively. To train on the spatial features of the video sequences we use a CNN (Convolutional Neural Network) model, trained on frames obtained from the video sequences of the training data. We use an RNN (Recurrent Neural Network) to train on the temporal features. The trained CNN model is used to make predictions for individual frames to obtain a sequence of predictions or pooling-layer outputs for each video. This sequence of predictions or pooling-layer outputs is then given to the RNN to train on the temporal features. Thus, we perform sign language translation: given an input video, the sign shown in the video is recognized using the CNN and RNN and converted to text and speech. Keywords: CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), SLR (Sign Language Recognition), SL (Sign Language).
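
A minimal PyTorch sketch of the two-stage design described above: a CNN produces per-frame features and an RNN (an LSTM here) models their temporal order before the final sign/word classification. The backbone, shapes, and class count are illustrative assumptions.

```python
# Hedged sketch: per-frame CNN features followed by an LSTM over time.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SignLanguageRecognizer(nn.Module):
    def __init__(self, n_classes: int = 100, hidden: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # keep 512-d per-frame features
        self.cnn = backbone
        self.rnn = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)   # spatial features
        _, (h, _) = self.rnn(feats)                            # temporal modelling
        return self.head(h[-1])                                # logits per sign/word

model = SignLanguageRecognizer()
logits = model(torch.randn(2, 16, 3, 112, 112))   # two clips of 16 frames each
```
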
APA, Harvard, Vancouver, ISO, and other styles
35

Arlin, Febyana, Herman Budiyono, and Sri Wachyunni. "Evaluating Students Perception on Learning to Write Video-Based Explanatory Text." Asian Journal of Education and Social Studies 50, no. 1 (January 19, 2024): 229–38. http://dx.doi.org/10.9734/ajess/2024/v50i11253.

Full text
Abstract:
Perception is a perspective that influences a person's decision-making regarding something being thought about and done. In the learning process, perceptions can positively or negatively impact students' mastery of the material and motivation. This research aims to determine students' perceptions of learning explanatory texts using videos. The approach used in this research is descriptive qualitative. The data collection technique in this research is carried out using questionnaires. This research indicates that students' perceptions of learning explanatory texts with the help of animated videos are positive. All aspects of animated learning videos measured in the context of descriptive text learning received positive responses. Students' positive reaction to the animated video in learning explanatory text shows that the animated video can stimulate and has the potential for students to be more productive in learning and mastering explanatory text material.
APA, Harvard, Vancouver, ISO, and other styles
36

HU, JIANMING, JIE XI, and LIDE WU. "AUTOMATIC DETECTION AND VERIFICATION OF TEXT REGIONS IN NEWS VIDEO FRAMES." International Journal of Pattern Recognition and Artificial Intelligence 16, no. 02 (March 2002): 257–71. http://dx.doi.org/10.1142/s0218001402001629.

Full text
Abstract:
Textual information in a video is very useful for video indexing and retrieval, and detecting text blocks in video frames is the first important step in extracting it. Automatic text location is a very difficult problem due to the large variety of character styles and complex backgrounds. In this paper, we describe the steps of the proposed text detection algorithm. First, the gray-scale edges are detected and smoothed horizontally. Second, the edge image is binarized and run-length analysis is applied to find candidate text blocks. Finally, each detected block is verified by an improved logical level technique (ILLT). Experiments show that this method is not sensitive to color or texture changes of the characters and can be used to detect text lines in news videos effectively.
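
A hedged OpenCV sketch of the candidate-detection stages listed above: gray-scale edges, horizontal smoothing so characters on a line merge, binarization, and connected-region analysis. The ILLT verification step is omitted and the frame path is a placeholder.

```python
# Hedged sketch: candidate text blocks from edges + horizontal morphology.
import cv2

frame = cv2.imread("news_frame.png")                     # placeholder frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Gray-scale edges, emphasising the vertical strokes of characters.
edges = cv2.Sobel(gray, cv2.CV_8U, 1, 0, ksize=3)

# Binarize and smear edges horizontally so characters in a line merge.
_, binary = cv2.threshold(edges, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 3))
merged = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# Connected regions with text-like aspect ratios become candidate blocks.
contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w > 2 * h and h > 8:                              # crude text-line heuristic
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("candidate_text_blocks.png", frame)
```
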
APA, Harvard, Vancouver, ISO, and other styles
37

Zhao, Liangbing, Zicheng Zhang, Xuecheng Nie, Luoqi Liu, and Si Liu. "Cross-Attention and Seamless Replacement of Latent Prompts for High-Definition Image-Driven Video Editing." Electronics 13, no. 1 (December 19, 2023): 7. http://dx.doi.org/10.3390/electronics13010007.

Full text
Abstract:
Recently, text-driven video editing has received increasing attention due to the surprising success of the text-to-image model in improving video quality. However, video editing based on the text prompt is facing huge challenges in achieving precise and controllable editing. Herein, we propose Latent prompt Image-driven Video Editing (LIVE) with a precise and controllable video editing function. The important innovation of LIVE is to utilize the latent codes from reference images as latent prompts to rapidly enrich visual details. The novel latent prompt mechanism endows two powerful capabilities for LIVE: one is a comprehensively interactive ability between video frame and latent prompt in the spatial and temporal dimensions, achieved by revisiting and enhancing cross-attention, and the other is the efficient expression ability of training continuous input videos and images within the diffusion space by fine-tuning various components such as latent prompts, textual embeddings, and LDM parameters. Therefore, LIVE can efficiently generate various edited videos with visual consistency by seamlessly replacing the objects in each frame with user-specified targets. The high-definition experimental results from real-world videos not only confirmed the effectiveness of LIVE but also demonstrated important potential application prospects of LIVE in image-driven video editing.
APA, Harvard, Vancouver, ISO, and other styles
38

Kim, Seon-Min, and Dae-Soo Cho. "Design and Implementation of a Summary Note System for Educational Video Contents." Journal of Computational and Theoretical Nanoscience 18, no. 5 (May 1, 2021): 1377–84. http://dx.doi.org/10.1166/jctn.2021.9624.

Full text
Abstract:
Nowadays, video content consumption through video platforms is growing exponentially. In the education field, video content is used more often than text content, but it brings certain challenges. In this paper, we propose a system that can easily and efficiently create summary notes while playing educational videos. The summary notes include keyword-based text information, image information at specific points in the video, and video bookmark information based on the video frame and play time. While playing a video, a user can, at the moment a specific scene appears, capture a particular video frame and generate a keyword. The system provides functions for text extraction and for entering user comments, which help organize the video information so that summary notes can be created more conveniently. It can collectively organize the video information to lessen the time a user spends summarizing it, and the contents of a summary note can be quickly searched by keyword.
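
A sketch of the capture step for such a summary note: grab the frame at a given playback time, save it as the image information, and pull candidate keyword text from it with OCR. The video path and timestamp are placeholders, and pytesseract stands in for the system's text-extraction function.

```python
# Hedged sketch: capture a bookmarked frame and extract text for a summary note.
import cv2
import pytesseract

def capture_note(video_path: str, seconds: float, image_out: str) -> dict:
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, seconds * 1000)       # jump to the bookmark time
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read a frame at the requested time")
    cv2.imwrite(image_out, frame)                        # image information
    text = pytesseract.image_to_string(frame)            # keyword-based text information
    return {"time_sec": seconds, "image": image_out, "text": text.strip()}

note = capture_note("lecture.mp4", 754.0, "bookmark_754s.png")
print(note["time_sec"], note["text"][:80])
```
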
APA, Harvard, Vancouver, ISO, and other styles
39

Ahmad Hafidh Ayatullah and Nanik Suciati. "TOPIC GROUPING BASED ON DESCRIPTION TEXT IN MICROSOFT RESEARCH VIDEO DESCRIPTION CORPUS DATA USING FASTTEXT, PCA AND K-MEANS CLUSTERING." Jurnal Informatika Polinema 9, no. 2 (February 27, 2023): 223–28. http://dx.doi.org/10.33795/jip.v9i2.1271.

Full text
Abstract:
This research groups topics of the Microsoft Research Video Description Corpus (MRVDC) based on the text descriptions of its Indonesian-language dataset. The MRVDC is a video dataset developed by Microsoft Research that contains paraphrased event descriptions in English and other languages. The results of grouping these topics show the patterns of similarity and interrelation between text descriptions of different videos, which is useful for topic-based video retrieval. The topic grouping process is based on the text descriptions, using fastText for word embedding, PCA for feature reduction, and K-means for clustering. Experiments on 1959 videos with 43753 text descriptions, varying the number of clusters k and running with and without PCA, show that the optimal number of clusters is 180, with a silhouette coefficient of 0.123115.
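
A hedged sketch of the clustering pipeline named above: fastText word vectors averaged into description vectors, PCA for reduction, K-means for clustering, and the silhouette coefficient for evaluation. The toy descriptions and small cluster count are illustrative only.

```python
# Hedged sketch: fastText + PCA + K-means with silhouette evaluation.
import numpy as np
from gensim.models import FastText
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

descriptions = [
    "a man is slicing onions in a kitchen",
    "a woman is chopping vegetables",
    "a dog is running across a field",
    "a puppy plays in the grass",
    "a band performs a song on stage",
    "a singer performs live music",
]
tokenized = [d.split() for d in descriptions]

ft = FastText(sentences=tokenized, vector_size=50, min_count=1, epochs=20)
doc_vecs = np.array([np.mean([ft.wv[w] for w in tokens], axis=0) for tokens in tokenized])

reduced = PCA(n_components=5).fit_transform(doc_vecs)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

print(labels, silhouette_score(reduced, labels))
```
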
APA, Harvard, Vancouver, ISO, and other styles
40

Lienhart, Rainer, and Wolfgang Effelsberg. "Automatic text segmentation and text recognition for video indexing." Multimedia Systems 8, no. 1 (January 1, 2000): 69–81. http://dx.doi.org/10.1007/s005300050006.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Dian Cita Sari, Fahri Zalmi, and Akhyak. "Productivity of Online Learning Media in Education Management." Jurnal Prajaiswara 3, no. 1 (June 13, 2022): 1–8. http://dx.doi.org/10.55351/prajaiswara.v3i1.38.

Full text
Abstract:
Introduction/Main Objectives: The productivity of online learning media has increased intensely since the corona lockdown was implemented. This presents a challenge to improve learning by developing media so that students have more time to study independently during the lockdown period. Background Problems: Researchers compared the productivity of various learning media such as text, video, and video-text combinations. Novelty: Statistical analysis using the Mann-Whitney test showed significant differences in knowledge retention between the text and video groups as well as between the video and text-video combination groups. Research Methods: This study was conducted using a quasi-experimental post-test method, comparing participants' retention after the intervention. Findings/Results: The study involved 60 participants who were randomly divided into three groups: text (guidebook), audiovisual (video), and text-video combination. The video and text were prepared by Widyaiswara. The average scores were 35.1 (text group), 58.1 (video group), and 85.2 (text-video combination group). Conclusion: The use of multimedia can increase knowledge retention in the absorption of the material.
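The group comparison reported here relies on the Mann-Whitney U test. A minimal sketch of that test with SciPy is shown below; the score arrays are made-up placeholders, not the study's data.

```python
# Mann-Whitney U test between two independent groups' retention scores.
from scipy.stats import mannwhitneyu

text_group = [30, 38, 35, 33, 40, 31]    # placeholder retention scores, text group
video_group = [55, 60, 58, 62, 54, 59]   # placeholder retention scores, video group

stat, p_value = mannwhitneyu(text_group, video_group, alternative="two-sided")
print(f"U = {stat}, p = {p_value:.4f}")  # p < 0.05 -> significant difference
```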
APA, Harvard, Vancouver, ISO, and other styles
42

Dian Cita Sari, Fahri Zalmi, and Akhyak. "Produktivitas Media Pembelajaran Daring dalam Manajemen Pendidikan." Jurnal Prajaiswara 3, no. 1 (June 13, 2022): 1–7. http://dx.doi.org/10.55351/jp.v3i1.38.

Full text
Abstract:
Introduction/Main Objectives: The productivity of online learning media has increased intensely since the corona lockdown was implemented. This presents a challenge to improve learning by developing media so that students have more time to study independently during the lockdown period. Background Problems: Researchers compared the productivity of various learning media such as text, video, and video-text combinations. Novelty: Statistical analysis using the Mann-Whitney test showed significant differences in knowledge retention between the text and video groups as well as between the video and text-video combination groups. Research Methods: This study was conducted using a quasi-experimental post-test method, comparing participants' retention after the intervention. Findings/Results: The study involved 60 participants who were randomly divided into three groups: text (guidebook), audiovisual (video), and text-video combination. The video and text were prepared by Widyaiswara. The average scores were 35.1 (text group), 58.1 (video group), and 85.2 (text-video combination group). Conclusion: The use of multimedia can increase knowledge retention in the absorption of the material.
APA, Harvard, Vancouver, ISO, and other styles
43

Mellynda, Nabila, and Yusuf Ramadhan Nasution. "Application of Text File Steganography on Video using Least Bit Significant (LSB) Method." SISTEMASI 13, no. 2 (March 1, 2024): 399. http://dx.doi.org/10.32520/stmsi.v13i2.3933.

Full text
Abstract:
In the era of rapid development of information technology, data security is a crucial aspect that must be considered. The transfer and storage of information involve various media, including text and video. Unfortunately, most data sent or stored tends to be vulnerable to security threats. The purpose of this research is to implement text-file steganography on video using the Least Significant Bit (LSB) method to hide secret messages. The research uses the LSB algorithm together with the Waterfall development method. The results show that the LSB algorithm can be used to insert and extract text messages in video; the text messages in this study are plain text stored in files with the .txt extension. Videos with embedded text messages keep the .mp4 extension but cannot be played in standard video players. The application produced in this study runs in a web browser and was built with Visual Studio Code using HTML and JavaScript.
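The paper's application is written in HTML/JavaScript and is not reproduced here; the sketch below only illustrates the generic LSB idea in Python/NumPy, hiding a length-prefixed UTF-8 payload in the least significant bit of a frame's pixel bytes. Note that lossy re-encoding (e.g., to MP4) would normally destroy these bits, so lossless storage of the stego frames is assumed.

```python
import numpy as np

def embed_lsb(pixels: np.ndarray, message: str) -> np.ndarray:
    """Hide `message` (length-prefixed UTF-8) in the LSBs of a uint8 pixel array."""
    data = message.encode("utf-8")
    payload = len(data).to_bytes(4, "big") + data              # 4-byte length prefix
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    flat = pixels.flatten().copy()
    if bits.size > flat.size:
        raise ValueError("message too long for this frame")
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits       # overwrite the LSBs
    return flat.reshape(pixels.shape)

def extract_lsb(pixels: np.ndarray) -> str:
    """Recover the hidden message from the LSBs."""
    flat = pixels.flatten()
    length = int.from_bytes(np.packbits(flat[:32] & 1).tobytes(), "big")
    bits = flat[32 : 32 + length * 8] & 1
    return np.packbits(bits).tobytes().decode("utf-8")

frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
stego = embed_lsb(frame, "secret message from a .txt file")
print(extract_lsb(stego))  # -> "secret message from a .txt file"
```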
APA, Harvard, Vancouver, ISO, and other styles
44

Moon, Nazmun Nessa, Imrus Salehin, Masuma Parvin, Md Mehedi Hasan, Iftakhar Mohammad Talha, Susanta Chandra Debnath, Fernaz Narin Nur, and Mohd Saifuzzaman. "Natural language processing based advanced method of unnecessary video detection." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 6 (December 1, 2021): 5411. http://dx.doi.org/10.11591/ijece.v11i6.pp5411-5419.

Full text
Abstract:
In this study we describe the process of identifying unnecessary video using an advanced combined method of natural language processing and machine learning. The system also includes a framework containing analytics databases, which helps to find statistical accuracy and can detect, accept, or reject unnecessary and unethical video content. In our video detection system, we extract text data from video content in two steps: first from video to MPEG-1 audio layer 3 (MP3), and then from MP3 to WAV format. We use the text-processing part of natural language processing to analyze and prepare the dataset. We use both Naive Bayes and logistic regression classification algorithms in this detection system to determine the best accuracy for our system. In our research, the MP4 video data is converted to plain text using advanced Python library functions. This brief study discusses the identification of unauthorized, unsocial, unnecessary, unfinished, and malicious videos from spoken video recordings. By analyzing our datasets through this model, we can decide which videos should be accepted or rejected for further action.
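As a rough illustration of the pipeline described (audio extraction, speech-to-text, then Naive Bayes and logistic regression classification), a minimal Python sketch is shown below. It skips the MP3 intermediate step for brevity, and the file paths, labels, and tiny training set are placeholders.

```python
import speech_recognition as sr
from moviepy.editor import VideoFileClip        # moviepy 1.x import path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def video_to_text(mp4_path: str, wav_path: str = "audio.wav") -> str:
    """Extract the audio track to WAV, then transcribe it to plain text."""
    VideoFileClip(mp4_path).audio.write_audiofile(wav_path)
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)   # requires network access

# Toy labeled transcripts: 1 = unnecessary/unethical, 0 = acceptable
train_texts = ["buy this now scam offer", "lecture on machine learning basics",
               "violent abusive rant", "cooking tutorial for beginners"]
train_labels = [1, 0, 1, 0]

for clf in (MultinomialNB(), LogisticRegression(max_iter=1000)):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(train_texts, train_labels)
    # transcript = video_to_text("input.mp4")  # real usage; needs ffmpeg installed
    transcript = "free scam offer buy now"
    verdict = "reject" if model.predict([transcript])[0] else "accept"
    print(type(clf).__name__, "->", verdict)
```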
APA, Harvard, Vancouver, ISO, and other styles
45

Peng, Bo, Xinyuan Chen, Yaohui Wang, Chaochao Lu, and Yu Qiao. "ConditionVideo: Training-Free Condition-Guided Video Generation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (March 24, 2024): 4459–67. http://dx.doi.org/10.1609/aaai.v38i5.28244.

Full text
Abstract:
Recent works have successfully extended large-scale text-to-image models to the video domain, producing promising results but at a high computational cost and requiring a large amount of video data. In this work, we introduce ConditionVideo, a training-free approach to text-to-video generation based on the provided condition, video, and input text, by leveraging the power of off-the-shelf text-to-image generation methods (e.g., Stable Diffusion). ConditionVideo generates realistic dynamic videos from random noise or given scene videos. Our method explicitly disentangles the motion representation into condition-guided and scenery motion components. To this end, the ConditionVideo model is designed with a UNet branch and a control branch. To improve temporal coherence, we introduce sparse bi-directional spatial-temporal attention (sBiST-Attn). The 3D control network extends the conventional 2D ControlNet model, aiming to strengthen conditional generation accuracy by additionally leveraging the bi-directional frames in the temporal domain. Our method exhibits superior performance in terms of frame consistency, clip score, and conditional accuracy, outperforming other compared methods.
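The sketch below is only a toy approximation of the idea behind sparse bi-directional spatial-temporal attention, not the paper's sBiST-Attn module: each frame's tokens attend to tokens from a sparse set of frames sampled in both temporal directions plus the frame itself. Shapes and the sampling stride are assumptions.

```python
import torch
import torch.nn.functional as F

def sparse_bidirectional_st_attention(x: torch.Tensor, stride: int = 3) -> torch.Tensor:
    """x: (B, T, N, C) video features -- T frames, N spatial tokens per frame."""
    B, T, N, C = x.shape
    out = torch.empty_like(x)
    for t in range(T):
        # sparse bi-directional frame indices: ..., t-2s, t-s, t, t+s, t+2s, ...
        idx = sorted(set(range(t % stride, T, stride)) | {t})
        kv = x[:, idx].reshape(B, len(idx) * N, C)   # keys/values from sampled frames
        q = x[:, t]                                  # queries from the current frame
        out[:, t] = F.scaled_dot_product_attention(q, kv, kv)
    return out

feats = torch.randn(1, 8, 16, 64)                    # (B, T, N, C)
print(sparse_bidirectional_st_attention(feats).shape)  # torch.Size([1, 8, 16, 64])
```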
APA, Harvard, Vancouver, ISO, and other styles
46

Ma, Yue, Yingqing He, Xiaodong Cun, Xintao Wang, Siran Chen, Xiu Li, and Qifeng Chen. "Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (March 24, 2024): 4117–25. http://dx.doi.org/10.1609/aaai.v38i5.28206.

Full text
Abstract:
Generating text-editable and pose-controllable character videos is in high demand for creating various digital humans. Nevertheless, this task has been restricted by the absence of a comprehensive dataset with paired video-pose captions and of generative prior models for videos. In this work, we design a novel two-stage training scheme that can utilize easily obtained datasets (i.e., image-pose pairs and pose-free videos) and a pre-trained text-to-image (T2I) model to obtain pose-controllable character videos. Specifically, in the first stage, only the keypoint-image pairs are used for controllable text-to-image generation, and we learn a zero-initialized convolutional encoder to encode the pose information. In the second stage, we fine-tune the motion of the above network on a pose-free video dataset by adding learnable temporal self-attention and reformed cross-frame self-attention blocks. Powered by our new designs, our method generates continuously pose-controllable character videos while keeping the editing and concept-composition ability of the pre-trained T2I model. The code and models are available at https://follow-your-pose.github.io/.
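The "zero-initialized convolutional encoder" mentioned in the first stage can be illustrated with a small PyTorch sketch: the final projection starts at zero so the pose branch contributes nothing initially and the pre-trained T2I model's behaviour is preserved. Channel sizes and layer layout are illustrative assumptions, not the released model.

```python
import torch
import torch.nn as nn

class ZeroInitPoseEncoder(nn.Module):
    """Toy zero-initialized convolutional pose encoder."""

    def __init__(self, in_channels: int = 3, out_channels: int = 320):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, out_channels, 3, stride=2, padding=1), nn.SiLU(),
        )
        self.zero_conv = nn.Conv2d(out_channels, out_channels, 1)
        nn.init.zeros_(self.zero_conv.weight)   # zero-initialized projection
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, keypoint_image: torch.Tensor) -> torch.Tensor:
        return self.zero_conv(self.body(keypoint_image))

pose = torch.randn(1, 3, 512, 512)               # rendered keypoint image
print(ZeroInitPoseEncoder()(pose).abs().sum())   # sums to 0 before any training step
```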
APA, Harvard, Vancouver, ISO, and other styles
47

Yao, Xinwei, Ohad Fried, Kayvon Fatahalian, and Maneesh Agrawala. "Iterative Text-Based Editing of Talking-Heads Using Neural Retargeting." ACM Transactions on Graphics 40, no. 3 (August 2021): 1–14. http://dx.doi.org/10.1145/3449063.

Full text
Abstract:
We present a text-based tool for editing talking-head video that enables an iterative editing workflow. On each iteration users can edit the wording of the speech, further refine mouth motions if necessary to reduce artifacts, and manipulate non-verbal aspects of the performance by inserting mouth gestures (e.g., a smile) or changing the overall performance style (e.g., energetic, mumble). Our tool requires only 2 to 3 minutes of the target actor video and it synthesizes the video for each iteration in about 40 seconds, allowing users to quickly explore many editing possibilities as they iterate. Our approach is based on two key ideas. (1) We develop a fast phoneme search algorithm that can quickly identify phoneme-level subsequences of the source repository video that best match a desired edit. This enables our fast iteration loop. (2) We leverage a large repository of video of a source actor and develop a new self-supervised neural retargeting technique for transferring the mouth motions of the source actor to the target actor. This allows us to work with relatively short target actor videos, making our approach applicable in many real-world editing scenarios. Finally, our refinement and performance controls give users the ability to further fine-tune the synthesized results.
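The phoneme-search idea can be illustrated with a toy sketch: scan the repository's phoneme sequence for the window that best matches the edited phrase's phonemes, scored here with Levenshtein distance. The actual tool's search is far faster and richer; this only conveys the concept.

```python
def levenshtein(a, b):
    """Edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (pa != pb)))
        prev = cur
    return prev[-1]

def best_match(repo_phonemes, query_phonemes):
    """Return (start, end, distance) of the repository window closest to the query."""
    n = len(query_phonemes)
    best = None
    for start in range(len(repo_phonemes) - n + 1):
        d = levenshtein(repo_phonemes[start:start + n], query_phonemes)
        if best is None or d < best[2]:
            best = (start, start + n, d)
    return best

repo = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]   # phonemes of "hello world"
query = ["W", "ER", "L", "D"]                          # phonemes of "world"
print(best_match(repo, query))                         # -> (4, 8, 0)
```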
APA, Harvard, Vancouver, ISO, and other styles
48

Milstein, Renee H. "Univision In-Text Video: Video Selection and ¡Lengua Viva! Exercises." Hispania 77, no. 3 (September 1994): 496. http://dx.doi.org/10.2307/344981.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Huang, Chunneng, Tianjun Fu, and Hsinchun Chen. "Text-based video content classification for online video-sharing sites." Journal of the American Society for Information Science and Technology 61, no. 5 (January 29, 2010): 891–906. http://dx.doi.org/10.1002/asi.21291.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Guo, Sheng Nan, and Chao Hui Jiang. "Research of Caption Text Security Detection in Video Based on Semantic Orientation." Applied Mechanics and Materials 568-570 (June 2014): 753–58. http://dx.doi.org/10.4028/www.scientific.net/amm.568-570.753.

Full text
Abstract:
With the rapid development of Internet technologies and applications, a large amount of harmful video has spread across the Internet, which is enormously damaging to social stability and to people's physical and mental health. This paper studies means of extracting video caption text and an improved method of text security detection. The proposed method first classifies the caption text and then compares the classification result with a library of user demands to determine whether to trigger an alarm, through which harmful videos can be monitored. In this method, the text detection step calculates the polarity of sentiment words by analyzing their context, also considers the effect of nouns, and then derives the orientation of the whole text. Experiments show that the method can monitor harmful video effectively.
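The scoring idea described (word polarity adjusted by local context, plus a noun effect) can be sketched as follows. The lexicons, negation handling, and weights are tiny placeholders, not the paper's resources.

```python
SENTIMENT = {"violent": -1.0, "harmful": -1.0, "peaceful": 1.0, "good": 1.0}
FLAGGED_NOUNS = {"weapon": -0.5, "drug": -0.5}
NEGATIONS = {"not", "no", "never"}

def caption_orientation(caption: str, window: int = 2) -> float:
    """Sum sentiment polarities, flipping them when a nearby negation appears,
    and add a small contribution for flagged nouns."""
    tokens = [t.strip(",.!?") for t in caption.lower().split()]
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in SENTIMENT:
            polarity = SENTIMENT[tok]
            context = tokens[max(0, i - window):i]   # look back a few words
            if any(w in NEGATIONS for w in context):
                polarity = -polarity                 # negation flips polarity
            score += polarity
        score += FLAGGED_NOUNS.get(tok, 0.0)         # noun effect
    return score

print(caption_orientation("this is not peaceful, a weapon is shown"))  # -> -1.5
```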
APA, Harvard, Vancouver, ISO, and other styles
