Selection of scientific literature on the topic "Video text"

Create a reference in APA, MLA, Chicago, Harvard, and other citation styles

Select a type of source:

Consult the lists of current articles, books, dissertations, reports, and other scholarly sources on the topic "Video text".

Next to every work in the list of references there is an "Add to bibliography" option. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scientific publication as a PDF and read its online abstract, provided the relevant parameters are available in the metadata.

Journal articles on the topic "Video text"

1

V, Divya, Prithica G und Savija J. „Text Summarization for Education in Vernacular Languages“. International Journal for Research in Applied Science and Engineering Technology 11, Nr. 7 (31.07.2023): 175–78. http://dx.doi.org/10.22214/ijraset.2023.54589.

Der volle Inhalt der Quelle
Annotation:
Abstract: This project proposes a video summarizing system based on natural language processing (NLP) and machine learning to summarize YouTube video transcripts without losing the key elements. The quantity of videos available on web platforms is steadily expanding, and the content is made available globally, primarily for educational purposes; educational content is available on YouTube, Facebook, Google, and Instagram. A significant issue in extracting information from videos is that, unlike an image, where data can be collected from a single frame, a viewer must watch the entire video to grasp the context. This study aims to shorten the length of the transcript of a given video. The suggested method involves retrieving the transcript from the video link provided by the user and then summarizing it using Hugging Face Transformers and pipelining. The built model accepts a video link and the required summary duration as input from the user and generates a summarized transcript as output. According to the results, the final summarized transcript was obtained in less time compared with other proposed techniques. Furthermore, the video's central concept is accurately preserved in the final summary without any deviations.
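As a rough sketch of the kind of pipeline this abstract describes (not the authors' implementation), the Python example below fetches a YouTube transcript and summarizes it with a Hugging Face summarization pipeline; the model name, chunk size and video ID are assumptions.

    from youtube_transcript_api import YouTubeTranscriptApi
    from transformers import pipeline

    def summarize_youtube_video(video_id: str, max_summary_tokens: int = 120) -> str:
        # Fetch the transcript as a list of {"text", "start", "duration"} segments.
        segments = YouTubeTranscriptApi.get_transcript(video_id)
        transcript = " ".join(seg["text"] for seg in segments)

        # Summarize with a pretrained abstractive model (model choice is illustrative).
        summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

        # Long transcripts exceed the model's input limit, so summarize in chunks.
        chunk_size = 3000  # characters per chunk, a rough heuristic
        chunks = [transcript[i:i + chunk_size] for i in range(0, len(transcript), chunk_size)]
        partial_summaries = [
            summarizer(chunk, max_length=max_summary_tokens, min_length=30, do_sample=False)[0]["summary_text"]
            for chunk in chunks
        ]
        return " ".join(partial_summaries)

    if __name__ == "__main__":
        print(summarize_youtube_video("dQw4w9WgXcQ"))  # hypothetical video ID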
APA, Harvard, Vancouver, ISO und andere Zitierweisen
2

Rachidi, Youssef. „Text Detection in Video for Video Indexing“. International Journal of Computer Trends and Technology 68, Nr. 4 (25.04.2020): 96–99. http://dx.doi.org/10.14445/22312803/ijctt-v68i4p117.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
3

Yariv, Guy, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz und Yossi Adi. „Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation“. Proceedings of the AAAI Conference on Artificial Intelligence 38, Nr. 7 (24.03.2024): 6639–47. http://dx.doi.org/10.1609/aaai.v38i7.28486.

Der volle Inhalt der Quelle
Annotation:
We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresponding segment of that video. We utilize an existing text-conditioned video generation model and a pre-trained audio encoder model. The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model. As such, it also enables video generation conditioned on text, audio, and, for the first time as far as we can ascertain, on both text and audio. We validate our method extensively on three datasets demonstrating significant semantic diversity of audio-video samples and further propose a novel evaluation metric (AV-Align) to assess the alignment of generated videos with input audio samples. AV-Align is based on the detection and comparison of energy peaks in both modalities. In comparison to recent state-of-the-art approaches, our method generates videos that are better aligned with the input sound, both with respect to content and temporal axis. We also show that videos produced by our method present higher visual quality and are more diverse. Code and samples are available at: https://pages.cs.huji.ac.il/adiyoss-lab/TempoTokens/.
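The abstract describes a lightweight adaptor that maps audio-encoder representations into the input representation expected by a text-to-video model. As a rough PyTorch illustration of that idea (not the authors' architecture), the sketch below pools audio features over time and projects them into a small set of pseudo text tokens; all dimensions, module names and the pooling choice are assumptions.

    import torch
    import torch.nn as nn

    class AudioToTextTokenAdaptor(nn.Module):
        """Maps audio-encoder features to pseudo text tokens for a text-to-video model (illustrative)."""

        def __init__(self, audio_dim: int = 768, text_dim: int = 1024, num_tokens: int = 8):
            super().__init__()
            self.num_tokens = num_tokens
            self.text_dim = text_dim
            self.proj = nn.Sequential(
                nn.Linear(audio_dim, text_dim),
                nn.GELU(),
                nn.Linear(text_dim, num_tokens * text_dim),
            )

        def forward(self, audio_features: torch.Tensor) -> torch.Tensor:
            # audio_features: (batch, time, audio_dim) from a frozen pre-trained audio encoder.
            pooled = audio_features.mean(dim=1)                      # temporal pooling, (batch, audio_dim)
            tokens = self.proj(pooled)                               # (batch, num_tokens * text_dim)
            return tokens.view(-1, self.num_tokens, self.text_dim)   # (batch, num_tokens, text_dim)

    # Usage: feed the pseudo tokens to the (frozen) text-to-video model in place of text embeddings.
    adaptor = AudioToTextTokenAdaptor()
    dummy_audio = torch.randn(2, 50, 768)    # batch of 2 clips, 50 audio frames
    print(adaptor(dummy_audio).shape)        # torch.Size([2, 8, 1024])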
APA, Harvard, Vancouver, ISO und andere Zitierweisen
4

Chiu, Chih-Yi, Po-Chih Lin, Sheng-Yang Li, Tsung-Han Tsai und Yu-Lung Tsai. „Tagging Webcast Text in Baseball Videos by Video Segmentation and Text Alignment“. IEEE Transactions on Circuits and Systems for Video Technology 22, Nr. 7 (Juli 2012): 999–1013. http://dx.doi.org/10.1109/tcsvt.2012.2189478.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
5

Jiang, Ai Wen, und Gao Rong Zeng. „Multi-information Integrated Method for Text Extraction from Videos“. Advanced Materials Research 225-226 (April 2011): 827–30. http://dx.doi.org/10.4028/www.scientific.net/amr.225-226.827.

Der volle Inhalt der Quelle
Annotation:
Video text provides important semantic information for video content analysis. However, video text on a complex background yields poor OCR recognition performance. Most previous approaches to extracting overlay text from videos are based on traditional binarization and pay little attention to integrating multiple sources of information, especially background information. This paper presents an effective method to precisely extract characters from videos so that OCR can achieve good recognition performance. The proposed method combines several kinds of information, including background information, edge information, and the characters' spatial information. Experimental results show that the method is robust to complex backgrounds and various text appearances.
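As a highly simplified illustration of fusing several cues for overlay-text extraction (not the paper's method), the OpenCV sketch below combines Otsu binarization with an edge-density cue and a morphological grouping step; the thresholds and kernel sizes are arbitrary choices.

    import cv2
    import numpy as np

    def extract_text_mask(frame_bgr: np.ndarray) -> np.ndarray:
        """Very simplified fusion of edge and intensity cues for overlay text (illustrative only)."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

        # Cue 1: overlay text is usually high-contrast, so Otsu binarization separates it from background.
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        # Cue 2: text regions are dense in edges.
        edges = cv2.Canny(gray, 100, 200)
        edge_density = cv2.dilate(edges, np.ones((5, 5), np.uint8), iterations=2)

        # Fuse cues: keep binarized pixels that also lie in edge-dense regions.
        mask = cv2.bitwise_and(binary, edge_density)

        # Morphological closing groups character strokes into word-level blobs.
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((3, 15), np.uint8))
        return mask

    # Usage (path is hypothetical):
    # frame = cv2.imread("frame.png")
    # cv2.imwrite("text_mask.png", extract_text_mask(frame))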
APA, Harvard, Vancouver, ISO und andere Zitierweisen
6

Ma, Fan, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu und Yi Yang. „Stitching Segments and Sentences towards Generalization in Video-Text Pre-training“. Proceedings of the AAAI Conference on Artificial Intelligence 38, Nr. 5 (24.03.2024): 4080–88. http://dx.doi.org/10.1609/aaai.v38i5.28202.

Der volle Inhalt der Quelle
Annotation:
Video-language pre-training models have recently achieved remarkable results on various multi-modal downstream tasks. However, most of these models rely on contrastive learning or masked modeling to align global features across modalities, neglecting the local associations between video frames and text tokens. This limits the model's ability to perform fine-grained matching and generalization, especially for tasks that require selecting segments in long videos based on query texts. To address this issue, we propose a novel stitching-and-matching pretext task for video-language pre-training that encourages fine-grained interactions between modalities. The task stitches video frames or sentences into longer sequences and predicts the positions of cross-modal queries in the stitched sequences. Individual frame and sentence representations are thus aligned via the stitching-and-matching strategy, encouraging fine-grained interactions between videos and texts. We conduct extensive experiments on various benchmarks covering text-to-video retrieval, video question answering, video captioning, and moment retrieval. The results demonstrate that the proposed method significantly improves the generalization capacity of video-text pre-training models.
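As a toy illustration of such a stitching-and-matching objective (not the paper's implementation), the sketch below stitches segment features into one sequence and trains the model to predict which position a sentence query belongs to; the feature dimensions, temperature and random features are placeholders standing in for real video and text encoders.

    import torch
    import torch.nn.functional as F

    def stitch_and_match_loss(clip_feats: torch.Tensor, text_feat: torch.Tensor, target_pos: torch.Tensor) -> torch.Tensor:
        """
        clip_feats: (batch, num_segments, dim)  features of video segments stitched into one sequence
        text_feat:  (batch, dim)                feature of the sentence query describing ONE segment
        target_pos: (batch,)                    index of the segment the query actually describes
        Returns a cross-entropy loss that teaches the model where the query belongs in the stitched sequence.
        """
        clip_feats = F.normalize(clip_feats, dim=-1)
        text_feat = F.normalize(text_feat, dim=-1)
        # Similarity of the query to every position in the stitched sequence.
        logits = torch.einsum("bnd,bd->bn", clip_feats, text_feat) / 0.07  # temperature is illustrative
        return F.cross_entropy(logits, target_pos)

    # Toy usage with random features.
    batch, segments, dim = 4, 6, 256
    loss = stitch_and_match_loss(torch.randn(batch, segments, dim),
                                 torch.randn(batch, dim),
                                 torch.randint(0, segments, (batch,)))
    print(loss.item())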
APA, Harvard, Vancouver, ISO und andere Zitierweisen
7

Cao, Shuqiang, Bairui Wang, Wei Zhang und Lin Ma. „Visual Consensus Modeling for Video-Text Retrieval“. Proceedings of the AAAI Conference on Artificial Intelligence 36, Nr. 1 (28.06.2022): 167–75. http://dx.doi.org/10.1609/aaai.v36i1.19891.

Der volle Inhalt der Quelle
Annotation:
In this paper, we propose a novel method to mine the commonsense knowledge shared between the video and text modalities for video-text retrieval, namely visual consensus modeling. Unlike existing works, which learn the video and text representations and their complicated relationships solely from pairwise video-text data, we make the first attempt to model the visual consensus by mining visual concepts from videos and exploiting their co-occurrence patterns within the video and text modalities, with no reliance on any additional concept annotations. Specifically, we build a shareable and learnable graph as the visual consensus, where the nodes denote the mined visual concepts and the edges connecting the nodes represent the co-occurrence relationships between them. Extensive experimental results on public benchmark datasets demonstrate that the proposed method, with its ability to effectively model the visual consensus, achieves state-of-the-art performance on the bidirectional video-text retrieval task. Our code is available at https://github.com/sqiangcao99/VCM.
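As a simplified illustration of the co-occurrence idea behind such a consensus graph (not the learnable graph used in the paper), the sketch below builds a row-normalized concept co-occurrence matrix from per-video concept lists; the concept indices are placeholders.

    import numpy as np

    def build_cooccurrence_graph(video_concepts: list[list[int]], num_concepts: int) -> np.ndarray:
        """Count how often two visual concepts appear in the same video and row-normalize the result."""
        adj = np.zeros((num_concepts, num_concepts), dtype=np.float64)
        for concepts in video_concepts:
            unique = sorted(set(concepts))
            for i in unique:
                for j in unique:
                    if i != j:
                        adj[i, j] += 1.0
        row_sums = adj.sum(axis=1, keepdims=True)
        row_sums[row_sums == 0] = 1.0  # avoid division by zero for isolated concepts
        return adj / row_sums

    # Toy example: concept indices detected in three videos (indices are arbitrary placeholders).
    videos = [[0, 2, 3], [2, 3], [0, 1, 3]]
    print(build_cooccurrence_graph(videos, num_concepts=4))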
APA, Harvard, Vancouver, ISO und andere Zitierweisen
8

Bodyanskaya, Alisa, und Kapitalina Sinegubova. „Music Video as a Poetic Interpretation“. Virtual Communication and Social Networks 2023, Nr. 2 (25.04.2023): 47–55. http://dx.doi.org/10.21603/2782-4799-2023-2-2-47-55.

Der volle Inhalt der Quelle
Annotation:
This article introduces the phenomenon of videopoetry as a hybrid product of mass media whose popularity is based on intermediality, i.e., the cumulative effect on different perception channels. Videopoetry is a productive form of verbal creativity in the contemporary media culture with its active reception of art. The research featured poems by W. B. Yeats, T. S. Eliot, and W. H. Auden presented as videos and the way they respond to someone else's poetic word. The authors analyzed 15 videos by comparing the original text and the video sequence in line with the method developed by N. V. Barkovskaya and A. A. Zhitenev. The analysis revealed several options for relaying a poetic work as a music video. Three videos provided a direct illustration of the source text, suggesting a complete or partial visual duplication of the original poetic imagery. Five videos offered an indirect illustration of the source text by using associative images in relation to the central images of the poem. Five videos gave a minimal illustration: the picture did not dominate the text of the poem, but its choice implied a certain interpretation. Two videos featured the video maker as a reciter. The video makers did not try to transform the poetic text but used the video sequence as a way to enter into a dialogue with the original poem or resorted to indirect illustration to generate occasional meanings. Thus, video makers keep the original text unchanged and see the video sequence and musical accompaniment as their responsibility but maintain a dialogue between the original text and its game reinterpretation.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
9

Ghorpade, Jayshree, Raviraj Palvankar, Ajinkya Patankar und Snehal Rathi. „Extracting Text from Video“. Signal & Image Processing : An International Journal 2, Nr. 2 (30.06.2011): 103–12. http://dx.doi.org/10.5121/sipij.2011.2209.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
10

Wadaskar, Ghanshyam, Sanghdip Udrake, Vipin Bopanwar, Shravani Upganlawar und Prof Minakshi Getkar. „Extract Text from Video“. International Journal for Research in Applied Science and Engineering Technology 12, Nr. 5 (31.05.2024): 2881–83. http://dx.doi.org/10.22214/ijraset.2024.62287.

Der volle Inhalt der Quelle
Annotation:
Abstract: The code imports the YoutubeTranscriptionApi from the youtube_transcription_api library, and the YouTube video ID is defined. The transcription data for the given video ID is fetched using the get_transcription method. The transcription text is extracted from the data and stored in the transcription variable. The transcription is split into lines and then joined back into a single string. Finally, the processed transcript is written into a text file named "Love.text" with UTF-8 encoding. The commented-out code block is an alternative way to write the transcript into a text file using the open function directly, which you can use if you prefer.
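The abstract paraphrases a short script. A working equivalent based on the publicly available youtube-transcript-api package (whose class and method are commonly named YouTubeTranscriptApi and get_transcript, slightly different from the names quoted above) might look like the sketch below; the video ID and output filename are placeholders.

    from youtube_transcript_api import YouTubeTranscriptApi

    VIDEO_ID = "dQw4w9WgXcQ"  # placeholder video ID

    # Fetch the transcript: a list of segments with "text", "start" and "duration" keys.
    segments = YouTubeTranscriptApi.get_transcript(VIDEO_ID)

    # Join the per-segment lines into a single string.
    transcript = " ".join(segment["text"].replace("\n", " ") for segment in segments)

    # Write the processed transcript to a UTF-8 text file.
    with open("Love.text", "w", encoding="utf-8") as out_file:
        out_file.write(transcript)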
APA, Harvard, Vancouver, ISO und andere Zitierweisen

Dissertations and theses on the topic "Video text"

1

Sidevåg, Emmilie. „Användarmanual text vs video“. Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-17617.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
2

Salway, Andrew. „Video annotation : the role of specialist text“. Thesis, University of Surrey, 1999. http://epubs.surrey.ac.uk/843350/.

Der volle Inhalt der Quelle
Annotation:
Digital video is among the most information-intensive modes of communication. The retrieval of video from digital libraries, along with sound and text, is a major challenge for the computing community in general and for the artificial intelligence community specifically. The advent of digital video has set some old questions in a new light. Questions relating to aesthetics and to the role of surrogates (image for reality and text for image) invariably touch upon the link between vision and language. Dealing with this link computationally is important for the artificial intelligence enterprise. Interesting images to consider both aesthetically and for research in video retrieval include those which are constrained and patterned, and which convey rich meanings; for example, dance. These are specialist images for us and require a special language for description and interpretation. Furthermore, they require specialist knowledge to be understood since there is usually more than meets the untrained eye: this knowledge may also be articulated in the language of the specialism. In order to be retrieved effectively and efficiently, video has to be annotated, particularly so for specialist moving images. Annotation involves attaching keywords from the specialism along with, for us, commentaries produced by experts, including those written and spoken specifically for annotation and those obtained from a corpus of extant texts. A system that processes such collateral text for video annotation should perhaps be grounded in an understanding of the link between vision and language. This thesis attempts to synthesise ideas from artificial intelligence, multimedia systems, linguistics, cognitive psychology and aesthetics. The link between vision and language is explored by focusing on moving images of dance and the special language used to describe and interpret them. We have developed an object-oriented system, KAB, which helps to annotate a digital video library with a collateral corpus of texts and terminology. User evaluation has been encouraging. The system is now available on the WWW.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
3

Smith, Gregory. „VIDEO SCENE DETECTION USING CLOSED CAPTION TEXT“. VCU Scholars Compass, 2009. http://scholarscompass.vcu.edu/etd/1932.

Der volle Inhalt der Quelle
Annotation:
Issues in Automatic Video Biography Editing are similar to those in Video Scene Detection and Topic Detection and Tracking (TDT). The techniques of Video Scene Detection and TDT can be applied to interviews to reduce the time necessary to edit a video biography. The system addresses the problems of video text extraction, story segmentation, and correlation. This thesis project was divided into three parts: extraction, scene detection, and correlation. The project successfully detected scene breaks in series television episodes and displayed scenes that had similar content.
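As a minimal illustration of correlating scenes through their closed-caption text (not the thesis implementation), the sketch below compares scenes by TF-IDF cosine similarity over their caption text; the captions are toy placeholders.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Closed-caption text collected per detected scene (toy placeholder data).
    scene_captions = [
        "the detectives interview a witness about the robbery",
        "back at the station the detectives review the robbery case",
        "a cooking segment shows how to prepare pasta",
    ]

    # Represent each scene by TF-IDF weights over its caption vocabulary.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(scene_captions)

    # Pairwise cosine similarity; high values suggest scenes with related content.
    similarity = cosine_similarity(tfidf)
    print(similarity.round(2))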
APA, Harvard, Vancouver, ISO und andere Zitierweisen
4

Zhang, Jing. „Extraction of Text Objects in Image and Video Documents“. Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4266.

Der volle Inhalt der Quelle
Annotation:
The popularity of digital image and video is increasing rapidly. To help users navigate libraries of image and video, Content Based Information Retrieval (CBIR) system that can automatically index image and video documents are needed. However, due to the semantic gap between low-level machine descriptors and high-level semantic descriptors, the existing CBIR systems are still far from perfect. Text embedded in multi-media data, as a well-defined model of concepts for humans' communication, contains much semantic information related to the content. This text information can provide a much truer form of content-based access to the image and video documents if it can be extracted and harnessed efficiently. This dissertation solves the problem involved in detecting text object in image and video and tracking text event in video. For text detection problem, we propose a new unsupervised text detection algorithm. A new text model is constructed to describe text object using pictorial structure. Each character is a part in the model and every two neighboring characters are connected by a spring-like link. Two characters and the link connecting them are defined as a text unit. We localize candidate parts by extracting closed boundaries and initialize the links by connecting two neighboring candidate parts based on the spatial relationship of characters. For every candidate part, we compute character energy using three new character features, averaged angle difference of corresponding pairs, fraction of non-noise pairs, and vector of stroke width. They are extracted based on our observation that the edge of a character can be divided into two sets with high similarities in length, curvature, and orientation. For every candidate link, we compute link energy based on our observation that the characters of a text typically align along certain direction with similar color, size, and stroke width. For every candidate text unit, we combine character and link energies to compute text unit energy which indicates the probability that the candidate text model is a real text object. The final text detection results are generated using a text unit energy based thresholding. For text tracking problem, we construct a text event model by using pictorial structure as well. In this model, the detected text object in each video frame is a part and two neighboring text objects of a text event are connected by a spring-like link. Inter-frame link energy is computed for each link based on the character energy, similarity of neighboring text objects, and motion information. After refining the model using inter-frame link energy, the remaining text event models are marked as text events. At character level, because the proposed method is based on the assumption that the strokes of a character have uniform thickness, it can detect and localize characters from different languages in different styles, such as typewritten text or handwriting text, if the characters have approximately uniform stroke thickness. At text level, however, because the spatial relationship between two neighboring characters is used to localize text objects, the proposed method may fail to detect and localize the characters with multiple separate strokes or connected characters. For example, some East Asian language characters, such as Chinese, Japanese, and Korean, have many strokes of a single character. We need to group the strokes first to form single characters and then group characters to form text objects. 
Moreover, because the characters of some languages, such as Arabic and Hindi, are connected together, we cannot extract spatial information between neighboring characters, since they are detected as a single character. Therefore, at the current stage the proposed method can detect and localize text objects that are composed of separate characters with connected strokes of approximately uniform thickness. We evaluated our method comprehensively using three English language-based image and video datasets: the ICDAR 2003/2005 text locating dataset (258 training images and 251 test images), the Microsoft Street View text detection dataset (307 street view images), and the VACE video dataset (50 broadcast news videos from CNN and ABC). The experimental results demonstrate that the proposed text detection method can capture the inherent properties of text and discriminate text from other objects efficiently.
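As a toy illustration of one ingredient of the approach described above, the uniform-stroke-thickness assumption, the sketch below estimates the stroke width of a binary character candidate with a distance transform; it is not the dissertation's pictorial-structure model, and the mask is a synthetic placeholder.

    import cv2
    import numpy as np

    def stroke_width_stats(char_mask: np.ndarray) -> tuple[float, float]:
        """
        char_mask: binary image (uint8, 0/255) of a single candidate character.
        Returns (mean stroke width, coefficient of variation). Low variation supports
        the uniform-stroke-thickness assumption used to score character candidates.
        """
        # Distance to the nearest background pixel approximates half the local stroke width.
        dist = cv2.distanceTransform(char_mask, cv2.DIST_L2, 5)
        stroke_half_widths = dist[char_mask > 0]
        widths = 2.0 * stroke_half_widths
        mean_w = float(widths.mean())
        cv_w = float(widths.std() / (mean_w + 1e-6))
        return mean_w, cv_w

    # Toy example: a thick horizontal bar as a fake "character".
    mask = np.zeros((40, 80), dtype=np.uint8)
    mask[15:25, 10:70] = 255
    print(stroke_width_stats(mask))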
APA, Harvard, Vancouver, ISO und andere Zitierweisen
5

Sjölund, Jonathan. „Detection of Frozen Video Subtitles Using Machine Learning“. Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158239.

Der volle Inhalt der Quelle
Annotation:
When subtitles are burned into a video, an error can sometimes occur in the encoder that results in the same subtitle being burned into several frames, resulting in subtitles becoming frozen. This thesis provides a way to detect frozen video subtitles with the help of an implemented text detector and classifier. Two types of classifiers, naïve classifiers and machine learning classifiers, are tested and compared on a variety of different videos to see how much a machine learning approach can improve the performance. The naïve classifiers are evaluated using ground truth data to gain an understanding of the importance of good text detection. To understand the difficulty of the problem, two different machine learning classifiers are tested, logistic regression and random forests. The result shows that machine learning improves the performance over using naïve classifiers by improving the specificity from approximately 87.3% to 95.8% and improving the accuracy from 93.3% to 95.5%. Random forests achieve the best overall performance, but the difference compared to when using logistic regression is small enough that more computationally complex machine learning classifiers are not necessary. Using the ground truth shows that the weaker naïve classifiers would be improved by at least 4.2% accuracy, thus a better text detector is warranted. This thesis shows that machine learning is a viable option for detecting frozen video subtitles.
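As a minimal sketch of the kind of classifier comparison described above (not the thesis code), the example below trains logistic regression and a random forest on synthetic frame-difference features for "frozen" versus "normal" subtitle regions and reports accuracy and specificity; the features and their distributions are invented for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, recall_score

    # Synthetic stand-in features for detected subtitle regions across consecutive frames,
    # e.g. per-region pixel-difference statistics; label 1 = frozen subtitle, 0 = normal.
    rng = np.random.default_rng(0)
    n = 2000
    frozen = rng.normal(loc=0.05, scale=0.03, size=(n // 2, 3))   # frozen: almost no change
    normal = rng.normal(loc=0.40, scale=0.15, size=(n // 2, 3))   # normal: larger change
    X = np.vstack([frozen, normal])
    y = np.array([1] * (n // 2) + [0] * (n // 2))

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                      ("random forest", RandomForestClassifier(n_estimators=100, random_state=0))]:
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        specificity = recall_score(y_test, pred, pos_label=0)  # true-negative rate
        print(f"{name}: accuracy={accuracy_score(y_test, pred):.3f}, specificity={specificity:.3f}")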
APA, Harvard, Vancouver, ISO und andere Zitierweisen
6

Chen, Datong. „Text detection and recognition in images and video sequences /“. [S.l.] : [s.n.], 2003. http://library.epfl.ch/theses/?display=detail&nr=2863.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
7

Štindlová, Marie. „Museli to založit“. Master's thesis, Vysoké učení technické v Brně. Fakulta výtvarných umění, 2015. http://www.nusl.cz/ntk/nusl-232451.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
8

Bird, Paul. „Elementary students' comprehension of computer presented text“. Thesis, University of British Columbia, 1990. http://hdl.handle.net/2429/29187.

Der volle Inhalt der Quelle
Annotation:
The study investigated grade 6 students' comprehension of narrative text when presented on a computer and as printed words on paper. A set of comprehension tests was developed for three stories of varying length (382 words, 1047 words and 1933 words) using a skills hierarchy protocol. The text for each story was prepared for presentation on a Macintosh computer using a program written for the study and as print in the form of exact copies of the computer screen. Students from two grade 6 classes in a suburban elementary school were randomly assigned to read one of the stories in either print form or on the computer and subsequently completed a comprehension test as well as a questionnaire concerning attitude and personal information. The responses from the comprehension tests were evaluated by graduate students in Language Education. The data obtained from the tests and questionnaires were analysed to determine measures of test construct validity, inter-rater reliability, and any significant difference in the means of comprehension scores for the two experimental groups for each story. The results indicated small but insignificant differences between the means of the three comprehension test scores for computer and print. A number of students reading from the computer complained of eye fatigue. The scores of subjects reading the longest story and complaining of eye fatigue were significantly lower.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
9

Sharma, Nabin. „Multi-lingual Text Processing from Videos“. Thesis, Griffith University, 2015. http://hdl.handle.net/10072/367489.

Der volle Inhalt der Quelle
Annotation:
Advances in digital technology have produced low-priced, highly portable imaging devices such as digital cameras attached to mobile phones, camcorders, PDAs, etc. These devices can be used to capture videos and images with ease, which can be shared through the internet and other communication media. In the commercial domain, cameras are used to create news, advertisement videos and other forms of material for information communication. The use of multiple languages to create information for targeted audiences is quite common in countries having multiple official languages. Transmission of news, advertisement videos and images across various communication channels has created large databases of videos, and these are increasing exponentially. Effective management of such databases requires proper indexing for the retrieval of relevant information. Text information is dominant in most videos and images and can be used as keywords for retrieval of relevant videos and images. Automatic annotation of videos and images to extract keywords requires the text to be converted to an editable form. This thesis addresses the problem of multi-lingual text processing from video frames. Multi-lingual text processing involves text detection, word segmentation, script identification, and text recognition. Additionally, text frame classification is required to avoid processing a video frame which does not contain text information. A new multi-lingual video word dataset was created and published as a part of the current research. The dataset comprises words of ten scripts, namely English (Roman), Hindi (Devanagari), Bengali (Bangla), Arabic, Oriya, Gujrathi, Punjabi, Kannada, Tamil and Telugu. This dataset was created to facilitate future research on multi-lingual text recognition.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
10

Fraz, Muhammad. „Video content analysis for intelligent forensics“. Thesis, Loughborough University, 2014. https://dspace.lboro.ac.uk/2134/18065.

Der volle Inhalt der Quelle
Annotation:
The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely; 1. Moving object detection and recognition, 2. Correction of colours in the video frames and recognition of colours of moving objects, 3. Make and model recognition of vehicles and identification of their type, 4. Detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex background. The object detection part of the framework relies on background modelling technique and a novel post processing step where the contours of the foreground regions (i.e. moving object) are refined by the classification of edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames. The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfection and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As a part of this work, a novel feature representation technique for distinctive representation of vehicle images has emerged. The feature representation technique uses dense feature description and mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The capability of the proposed framework can be enhanced to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive up to date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of proposed algorithms. 
The results show that the proposed moving object detection and recognition technique surpassed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used within various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.
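As a generic illustration of the background-modelling step used for moving-object detection (not the thesis framework, which adds edge-based contour refinement and a novel texture descriptor), the OpenCV sketch below subtracts the background with MOG2 and returns bounding boxes of large foreground regions; the file path and area threshold are placeholders.

    import cv2
    import numpy as np

    def detect_moving_objects(video_path: str, min_area: int = 500):
        """Yield bounding boxes of moving objects per frame using MOG2 background modelling (illustrative)."""
        capture = cv2.VideoCapture(video_path)
        subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            foreground = subtractor.apply(frame)
            # Remove shadows (value 127 in MOG2 output) and noise before extracting contours.
            _, foreground = cv2.threshold(foreground, 200, 255, cv2.THRESH_BINARY)
            foreground = cv2.morphologyEx(foreground, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8), iterations=2)
            contours, _ = cv2.findContours(foreground, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
            yield frame, boxes
        capture.release()

    # Usage (path is hypothetical):
    # for frame, boxes in detect_moving_objects("surveillance.mp4"):
    #     pass  # draw boxes, classify objects, etc.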
APA, Harvard, Vancouver, ISO und andere Zitierweisen

Books on the topic "Video text"

1

Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan und Wenyin Liu. Video Text Detection. London: Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
2

Wilde, Rod, Hrsg. Volleyball essentials: Video-text. Oslo, Norway: Total Health Publications, 2014.

Den vollen Inhalt der Quelle finden
APA, Harvard, Vancouver, ISO und andere Zitierweisen
3

Shivakumara, Palaiahnakote, und Umapada Pal. Cognitively Inspired Video Text Processing. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-7069-5.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
4

Barry, Atkins, und Krzywinska Tanya, Hrsg. Videogame, player, text. Manchester, UK: Manchester University Press, 2007.

Den vollen Inhalt der Quelle finden
APA, Harvard, Vancouver, ISO und andere Zitierweisen
5

Peterson, Tara. Should kids play video games?: A persuasive text. New York: Mondo, 2006.

Den vollen Inhalt der Quelle finden
APA, Harvard, Vancouver, ISO und andere Zitierweisen
6

The practice of mediation: A video-integrated text. New York, NY: Aspen Publishers, 2008.

Den vollen Inhalt der Quelle finden
APA, Harvard, Vancouver, ISO und andere Zitierweisen
7

H, Stark James, Hrsg. The practice of mediation: A video-integrated text. 2. Aufl. New York: Wolters Kluwer Law & Business, 2012.

Den vollen Inhalt der Quelle finden
APA, Harvard, Vancouver, ISO und andere Zitierweisen
8

Societats virtuals: Gamer's edition = Virtual societies: gamer's edition [text, Ricard Mas Peinado]. Barcelona: Arts Santa Monica, 2010.

Den vollen Inhalt der Quelle finden
APA, Harvard, Vancouver, ISO und andere Zitierweisen
9

Chen, Datong. Text detection and recognition in images and video sequences. Lausanne: EPFL, 2003.

Den vollen Inhalt der Quelle finden
APA, Harvard, Vancouver, ISO und andere Zitierweisen
10

Szuprowicz, Bohdan O. Multimedia technology: Combining sound, text, computing, graphics, and video. Charleston, S.C., U.S.A: Computer Technology Research Corp., 1992.

Den vollen Inhalt der Quelle finden
APA, Harvard, Vancouver, ISO und andere Zitierweisen

Book chapters on the topic "Video text"

1

Weik, Martin H. „video text“. In Computer Science and Communications Dictionary, 1892. Boston, MA: Springer US, 2000. http://dx.doi.org/10.1007/1-4020-0613-6_20796.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
2

Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan und Wenyin Liu. „Video Preprocessing“. In Video Text Detection, 19–47. London: Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_2.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
3

Shivakumara, Palaiahnakote, und Umapada Pal. „Video Text Recognition“. In Cognitive Intelligence and Robotics, 233–71. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-7069-5_9.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
4

Shivakumara, Palaiahnakote, und Umapada Pal. „Video Text Detection“. In Cognitive Intelligence and Robotics, 61–94. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-7069-5_4.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
5

Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan und Wenyin Liu. „Video Caption Detection“. In Video Text Detection, 49–80. London: Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_3.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
6

Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan und Wenyin Liu. „Video Text Detection Systems“. In Video Text Detection, 169–93. London: Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_7.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
7

Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan und Wenyin Liu. „Introduction to Video Text Detection“. In Video Text Detection, 1–18. London: Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_1.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
8

Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan und Wenyin Liu. „Performance Evaluation“. In Video Text Detection, 247–54. London: Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_10.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
9

Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan und Wenyin Liu. „Text Detection from Video Scenes“. In Video Text Detection, 81–126. London: Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_4.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
10

Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan und Wenyin Liu. „Post-processing of Video Text Detection“. In Video Text Detection, 127–44. London: Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_5.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen

Conference papers on the topic "Video text"

1

Zu, Xinyan, Haiyang Yu, Bin Li und Xiangyang Xue. „Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning“. In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/206.

Der volle Inhalt der Quelle
Annotation:
Video text spotting (VTS) aims at extracting texts from videos, where text detection, tracking and recognition are conducted simultaneously. There have been some works that can tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that, if one text is predicted incorrectly by a text recognizer, it still has a chance to be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For ‘visually’, we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For ‘linguistically’, a language model is employed to capture intra-text context to mitigate wrongly spelled text predictions. For ‘semantically’, we propose a text-wise semantic reasoning module to model inter-text semantic relationships and reason for better results. The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
2

Kim, Jonghee, Youngwan Lee und Jinyoung Moon. „T2V2T: Text-to-Video-to-Text Fusion for Text-to-Video Retrieval“. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2023. http://dx.doi.org/10.1109/cvprw59228.2023.00594.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
3

Denoue, Laurent, Scott Carter und Matthew Cooper. „Video text retouch“. In the adjunct publication of the 27th annual ACM symposium. New York, New York, USA: ACM Press, 2014. http://dx.doi.org/10.1145/2658779.2659102.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
4

Deng, Kangle, Tianyi Fei, Xin Huang und Yuxin Peng. „IRC-GAN: Introspective Recurrent Convolutional GAN for Text-to-video Generation“. In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/307.

Der volle Inhalt der Quelle
Annotation:
Automatically generating videos according to a given text is a highly challenging task, where visual quality and semantic consistency with the captions are two critical issues. In existing methods, when a specific frame is generated, the information in the frames generated before it is not fully exploited, and an effective way to measure the semantic accordance between videos and captions remains to be established. To address these issues, we present a novel Introspective Recurrent Convolutional GAN (IRC-GAN) approach. First, we propose a recurrent transconvolutional generator, where LSTM cells are integrated with 2D transconvolutional layers. As 2D transconvolutional layers put more emphasis on the details of each frame than 3D ones, our generator takes both the definition of each video frame and the temporal coherence across the whole video into consideration, and thus can generate videos with better visual quality. Second, we propose mutual information introspection to semantically align the generated videos to the text. Unlike other methods, which simply judge whether the video and the text match or not, we use mutual information to concretely measure the semantic consistency. In this way, our model is able to introspect the semantic distance between the generated video and the corresponding text and tries to minimize it to boost semantic consistency. We conduct experiments on 3 datasets and compare with state-of-the-art methods. Experimental results demonstrate the effectiveness of our IRC-GAN in generating plausible videos from given text.
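As a rough PyTorch sketch of the idea of a recurrent transconvolutional generator (LSTM cells unrolled over time, each hidden state decoded into a frame by 2D transposed convolutions), the example below is illustrative only; the resolution, channel sizes and conditioning scheme are assumptions, not the IRC-GAN architecture.

    import torch
    import torch.nn as nn

    class RecurrentTransconvGenerator(nn.Module):
        """Generates a short video from a text embedding: an LSTM cell unrolls over time and
        2D transposed convolutions decode each hidden state into a frame (illustrative sizes)."""

        def __init__(self, text_dim: int = 256, hidden_dim: int = 512, frames: int = 8):
            super().__init__()
            self.frames = frames
            self.hidden_dim = hidden_dim
            self.lstm = nn.LSTMCell(text_dim, hidden_dim)
            self.decode = nn.Sequential(                      # (hidden_dim, 1, 1) -> (3, 32, 32)
                nn.ConvTranspose2d(hidden_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
                nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
                nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
                nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
            )

        def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
            batch = text_emb.size(0)
            h = torch.zeros(batch, self.hidden_dim, device=text_emb.device)
            c = torch.zeros_like(h)
            frames = []
            for _ in range(self.frames):
                h, c = self.lstm(text_emb, (h, c))            # same text conditioning at every step
                frames.append(self.decode(h.view(batch, -1, 1, 1)))
            return torch.stack(frames, dim=1)                 # (batch, frames, 3, 32, 32)

    gen = RecurrentTransconvGenerator()
    print(gen(torch.randn(2, 256)).shape)                     # torch.Size([2, 8, 3, 32, 32])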
APA, Harvard, Vancouver, ISO und andere Zitierweisen
5

Feng, Zerun, Zhimin Zeng, Caili Guo und Zheng Li. „Exploiting Visual Semantic Reasoning for Video-Text Retrieval“. In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/140.

Der volle Inhalt der Quelle
Annotation:
Video retrieval is a challenging research topic bridging the vision and language areas and has attracted broad attention in recent years. Previous works have been devoted to representing videos by directly encoding from frame-level features. In fact, videos consist of various and abundant semantic relations to which existing methods pay less attention. To address this issue, we propose a Visual Semantic Enhanced Reasoning Network (ViSERN) to exploit reasoning between frame regions. Specifically, we consider frame regions as vertices and construct a fully-connected semantic correlation graph. Then, we perform reasoning by novel random walk rule-based graph convolutional networks to generate region features involved with semantic relations. With the benefit of reasoning, semantic interactions between regions are considered, while the impact of redundancy is suppressed. Finally, the region features are aggregated to form frame-level features for further encoding to measure video-text similarity. Extensive experiments on two public benchmark datasets validate the effectiveness of our method by achieving state-of-the-art performance due to the powerful semantic reasoning.
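As a minimal illustration of graph reasoning over frame regions (not the ViSERN model), the sketch below builds a fully connected similarity graph over region features, normalizes it in a random-walk fashion, and performs one propagation step; all dimensions are placeholders.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RegionGraphReasoning(nn.Module):
        """One graph-convolution step over frame regions: build a fully connected similarity
        graph, row-normalize it like a random walk, then propagate and transform features."""

        def __init__(self, dim: int = 512):
            super().__init__()
            self.transform = nn.Linear(dim, dim)

        def forward(self, regions: torch.Tensor) -> torch.Tensor:
            # regions: (batch, num_regions, dim) features of detected frame regions
            normalized = F.normalize(regions, dim=-1)
            sim = torch.bmm(normalized, normalized.transpose(1, 2))   # pairwise cosine similarity
            walk = F.softmax(sim, dim=-1)                             # each row sums to 1
            propagated = torch.bmm(walk, regions)                     # aggregate neighbors by walk weights
            return F.relu(self.transform(propagated)) + regions       # residual keeps original content

    reasoner = RegionGraphReasoning()
    print(reasoner(torch.randn(2, 36, 512)).shape)                    # torch.Size([2, 36, 512])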
APA, Harvard, Vancouver, ISO und andere Zitierweisen
6

Balaji, Yogesh, Martin Renqiang Min, Bing Bai, Rama Chellappa und Hans Peter Graf. „Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis“. In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/276.

Der volle Inhalt der Quelle
Annotation:
Developing conditional generative models for text-to-video synthesis is an extremely challenging yet an important topic of research in machine learning. In this work, we address this problem by introducing Text-Filter conditioning Generative Adversarial Network (TFGAN), a conditional GAN model with a novel multi-scale text-conditioning scheme that improves text-video associations. By combining the proposed conditioning scheme with a deep GAN architecture, TFGAN generates high quality videos from text on challenging real-world video datasets. In addition, we construct a synthetic dataset of text-conditioned moving shapes to systematically evaluate our conditioning scheme. Extensive experiments demonstrate that TFGAN significantly outperforms existing approaches, and can also generate videos of novel categories not seen during training.
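As a toy illustration of text-filter conditioning (not the TFGAN architecture), the sketch below generates convolution filters from a text embedding and applies each sample's own filters to its feature map via a grouped convolution; all sizes are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextFilterConditioning(nn.Module):
        """Generates convolution filters from a text embedding and applies them to image
        features, so the text-video association is checked by convolution (illustrative sizes)."""

        def __init__(self, text_dim: int = 256, feat_channels: int = 64, num_filters: int = 16, ksize: int = 3):
            super().__init__()
            self.feat_channels, self.num_filters, self.ksize = feat_channels, num_filters, ksize
            self.filter_gen = nn.Linear(text_dim, num_filters * feat_channels * ksize * ksize)

        def forward(self, text_emb: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
            # text_emb: (batch, text_dim); feat: (batch, feat_channels, H, W) from the discriminator.
            batch = feat.size(0)
            filters = self.filter_gen(text_emb).view(
                batch * self.num_filters, self.feat_channels, self.ksize, self.ksize)
            # Grouped convolution applies each sample's text-generated filters to its own features.
            feat = feat.reshape(1, batch * self.feat_channels, *feat.shape[2:])
            out = F.conv2d(feat, filters, padding=self.ksize // 2, groups=batch)
            return out.view(batch, self.num_filters, *out.shape[2:])

    cond = TextFilterConditioning()
    print(cond(torch.randn(4, 256), torch.randn(4, 64, 16, 16)).shape)  # torch.Size([4, 16, 16, 16])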
APA, Harvard, Vancouver, ISO und andere Zitierweisen
7

Hamaguchi, Narichika, Mamoru Doke, Masaki Hayashi und Nobuyuki Yagi. „Text-based video blogging“. In the 15th international conference. New York, New York, USA: ACM Press, 2006. http://dx.doi.org/10.1145/1135777.1135970.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
8

Zhang, Chi, Guixuan Zhang und Shuwu Zhang. „Text Based Video Retrieval among Video Clips“. In 2021 International Conference on Culture-oriented Science & Technology (ICCST). IEEE, 2021. http://dx.doi.org/10.1109/iccst53801.2021.00099.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
9

Jahagirdar, Soumya, Minesh Mathew, Dimosthenis Karatzas und C. V. Jawahar. „Understanding Video Scenes through Text: Insights from Text-based Video Question Answering“. In 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). IEEE, 2023. http://dx.doi.org/10.1109/iccvw60793.2023.00500.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
10

Huang, Xiaodong, Qin Wang, Lishang Zhu und Kehua Liu. „Video text detection based on text edge map“. In 2013 3rd International Conference on Computer Science and Network Technology (ICCSNT). IEEE, 2013. http://dx.doi.org/10.1109/iccsnt.2013.6967273.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen

Reports of organizations on the topic "Video text"

1

Li, Huiping, David Doermann und Omid Kia. Automatic Text Detection and Tracking in Digital Video. Fort Belvoir, VA: Defense Technical Information Center, Dezember 1998. http://dx.doi.org/10.21236/ada458675.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
2

Kuzmin, Vyacheslav, Alebai Sabitov, Andrei Reutov, Vladimir Amosov, Lidiia Neupokeva und Igor Chernikov. Electronic training manual "Providing first aid to the population". SIB-Expertise, Januar 2024. http://dx.doi.org/10.12731/er0774.29012024.

Der volle Inhalt der Quelle
Annotation:
First aid comprises the simplest urgent measures necessary to save the lives of victims of injuries, accidents and sudden illnesses. Providing first aid greatly increases the chances of survival in cases of bleeding, injury, and cardiac and respiratory arrest, and prevents complications such as shock, massive blood loss, additional displacement of bone fragments, and injury to large nerve trunks and blood vessels. This electronic educational resource consists of four theoretical educational modules: legal aspects of providing first aid to victims and work safety when providing first aid; providing first aid in critical conditions of the body; providing first aid for injuries of various origins; and providing first aid in cases of extreme exposure, accidents and poisoning. The materials cover 8 emergency conditions and 11 life-saving measures. The theoretical block of each module is presented through presentations, illustrated lecture texts, a video film and video lectures. Each theoretical module is accompanied by a control lesson in the form of a test. After studying all modules, the student passes a final test. Mastering the electronic manual ensures that persons without medical education achieve a high level of readiness to provide first aid.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
3

Sharova, Iryna. WAYS OF PROMOTING UKRANIAN PUBLISHING HOUSES ON FACEBOOK DURING QUARANTINE. Ivan Franko National University of Lviv, Februar 2021. http://dx.doi.org/10.30970/vjo.2021.49.11076.

Der volle Inhalt der Quelle
Annotation:
The article reviews and analyzes the promotion of Ukrainian publishing houses on Facebook during quarantine in 2020. The study focuses on the content, and the types of content, that publishers used to represent themselves on Facebook. We found that going live and posting text with a picture were the most popular formats. The phenomenon of live video is tightly connected to the phenomenon of quarantine, though not every publishing house was able to go live continuously or even regularly. Simple text with a picture, however, is both the easiest content to post and the most popular. Ukrainian publishers also use UGC (User Generated Content), situational content, and different contexts. The biggest problem for Ukrainian publishers is sustaining continuous, strategic work with social media for promotion. During quarantine, social media became the primary channel for communication with customers and subscribers; therefore, promotion on the Internet and in social media should become as important as offline promotion.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
4

Baluk, Nadia, Natalia Basij, Larysa Buk und Olha Vovchanska. VR/AR-TECHNOLOGIES – NEW CONTENT OF THE NEW MEDIA. Ivan Franko National University of Lviv, Februar 2021. http://dx.doi.org/10.30970/vjo.2021.49.11074.

Der volle Inhalt der Quelle
Annotation:
The article analyzes the peculiarities of how media content is shaped and transformed in the convergent dimension of cross-media, taking into account the possibilities of augmented reality. Guided by the principles of objectivity, complexity and reliability in scientific research, a number of general scientific and special methods are used: analysis, synthesis and generalization, monitoring and observation, and problem-thematic, typological and discursive methods. According to the form of information presentation, such types of media content as visual, audio, verbal and combined are defined and characterized. The most important type in journalism is verbal content, since it carries the main information load. The dynamic development of converged media leads to the dominance of image and video content, and the likelihood that text becomes secondary content increases. Given the market situation, an effective information product is combined content that pairs text with images, spreadsheets with video, animation with infographics, and so on. An increasing number of new media outlets use applications and website platforms to interact with recipients. The article then determines the peculiarities of the new content of new media involving augmented reality. Examples of successful interactive communication between recipients, leading news agencies and commercial structures are provided. The conditions for effective use of VR/AR technologies in the media content of new media, and for involving viewers in stories that change with augmented reality, are determined. The so-called immersive effect of VR/AR technologies involves the complete immersion of the interested audience in the essence of the event being relayed. This interaction can be achieved through different types of VR video interactivity. One of the most important results of using VR content is the spatio-temporal and emotional immersion of viewers in the plot. The recipient turns from an external observer into an internal one, but this constant participation requires that user preferences be taken into account. Factors such as satisfaction, positive reinforcement, empathy and value influence viewers' choice of VR/AR content.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
5

Krull, R. 8mm video tape test. Office of Scientific and Technical Information (OSTI), Oktober 1990. http://dx.doi.org/10.2172/6375254.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
6

Prudkov, Mikhail, Vasily Ermolaev, Elena Shurygina und Eduard Mikaelyan. Electronic educational resource "Hospital Surgery for 5th year students of the Faculty of Pediatrics". SIB-Expertise, Januar 2024. http://dx.doi.org/10.12731/er0780.29012024.

Der volle Inhalt der Quelle
Annotation:
The electronic educational resource was created for the independent work of 5th-year students of the pediatric faculty studying the discipline "Hospital Surgery", with provision for monitoring by the teacher. The EER includes an introductory module, a topic module, and a quality assessment module. Each of the 19 topics in the EER consists of the following sections: educational and methodological tasks on the topic, an abstract of the topic, control tests on the topic, clinical situational tasks on the topic, and a list of references. The "Summary of the topic" section can currently be presented as a text file, a presentation, a video lecture, a monograph by the staff of the department, etc.; this section is gradually updated with new materials. The "Control tests on the topic" section is designed to let the teacher monitor students' independent work and contains 15 test items; students have 10 minutes and two attempts, with 71% correct answers required to pass. The "Clinical situational tasks" section serves as self-assessment for the student: if the student understands the content of the task, makes a preliminary diagnosis and knows the tactics of managing the patient, the topic has been mastered. There are ten clinical situational tasks for each topic, and students receive different versions of the tasks. In addition, the EER has a "Final test control" section containing test tasks from all topics of the practical classes. The program randomly generates a final test of 30 tasks; 20 minutes are allotted, the student has two attempts, and more than 71% correct answers are required to pass.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
7

Felix, Juri, und Laura Webb. Use of artificial intelligence in education delivery and assessment. Parliamentary Office of Science and Technology, Januar 2024. http://dx.doi.org/10.58248/pn712.

Der volle Inhalt der Quelle
Annotation:
This POSTnote considers how artificial intelligence (AI) technologies can be used by educators and learners in schools, colleges and universities. Artificial intelligence technologies that can be used in education have developed rapidly in recent years. This has been driven in part by advancements of generative AI, which is now capable of performing a wide range of tasks including the production of realistic content such as text, images, audio and video. Artificial intelligence tools have the potential to provide different ways of learning and to help educators with lesson planning, marking and other tasks. However, adoption of AI in education is still in an early and experimental phase. There is uncertainty about the benefits and limitations. Some stakeholders have expressed concerns that over-reliance on AI could diminish educator-learner relationships. Concerns also relate to potential negative impacts on learners’ writing and critical thinking skills, through work being undertaken by AI. In November 2023, the Department for Education published a report on the use of Generative AI in education. The UK Government have also announced an investment of up to £2 million to provide new AI-powered resources for teachers in England.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
8

Rekstad, Gary. Development of a Video Tape to Test Video Codecs Operating at 64KBPS. Fort Belvoir, VA: Defense Technical Information Center, Februar 1989. http://dx.doi.org/10.21236/ada228157.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
9

Crandall, Rob. Airborne Separation Video System Government Suitability Test. Fort Belvoir, VA: Defense Technical Information Center, Juni 1999. http://dx.doi.org/10.21236/ada368478.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen
10

Schneider, S., und K. Insch. Modular Integrated Video System (MIVS) environmental test report. Office of Scientific and Technical Information (OSTI), Oktober 1989. http://dx.doi.org/10.2172/7172843.

Der volle Inhalt der Quelle
APA, Harvard, Vancouver, ISO und andere Zitierweisen