Selection of scholarly literature on the topic "Video text"

Cite a source in APA, MLA, Chicago, Harvard, and other citation styles

Choose a type of source:

Consult the lists of current articles, books, dissertations, reports, and other scholarly sources on the topic "Video text."

Next to every work in the bibliography there is an "Add to bibliography" option. Use it, and the bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication in PDF format and read an online annotation of the work, if the relevant parameters are available in the metadata.

Journal articles on the topic "Video text"

1

Huang, Bin, Xin Wang, Hong Chen, Houlun Chen, Yaofei Wu, and Wenwu Zhu. "Identity-Text Video Corpus Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 4 (2025): 3608–16. https://doi.org/10.1609/aaai.v39i4.32375.

Full text of the source
Annotation:
Video corpus grounding (VCG), which aims to retrieve relevant video moments from a video corpus, has attracted significant attention in the multimedia research community. However, the existing VCG setting primarily focuses on matching textual descriptions with videos and ignores the distinct visual identities in the videos, resulting in inaccurate understanding of video content and degraded retrieval performance. To address this limitation, we introduce a novel task, Identity-Text Video Corpus Grounding (ITVCG), which simultaneously utilizes textual descriptions and visual identities as queries. As such, ITVCG enables more accurate video corpus grounding with visual identities and provides users with more flexible options to locate relevant frames based on either textual descriptions alone or textual descriptions combined with visual identities. To conduct evaluations for the novel ITVCG task, we propose the TVR-IT dataset, comprising 463 identity images from 6 TV shows, with 68,840 out of 72,840 queries containing at least one identity image. Furthermore, we propose Video-Locator, the first model designed for the ITVCG task. Our proposed Video-Locator integrates video-identity-text alignment and multi-modal fine-grained fusion components, enabling a video large language model (Video LLM) to jointly understand textual descriptions, visual identities, and videos. Experimental results demonstrate the effectiveness of the proposed Video-Locator model and highlight the importance of identity-generalization capability for ITVCG.
APA, Harvard, Vancouver, ISO, and other citation styles
2

Bhute, Avinash N., and B. B. Meshram. "Text Based Approach For Indexing And Retrieval Of Image And Video: A Review." Advances in Vision Computing: An International Journal (AVC) 1, no. 1 (2014): 27–38. https://doi.org/10.5281/zenodo.3554868.

Full text of the source
Annotation:
Text data present in multimedia contain useful information for automatic annotation and indexing. The extracted information can be used to recognize overlay or scene text in a given video or image, and the extracted text can in turn be used to retrieve the videos and images. In this paper, we first discuss different techniques for text extraction from images and videos. We then review techniques for indexing and retrieval of images and videos using the extracted text.
APA, Harvard, Vancouver, ISO, and other citation styles
3

Bhute, Avinash N., and B. B. Meshram. "Text Based Approach For Indexing And Retrieval Of Image And Video: A Review." Advances in Vision Computing: An International Journal (AVC) 1, no. 1 (2014): 27–38. https://doi.org/10.5281/zenodo.3357696.

Full text of the source
Annotation:
Text data present in multimedia contain useful information for automatic annotation and indexing. The extracted information can be used to recognize overlay or scene text in a given video or image, and the extracted text can in turn be used to retrieve the videos and images. In this paper, we first discuss different techniques for text extraction from images and videos. We then review techniques for indexing and retrieval of images and videos using the extracted text.
APA, Harvard, Vancouver, ISO, and other citation styles
4

Divya V., Prithica G., and Savija J. "Text Summarization for Education in Vernacular Languages." International Journal for Research in Applied Science and Engineering Technology 11, no. 7 (2023): 175–78. http://dx.doi.org/10.22214/ijraset.2023.54589.

Full text of the source
Annotation:
This project proposes a video summarizing system based on natural language processing (NLP) and machine learning to summarize YouTube video transcripts without losing the key elements. The quantity of videos available on web platforms is steadily expanding. The content is made available globally, primarily for educational purposes. Additionally, educational content is available on YouTube, Facebook, Google, and Instagram. A significant issue in extracting information from videos is that, unlike an image, where data can be collected from a single frame, a viewer must watch the entire video to grasp the context. This study aims to shorten the length of the transcript of the given video. The suggested method involves retrieving the transcript from the video link provided by the user and then summarizing it using Hugging Face Transformers and pipelining. The built model accepts a video link and the required summary duration as input from the user and generates a summarized transcript as output. According to the results, the final summary was obtained in less time compared with other proposed techniques. Furthermore, the video's central concept is accurately present in the final summary without any deviations.
APA, Harvard, Vancouver, ISO, and other citation styles
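The flow this abstract describes (fetch the transcript, split it to fit a model's input limit, then summarize piece by piece) can be sketched roughly as follows. The function name and chunk size are illustrative assumptions, and the actual summarizer is deliberately left out: each returned chunk would be handed to something like a Hugging Face summarization pipeline.

```python
import re

def chunk_transcript(text, max_words=400):
    """Split a transcript into sentence-aligned chunks of at most max_words,
    so each chunk fits a summarization model's input limit.
    A single sentence longer than max_words still becomes its own chunk."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current, count = [], [], 0
    for s in sentences:
        w = len(s.split())
        if current and count + w > max_words:
            chunks.append(' '.join(current))
            current, count = [], 0
        current.append(s)
        count += w
    if current:
        chunks.append(' '.join(current))
    return chunks
```

Each chunk could then be summarized independently and the partial summaries concatenated; sentence-aligned splitting avoids cutting a thought in half at a model's token boundary.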
5

Dave, Namrata, and Mehfuza S. Holia. "News Story Retrieval Based on Textual Query." International Journal of Engineering and Advanced Technology (IJEAT) 9, no. 3 (2021): 2918–22. https://doi.org/10.5281/zenodo.5589205.

Full text of the source
Annotation:
This paper presents news video retrieval using text queries for Gujarati-language news videos. Because broadcast video in India lacks metadata such as closed captioning and transcriptions, retrieving videos based on text data is a non-trivial task for most Indian-language video. The key idea behind our approach is to retrieve a specific story based on a text query in a regional language. Broadcast video is segmented into shots representing short news stories. To represent each shot efficiently, key-frame extraction using singular value decomposition and the rank of a matrix is proposed. Text is extracted from the key frames for further indexing. The next task is to process the text using natural language processing steps such as tokenization, removal of punctuation and extra symbols, and stemming of words to their roots. Owing to the unavailability of stemming and other text-preprocessing methods for the Gujarati language, we provide a basic stemming technique to reduce the dictionary size for efficient indexing of the text data. The proposed system achieves 82.5 percent accuracy on the ETV Gujarati news video dataset.
APA, Harvard, Vancouver, ISO, and other citation styles
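One way to read the key-frame idea in this abstract is: keep a frame only when it adds information the already-selected frames cannot express, measured through the rank of a matrix of frame features. The sketch below is an illustrative rendering under that assumption, not the authors' exact algorithm; the feature vectors stand in for whatever per-frame descriptor (e.g. a colour histogram) is actually used.

```python
import numpy as np

def select_keyframes(frames, tol=1e-6):
    """Greedy key-frame selection: a frame is kept only if appending its
    feature vector increases the rank of the matrix of key frames chosen
    so far (matrix_rank is computed via SVD internally).
    Note: a frame that is just a scaled copy (e.g. a brightness change)
    does not raise the rank and is therefore treated as redundant."""
    keyframes = []
    basis = None  # columns are the feature vectors of selected key frames
    for i, f in enumerate(frames):
        f = np.asarray(f, dtype=float)
        if basis is None:
            keyframes.append(i)
            basis = f[:, None]
            continue
        candidate = np.hstack([basis, f[:, None]])
        if np.linalg.matrix_rank(candidate, tol=tol) > np.linalg.matrix_rank(basis, tol=tol):
            keyframes.append(i)
            basis = candidate
    return keyframes
```

For example, a run of near-identical frames contributes a single key frame, while a cut to visually different content raises the rank and yields a new one.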
6

Doran, Michael, Adrian Barnett, Joan Leach, William Lott, Katie Page, and Will Grant. "Can video improve grant review quality and lead to more reliable ranking?" Research Ideas and Outcomes 3 (February 1, 2017): e11931. https://doi.org/10.3897/rio.3.e11931.

Full text of the source
Annotation:
Multimedia video is rapidly becoming mainstream, and many studies indicate that it is a more effective communication medium than text. In this project we AIM to test whether videos can be used, in place of text-based grant proposals, to improve communication and increase the reliability of grant ranking. We will test whether video improves reviewer comprehension (AIM 1), whether external reviewer grant scores are more consistent with video (AIM 2), and whether mock Australian Research Council (ARC) panels award more consistent scores when grants are presented as videos (AIM 3). This will be the first study to evaluate the use of video in this application. The ARC reviewed over 3500 Discovery Project applications in 2015, awarding 635 Projects. Selecting the "best" projects is extremely challenging. This project will improve the selection process by facilitating the transition from text-based to video-based proposals. The impact could be profound: improved video communication should streamline the grant preparation and review processes, enable more reliable ranking of applications, and allow more accurate identification of the "next big innovations".
APA, Harvard, Vancouver, ISO, and other citation styles
7

Jiang, Ai Wen, and Gao Rong Zeng. "Multi-information Integrated Method for Text Extraction from Videos." Advanced Materials Research 225-226 (April 2011): 827–30. http://dx.doi.org/10.4028/www.scientific.net/amr.225-226.827.

Full text of the source
Annotation:
Video text provides important semantic information in video content analysis. However, video text with a complex background yields poor OCR recognition performance. Most previous approaches to extracting overlay text from videos are based on traditional binarization and pay little attention to multi-information integration, especially fusing the background information. This paper presents an effective method to precisely extract characters from videos so that OCR achieves good recognition performance. The proposed method combines multiple sources of information, including background information, edge information, and the characters' spatial information. Experimental results show that it is robust to complex backgrounds and various text appearances.
APA, Harvard, Vancouver, ISO, and other citation styles
8

Ma, Fan, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, and Yi Yang. "Stitching Segments and Sentences towards Generalization in Video-Text Pre-training." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (2024): 4080–88. http://dx.doi.org/10.1609/aaai.v38i5.28202.

Full text of the source
Annotation:
Video-language pre-training models have recently achieved remarkable results on various multi-modal downstream tasks. However, most of these models rely on contrastive learning or masked modeling to align global features across modalities, neglecting the local associations between video frames and text tokens. This limits the model's ability to perform fine-grained matching and generalization, especially for tasks that require selecting segments in long videos based on query texts. To address this issue, we propose a novel stitching-and-matching pretext task for video-language pre-training that encourages fine-grained interactions between modalities. Our task involves stitching video frames or sentences into longer sequences and predicting the positions of cross-modal queries in the stitched sequences. The individual frame and sentence representations are thus aligned via the stitching-and-matching strategy, encouraging fine-grained interactions between videos and texts. We conduct extensive experiments on various benchmarks covering text-to-video retrieval, video question answering, video captioning, and moment retrieval. Our results demonstrate that the proposed method significantly improves the generalization capacity of video-text pre-training models.
APA, Harvard, Vancouver, ISO, and other citation styles
9

Liu, Yang, Shudong Huang, Deng Xiong, and Jiancheng Lv. "Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 6 (2025): 5667–75. https://doi.org/10.1609/aaai.v39i6.32604.

Full text of the source
Annotation:
Text-video retrieval is a foundational task in multi-modal research which aims to align texts and videos in the embedding space. The key challenge is to learn the similarity between videos and texts. A conventional approach involves directly aligning video-text pairs using cosine similarity. However, due to the disparity in the information conveyed by videos and texts, i.e., a single video can be described from multiple perspectives, the retrieval accuracy is suboptimal. An alternative approach employs cross-modal interaction to enable videos to dynamically acquire distinct features from various texts, thus facilitating similarity calculations. Nevertheless, this solution incurs a computational complexity of O(n^2) during retrieval. To this end, this paper proposes a novel method called Bidirectional Hierarchical Sliding Semantic Probe (BiHSSP), which calculates dynamic similarity between videos and texts with O(n) complexity during retrieval. We introduce a hierarchical semantic probe module that learns semantic probes at different scales for both video and text features. The module performs a sliding calculation of the cross-correlation between semantic probes at different scales and embeddings from the other modality, allowing dynamic similarity computation between video and text descriptions from various perspectives. Specifically, for text descriptions from different angles, we calculate the similarity at different locations within the video features, and vice versa. This approach preserves the complete information of the video while addressing the unequal information between video and text, without requiring cross-modal interaction. Additionally, our method can function as a plug-and-play module across various methods, enhancing their performance. Experimental results demonstrate that our BiHSSP significantly outperforms the baseline.
APA, Harvard, Vancouver, ISO, and other citation styles
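The sliding intuition behind this abstract can be illustrated with a toy sketch (this is not the BiHSSP implementation): a fixed-length "probe" from one modality is slid along the other modality's feature sequence and the best window score is kept, which takes one linear pass rather than the quadratic cost of full cross-modal interaction. The function name and the mean dot-product score are assumptions made for illustration.

```python
import numpy as np

def sliding_probe_similarity(probe, seq):
    """Slide a probe (k x d feature window) along a sequence (n x d, k <= n)
    and return the best mean dot-product score and the position where it
    occurs. One pass over the sequence: O(n) window evaluations."""
    probe = np.asarray(probe, dtype=float)
    seq = np.asarray(seq, dtype=float)
    k = probe.shape[0]
    scores = [float((probe * seq[i:i + k]).sum() / k)
              for i in range(seq.shape[0] - k + 1)]
    best = int(np.argmax(scores))
    return scores[best], best
```

Because the probe is compared at every offset, a description matching only part of a long video still scores highly at the matching location, which mirrors the "different locations within the video features" behaviour described above.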
10

Sun, Shangkun, Xiaoyu Liang, Songlin Fan, Wenxu Gao, and Wei Gao. "VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 7 (2025): 7105–13. https://doi.org/10.1609/aaai.v39i7.32763.

Full text of the source
Annotation:
Text-driven video editing has recently experienced rapid development. Despite this, evaluating edited videos remains a considerable challenge. Current metrics tend to fail to align with human perception, and effective quantitative metrics for video editing are still notably absent. To address this, we introduce VE-Bench, a benchmark suite tailored to the assessment of text-driven video editing. This suite includes VE-Bench DB, a video quality assessment (VQA) database for video editing. VE-Bench DB encompasses a diverse set of source videos featuring various motions and subjects, along with multiple distinct editing prompts, editing results from 8 different models, and the corresponding Mean Opinion Scores (MOS) from 24 human annotators. Based on VE-Bench DB, we further propose VE-Bench QA, a quantitative human-aligned measurement for the text-driven video editing task. In addition to the aesthetic, distortion, and other visual quality indicators that traditional VQA methods emphasize, VE-Bench QA focuses on text-video alignment and the relevance modeling between source and edited videos. It introduces a new assessment network for video editing that attains superior performance in alignment with human preferences. To the best of our knowledge, VE-Bench introduces the first quality assessment dataset for video editing and proposes an effective subjective-aligned quantitative metric for this domain. All models, data, and code will be publicly available to the community.
APA, Harvard, Vancouver, ISO, and other citation styles
More sources

Dissertations on the topic "Video text"

1

Sidevåg, Emmilie. "Användarmanual text vs video." Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-17617.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
2

Salway, Andrew. "Video annotation : the role of specialist text." Thesis, University of Surrey, 1999. http://epubs.surrey.ac.uk/843350/.

Full text of the source
Annotation:
Digital video is among the most information-intensive modes of communication. The retrieval of video from digital libraries, along with sound and text, is a major challenge for the computing community in general and for the artificial intelligence community specifically. The advent of digital video has set some old questions in a new light. Questions relating to aesthetics and to the role of surrogates - image for reality and text for image - invariably touch upon the link between vision and language. Dealing with this link computationally is important for the artificial intelligence enterprise. Interesting images to consider, both aesthetically and for research in video retrieval, include those which are constrained and patterned and which convey rich meanings; for example, dance. These are specialist images for us and require a special language for description and interpretation. Furthermore, they require specialist knowledge to be understood, since there is usually more than meets the untrained eye: this knowledge may also be articulated in the language of the specialism. In order to be retrieved effectively and efficiently, video has to be annotated, particularly so for specialist moving images. Annotation involves attaching keywords from the specialism along with, for us, commentaries produced by experts, including those written and spoken specifically for annotation and those obtained from a corpus of extant texts. A system that processes such collateral text for video annotation should perhaps be grounded in an understanding of the link between vision and language. This thesis attempts to synthesise ideas from artificial intelligence, multimedia systems, linguistics, cognitive psychology and aesthetics. The link between vision and language is explored by focusing on moving images of dance and the special language used to describe and interpret them.
We have developed an object-oriented system, KAB, which helps to annotate a digital video library with a collateral corpus of texts and terminology. User evaluation has been encouraging. The system is now available on the WWW.
APA, Harvard, Vancouver, ISO, and other citation styles
3

Smith, Gregory. "Video Scene Detection Using Closed Caption Text." VCU Scholars Compass, 2009. http://scholarscompass.vcu.edu/etd/1932.

Full text of the source
Annotation:
Issues in automatic video biography editing are similar to those in video scene detection and Topic Detection and Tracking (TDT). The techniques of video scene detection and TDT can be applied to interviews to reduce the time necessary to edit a video biography. The system addresses the problems of video text extraction, story segmentation, and correlation. This thesis project was divided into three parts: extraction, scene detection, and correlation. The project successfully detected scene breaks in series television episodes and displayed scenes with similar content.
APA, Harvard, Vancouver, ISO, and other citation styles
4

Zhang, Jing. "Extraction of Text Objects in Image and Video Documents." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4266.

Full text of the source
Annotation:
The popularity of digital image and video is increasing rapidly. To help users navigate libraries of image and video, Content Based Information Retrieval (CBIR) systems that can automatically index image and video documents are needed. However, due to the semantic gap between low-level machine descriptors and high-level semantic descriptors, existing CBIR systems are still far from perfect. Text embedded in multimedia data, as a well-defined model of concepts for human communication, contains much semantic information related to the content. This text information can provide a much truer form of content-based access to image and video documents if it can be extracted and harnessed efficiently. This dissertation addresses the problems of detecting text objects in images and video and tracking text events in video. For the text detection problem, we propose a new unsupervised text detection algorithm. A new text model is constructed to describe text objects using a pictorial structure. Each character is a part in the model, and every two neighboring characters are connected by a spring-like link. Two characters and the link connecting them are defined as a text unit. We localize candidate parts by extracting closed boundaries and initialize the links by connecting two neighboring candidate parts based on the spatial relationship of characters. For every candidate part, we compute a character energy using three new character features: averaged angle difference of corresponding pairs, fraction of non-noise pairs, and vector of stroke width. These are extracted based on our observation that the edge of a character can be divided into two sets with high similarities in length, curvature, and orientation. For every candidate link, we compute a link energy based on our observation that the characters of a text typically align along a certain direction with similar color, size, and stroke width.
For every candidate text unit, we combine character and link energies to compute a text unit energy, which indicates the probability that the candidate text model is a real text object. The final text detection results are generated by thresholding on the text unit energy. For the text tracking problem, we construct a text event model using a pictorial structure as well. In this model, the detected text object in each video frame is a part, and two neighboring text objects of a text event are connected by a spring-like link. An inter-frame link energy is computed for each link based on the character energy, the similarity of neighboring text objects, and motion information. After refining the model using the inter-frame link energy, the remaining text event models are marked as text events. At the character level, because the proposed method is based on the assumption that the strokes of a character have uniform thickness, it can detect and localize characters from different languages in different styles, such as typewritten or handwritten text, provided the characters have approximately uniform stroke thickness. At the text level, however, because the spatial relationship between two neighboring characters is used to localize text objects, the proposed method may fail to detect and localize characters with multiple separate strokes or connected characters. For example, some East Asian characters, such as Chinese, Japanese, and Korean, consist of many separate strokes; the strokes would need to be grouped first to form single characters, and the characters then grouped to form text objects. Conversely, the characters of some languages, such as Arabic and Hindi, are connected together, so we cannot extract spatial information between neighboring characters, since they are detected as a single character. Therefore, at the current stage the proposed method can detect and localize text objects that are composed of separate characters with connected strokes of approximately uniform thickness.
We evaluated our method comprehensively using three English-language image and video datasets: the ICDAR 2003/2005 text locating dataset (258 training images and 251 test images), the Microsoft Street View text detection dataset (307 street view images), and the VACE video dataset (50 broadcast news videos from CNN and ABC). The experimental results demonstrate that the proposed text detection method can capture the inherent properties of text and discriminate text from other objects efficiently.
APA, Harvard, Vancouver, ISO, and other citation styles
5

Sjölund, Jonathan. "Detection of Frozen Video Subtitles Using Machine Learning." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158239.

Full text of the source
Annotation:
When subtitles are burned into a video, an error can sometimes occur in the encoder that results in the same subtitle being burned into several frames, causing subtitles to become frozen. This thesis provides a way to detect frozen video subtitles with the help of an implemented text detector and classifier. Two types of classifiers, naïve classifiers and machine learning classifiers, are tested and compared on a variety of different videos to see how much a machine learning approach can improve performance. The naïve classifiers are evaluated using ground truth data to gain an understanding of the importance of good text detection. To understand the difficulty of the problem, two different machine learning classifiers are tested: logistic regression and random forests. The results show that machine learning improves performance over the naïve classifiers, raising specificity from approximately 87.3% to 95.8% and accuracy from 93.3% to 95.5%. Random forests achieve the best overall performance, but the difference compared with logistic regression is small enough that more computationally complex machine learning classifiers are not necessary. Using the ground truth shows that the weaker naïve classifiers would be improved by at least 4.2% accuracy; thus a better text detector is warranted. This thesis shows that machine learning is a viable option for detecting frozen video subtitles.
APA, Harvard, Vancouver, ISO, and other citation styles
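A naïve baseline of the kind this thesis compares against can be imagined as something like the following sketch: crop the subtitle region from each frame and flag any run of near-identical crops that outlasts a plausible subtitle duration. The function name and threshold values are invented for illustration; the thesis's actual classifiers (logistic regression, random forests) operate on learned features instead.

```python
import numpy as np

def frozen_runs(regions, diff_thresh=1.0, max_run=120):
    """Naive frozen-subtitle detector: flag runs of consecutive subtitle-region
    crops whose mean absolute pixel difference stays below diff_thresh for
    more than max_run frames (e.g. 120 frames ~ a few seconds of video).
    Returns a list of (start_frame, end_frame) runs."""
    runs, start = [], 0
    for i in range(1, len(regions)):
        d = np.abs(regions[i].astype(float) - regions[i - 1].astype(float)).mean()
        if d > diff_thresh:          # content changed: close the current run
            if i - start > max_run:
                runs.append((start, i - 1))
            start = i
    if len(regions) - start > max_run:  # run extends to the end of the video
        runs.append((start, len(regions) - 1))
    return runs
```

Such a fixed-threshold rule is brittle against compression noise and fades, which is precisely the gap the learned classifiers in the thesis are meant to close.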
6

Chen, Datong. "Text detection and recognition in images and video sequences /." [S.l.] : [s.n.], 2003. http://library.epfl.ch/theses/?display=detail&nr=2863.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
7

Štindlová, Marie. "Museli to založit." Master's thesis, Vysoké učení technické v Brně. Fakulta výtvarných umění, 2015. http://www.nusl.cz/ntk/nusl-232451.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
8

Bird, Paul. "Elementary students' comprehension of computer presented text." Thesis, University of British Columbia, 1990. http://hdl.handle.net/2429/29187.

Full text of the source
Annotation:
The study investigated grade 6 students' comprehension of narrative text when presented on a computer and as printed words on paper. A set of comprehension tests was developed for three stories of varying length (382 words, 1047 words, and 1933 words) using a skills hierarchy protocol. The text for each story was prepared for presentation on a Macintosh computer, using a program written for the study, and as print in the form of exact copies of the computer screen. Students from two grade 6 classes in a suburban elementary school were randomly assigned to read one of the stories either in print form or on the computer, and subsequently completed a comprehension test as well as a questionnaire covering attitude and personal information. The responses from the comprehension tests were evaluated by graduate students in Language Education. The data derived from the tests and questionnaires were analysed to determine measures of test construct validity and inter-rater reliability, and to detect any significant difference in the mean comprehension scores of the two experimental groups for each story. The results indicated small but insignificant differences between the mean comprehension test scores for computer and print across the three stories. A number of students reading from the computer complained of eye fatigue. The scores of subjects reading the longest story and complaining of eye fatigue were significantly lower.
APA, Harvard, Vancouver, ISO, and other citation styles
9

Sharma, Nabin. "Multi-lingual Text Processing from Videos." Thesis, Griffith University, 2015. http://hdl.handle.net/10072/367489.

Full text of the source
Annotation:
Advances in digital technology have produced low-priced, highly portable imaging devices such as digital cameras attached to mobile phones, camcorders, and PDAs. These devices can be used to capture videos and images with ease, which can then be shared through the internet and other communication media. In the commercial domain, cameras are used to create news, advertisement videos and other forms of material for information communication. The use of multiple languages to create information for targeted audiences is quite common in countries having multiple official languages. Transmission of news, advertisement videos and images across various communication channels has created large databases of videos, and these are increasing exponentially. Effective management of such databases requires proper indexing for the retrieval of relevant information. Text information is dominant in most videos and images and can be used as keywords for the retrieval of relevant videos and images. Automatic annotation of videos and images to extract keywords requires the text to be converted to an editable form. This thesis addresses the problem of multi-lingual text processing from video frames. Multi-lingual text processing involves text detection, word segmentation, script identification, and text recognition. Additionally, text frame classification is required to avoid processing video frames that contain no text information. A new multi-lingual video word dataset was created and published as part of the current research. The dataset comprises words in ten scripts, namely English (Roman), Hindi (Devanagari), Bengali (Bangla), Arabic, Oriya, Gujarati, Punjabi, Kannada, Tamil and Telugu. This dataset was created to facilitate future research on multi-lingual text recognition.
APA, Harvard, Vancouver, ISO, and other citation styles
10

Fraz, Muhammad. "Video content analysis for intelligent forensics." Thesis, Loughborough University, 2014. https://dspace.lboro.ac.uk/2134/18065.

Der volle Inhalt der Quelle
Annotation:
The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely; 1. Moving object detection and recognition, 2. Correction of colours in the video frames and recognition of colours of moving objects, 3. Make and model recognition of vehicles and identification of their type, 4. Detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex background. The object detection part of the framework relies on background modelling technique and a novel post processing step where the contours of the foreground regions (i.e. moving object) are refined by the classification of edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames. 
The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfections and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As a part of this work, a novel feature representation technique for the distinctive representation of vehicle images has emerged. The feature representation technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The capability of the proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals.
The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used within various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.

Books on the topic "Video text"

1. Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan, and Wenyin Liu. Video Text Detection. Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6.

2. Wilde, Rod, ed. Volleyball essentials: Video-text. Total Health Publications, 2014.

3. Shivakumara, Palaiahnakote, and Umapada Pal. Cognitively Inspired Video Text Processing. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-7069-5.

4. Atkins, Barry, and Tanya Krzywinska, eds. Videogame, player, text. Manchester University Press, 2007.

5. Peterson, Tara. Should kids play video games?: A persuasive text. Mondo, 2006.

6. Stark, James H., ed. The practice of mediation: A video-integrated text. 2nd ed. Wolters Kluwer Law & Business, 2012.

7. Chen, Datong. Text detection and recognition in images and video sequences. EPFL, 2003.

8. Szuprowicz, Bohdan O. Multimedia technology: Combining sound, text, computing, graphics, and video. Computer Technology Research Corp., 1992.

9. Grannell, Mike. Self-managed study in mathematics using text and video. Open Learning Foundation, 1996.

10. Griggs, Yvonne. Shakespeare's King Lear: The relationship between text and film. Methuen Drama, 2009.

Book chapters on the topic "Video text"

1. Weik, Martin H. "video text." In Computer Science and Communications Dictionary. Springer US, 2000. http://dx.doi.org/10.1007/1-4020-0613-6_20796.

2. Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan, and Wenyin Liu. "Video Preprocessing." In Video Text Detection. Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_2.

3. Shivakumara, Palaiahnakote, and Umapada Pal. "Video Text Recognition." In Cognitive Intelligence and Robotics. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-7069-5_9.

4. Shivakumara, Palaiahnakote, and Umapada Pal. "Video Text Detection." In Cognitive Intelligence and Robotics. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-7069-5_4.

5. Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan, and Wenyin Liu. "Video Caption Detection." In Video Text Detection. Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_3.

6. Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan, and Wenyin Liu. "Video Text Detection Systems." In Video Text Detection. Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_7.

7. Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan, and Wenyin Liu. "Introduction to Video Text Detection." In Video Text Detection. Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_1.

8. Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan, and Wenyin Liu. "Performance Evaluation." In Video Text Detection. Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_10.

9. Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan, and Wenyin Liu. "Text Detection from Video Scenes." In Video Text Detection. Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_4.

10. Lu, Tong, Shivakumara Palaiahnakote, Chew Lim Tan, and Wenyin Liu. "Post-processing of Video Text Detection." In Video Text Detection. Springer London, 2014. http://dx.doi.org/10.1007/978-1-4471-6515-6_5.

Conference papers on the topic "Video text"

1. Lin, Xing, Langxi Liu, Pengjun Zhai, and Yu Fang. "Entity-Aware Video-Text Interaction for Contextualised Video Caption in News Video." In 2024 9th International Conference on Intelligent Computing and Signal Processing (ICSP). IEEE, 2024. http://dx.doi.org/10.1109/icsp62122.2024.10743775.

2. Sridhar, Bodanapu, Gourishetti Saivishnu, Varla ManiShanker, D. Dhana Lakshmi, and Shanmugasundaram Hariharan. "Summarization of Video into Text and Text to Braille Script." In 2024 International Conference on Knowledge Engineering and Communication Systems (ICKECS). IEEE, 2024. http://dx.doi.org/10.1109/ickecs61492.2024.10617121.

3. Y K, Anupama, Neelam Neha, Dinky Verma, S. B. Sankeerthana, and Medha Jha. "Compilation of Text to Video models." In 2024 International Conference on IoT, Communication and Automation Technology (ICICAT). IEEE, 2024. https://doi.org/10.1109/icicat62666.2024.10923075.

4. Zhao, Heng, Zhao Yinjie, Bihan Wen, Yew-Soon Ong, and Joey Tianyi Zhou. "Video-Text Prompting for Weakly Supervised Spatio-Temporal Video Grounding." In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.emnlp-main.1086.

5. Jin, Xiaojie, Bowen Zhang, Weibo Gong, et al. "MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2024. http://dx.doi.org/10.1109/cvpr52733.2024.02563.

6. Menapace, Willi, Aliaksandr Siarohin, Ivan Skorokhodov, et al. "Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2024. http://dx.doi.org/10.1109/cvpr52733.2024.00672.

7. Wang, Jiamian, Pichao Wang, Guohao Sun, et al. "Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2024. http://dx.doi.org/10.1109/cvpr52733.2024.01566.

8. Nixon, Lyndon, Damianos Galanopoulos, and Vasileios Mezaris. "Finding Video Shots for Immersive Journalism Through Text-to-Video Search." In 2024 International Conference on Content-Based Multimedia Indexing (CBMI). IEEE, 2024. https://doi.org/10.1109/cbmi62980.2024.10859220.

9. Wen, Weiwei, and Lingzhi Liao. "Video understanding with image, audio, and text." In Fourth International Conference on Advanced Algorithms and Neural Networks (AANN 2024), edited by Qinghua Lu and Weishan Zhang. SPIE, 2024. http://dx.doi.org/10.1117/12.3049519.
10. Zu, Xinyan, Haiyang Yu, Bin Li, and Xiangyang Xue. "Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/206.

Abstract:
Video text spotting (VTS) aims at extracting texts from videos, where text detection, tracking and recognition are conducted simultaneously. There have been some works that can tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that, if one text is predicted incorrectly by a text recognizer, it still has a chance to be corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For ‘visually’, we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For ‘linguistically’, a language model is employed to capture intra-text context to mitigate wrongly spelled text predictions. For ‘semantically’, we propose a text-wise semantic reasoning module to model inter-text semantic relationships and reason for better results. The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting.

Organizational reports on the topic "Video text"

1. Li, Huiping, David Doermann, and Omid Kia. Automatic Text Detection and Tracking in Digital Video. Defense Technical Information Center, 1998. http://dx.doi.org/10.21236/ada458675.

2. Olsson, Justin. Real-time underwater fish identification and biomonitoring via machine learning-based compression of video to text. Experiment, 2025. https://doi.org/10.18258/77387.

3. Kuzmin, Vyacheslav, Alebai Sabitov, Andrei Reutov, Vladimir Amosov, Lidiia Neupokeva, and Igor Chernikov. Electronic training manual "Providing first aid to the population". SIB-Expertise, 2024. http://dx.doi.org/10.12731/er0774.29012024.

Abstract:
First aid comprises the simplest urgent measures needed to save the lives of victims of injuries, accidents and sudden illnesses. Providing first aid greatly increases the chances of survival in cases of bleeding, injury, and cardiac and respiratory arrest, and it prevents complications such as shock, massive blood loss, additional displacement of bone fragments, and injury to large nerve trunks and blood vessels. This electronic educational resource consists of four theoretical educational modules: legal aspects of providing first aid to victims and work safety when providing first aid; providing first aid in critical conditions of the body; providing first aid for injuries of various origins; and providing first aid in cases of extreme exposure, accidents and poisoning. The materials cover 8 emergency conditions and 11 life-saving measures. The theoretical block of each module is presented through presentations, illustrated lecture texts, a video film and video lectures. Each theoretical module is accompanied by control classes in the form of tests. After studying all modules, the student takes a final test. Mastering the electronic manual ensures a high level of readiness to provide first aid among persons without medical education.

4. Sharova, Iryna. WAYS OF PROMOTING UKRANIAN PUBLISHING HOUSES ON FACEBOOK DURING QUARANTINE. Ivan Franko National University of Lviv, 2021. http://dx.doi.org/10.30970/vjo.2021.49.11076.

Abstract:
The article reviews and analyzes how Ukrainian publishing houses promoted themselves on Facebook during the 2020 quarantine. The study's main focus is the content, and the content types, that publishers used on Facebook. We found that going live and posting text with a picture were the most popular formats. The phenomenon of live video is closely tied to quarantine, although not every publishing house was able to go live regularly. Simple text with a picture is the easiest content to post, and also the most popular. Ukrainian publishers also use UGC (User Generated Content), situational content, and different contexts. The biggest problem for Ukrainian publishers is sustained, strategic work with social media for promotion. During quarantine, social media became the primary channel for communication with customers and subscribers; promotion on the Internet and in social media should therefore become equivalent to offline promotion.

5. Baluk, Nadia, Natalia Basij, Larysa Buk, and Olha Vovchanska. VR/AR-TECHNOLOGIES – NEW CONTENT OF THE NEW MEDIA. Ivan Franko National University of Lviv, 2021. http://dx.doi.org/10.30970/vjo.2021.49.11074.

Abstract:
The article analyzes the peculiarities of how media content is shaped and transformed in the convergent dimension of cross-media, taking into account the possibilities of augmented reality. Guided by the principles of objectivity, complexity and reliability in scientific research, a number of general scientific and special methods are used: analysis, synthesis, generalization, monitoring, observation, and problem-thematic, typological and discursive methods. According to the form of information presentation, such types of media content as visual, audio, verbal and combined are defined and characterized. The most important type in journalism is verbal content, which carries the main information load. The dynamic development of converged media leads to the dominance of image and video content and increases the likelihood that text becomes secondary content. Given the market situation, an effective information product is combined content that pairs text with images, spreadsheets with video, animation with infographics, etc. An increasing number of new media outlets are using applications and website platforms to interact with recipients. The article then determines the peculiarities of the new content of new media involving augmented reality. Examples of successful interactive communication between recipients, leading news agencies and commercial structures are provided. The conditions for the effective use of VR/AR technologies in the media content of new media, and for involving viewers in changing stories with augmented reality, are determined. The so-called immersive effect of VR/AR technologies involves the complete immersion of the interested audience in the essence of the event being relayed. This interaction can be achieved through different types of VR video interactivity.
One of the most important results of using VR content is the spatio-temporal and emotional immersion of viewers in the plot. The recipient turns from an external observer into an internal one, but this constant participation requires that user preferences be taken into account. Factors such as satisfaction, positive reinforcement, empathy and value influence viewers' choice of VR/AR content.

6. Prudkov, Mikhail, Vasily Ermolaev, Elena Shurygina, and Eduard Mikaelyan. Electronic educational resource "Hospital Surgery for 5th year students of the Faculty of Pediatrics". SIB-Expertise, 2024. http://dx.doi.org/10.12731/er0780.29012024.

Abstract:
This electronic educational resource (EER) was created for the independent work of 5th-year students of the pediatric faculty studying the discipline "Hospital Surgery", with provision for control by the teacher. The EER includes an introductory module, a topic module, and a quality assessment module. Each of the 19 topics consists of the following sections: educational and methodological tasks on the topic, an abstract of the topic, control tests, clinical situational tasks, and a list of references. The "Summary of the topic" section may currently be presented as a text file, a presentation, a video lecture, or a monograph by the staff of the department, and it is gradually updated with new materials. The "Control tests on the topic" section allows the teacher to monitor students' independent work; it contains 15 tests, with 10 minutes and two attempts allowed, and a passing threshold of 71% correct answers. The "Clinical situational tasks" section serves for students' self-control in mastering a topic: if the student understands the content of a task, makes a preliminary diagnosis and knows the tactics of managing the patient, the topic is mastered. There are ten clinical situational tasks per topic, and students receive different versions of them. In addition, the EER has a "Final test control" section containing test tasks from all topics of the practical classes. The program randomly generates a final test of 30 tasks; 20 minutes are allotted, the student has two attempts, and more than 71% correct answers are required to pass.

7. Felix, Juri, and Laura Webb. Use of artificial intelligence in education delivery and assessment. Parliamentary Office of Science and Technology, 2024. http://dx.doi.org/10.58248/pn712.

Abstract:
This POSTnote considers how artificial intelligence (AI) technologies can be used by educators and learners in schools, colleges and universities. Artificial intelligence technologies that can be used in education have developed rapidly in recent years. This has been driven in part by advancements of generative AI, which is now capable of performing a wide range of tasks including the production of realistic content such as text, images, audio and video. Artificial intelligence tools have the potential to provide different ways of learning and to help educators with lesson planning, marking and other tasks. However, adoption of AI in education is still in an early and experimental phase. There is uncertainty about the benefits and limitations. Some stakeholders have expressed concerns that over-reliance on AI could diminish educator-learner relationships. Concerns also relate to potential negative impacts on learners’ writing and critical thinking skills, through work being undertaken by AI. In November 2023, the Department for Education published a report on the use of Generative AI in education. The UK Government have also announced an investment of up to £2 million to provide new AI-powered resources for teachers in England.

8. Krull, R. 8mm video tape test. Office of Scientific and Technical Information (OSTI), 1990. http://dx.doi.org/10.2172/6375254.

9. Rekstad, Gary. Development of a Video Tape to Test Video Codecs Operating at 64KBPS. Defense Technical Information Center, 1989. http://dx.doi.org/10.21236/ada228157.

10. Crandall, Rob. Airborne Separation Video System Government Suitability Test. Defense Technical Information Center, 1999. http://dx.doi.org/10.21236/ada368478.