Academic literature on the topic 'Visual grounding of text'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Visual grounding of text.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Visual grounding of text"
Wang, Chao, Wei Luo, Jia-Rui Zhu, Ying-Chun Xia, Jin He, and Li-Chuan Gu. "End-to-end Visual Grounding Based on Query Text Guidance and Multi-stage Reasoning." 電腦學刊 35, no. 1 (February 2024): 83–95. http://dx.doi.org/10.53106/199115992024023501006.
Regneri, Michaela, Marcus Rohrbach, Dominikus Wetzel, Stefan Thater, Bernt Schiele, and Manfred Pinkal. "Grounding Action Descriptions in Videos." Transactions of the Association for Computational Linguistics 1 (December 2013): 25–36. http://dx.doi.org/10.1162/tacl_a_00207.
Zhan, Yang, Yuan Yuan, and Zhitong Xiong. "Mono3DVG: 3D Visual Grounding in Monocular Images." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 6988–96. http://dx.doi.org/10.1609/aaai.v38i7.28525.
Zhang, Qianjun, and Jin Yuan. "Semantic-Aligned Cross-Modal Visual Grounding Network with Transformers." Applied Sciences 13, no. 9 (May 4, 2023): 5649. http://dx.doi.org/10.3390/app13095649.
Shen, Haozhan, Tiancheng Zhao, Mingwei Zhu, and Jianwei Yin. "GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (March 24, 2024): 4766–75. http://dx.doi.org/10.1609/aaai.v38i5.28278.
Liu, Shilong, Shijia Huang, Feng Li, Hao Zhang, Yaoyuan Liang, Hang Su, Jun Zhu, and Lei Zhang. "DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (June 26, 2023): 1728–36. http://dx.doi.org/10.1609/aaai.v37i2.25261.
Cheng, Zesen, Kehan Li, Peng Jin, Siheng Li, Xiangyang Ji, Li Yuan, Chang Liu, and Jie Chen. "Parallel Vertex Diffusion for Unified Visual Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 2 (March 24, 2024): 1326–34. http://dx.doi.org/10.1609/aaai.v38i2.27896.
Feng, Steven Y., Kevin Lu, Zhuofu Tao, Malihe Alikhani, Teruko Mitamura, Eduard Hovy, and Varun Gangal. "Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10618–26. http://dx.doi.org/10.1609/aaai.v36i10.21306.
Jia, Meihuizi, Lei Shen, Xin Shen, Lejian Liao, Meng Chen, Xiaodong He, Zhendong Chen, and Jiaqi Li. "MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8032–40. http://dx.doi.org/10.1609/aaai.v37i7.25971.
Shi, Zhan, Yilin Shen, Hongxia Jin, and Xiaodan Zhu. "Improving Zero-Shot Phrase Grounding via Reasoning on External Knowledge and Spatial Relations." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2253–61. http://dx.doi.org/10.1609/aaai.v36i2.20123.
Dissertations / Theses on the topic "Visual grounding of text"
Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.
Nowadays, Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has led to a significant improvement in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore different directions: we present relevant background covering image and text representation and existing multimodal approaches; we propose novel architectures further improving the retrieval capability of VSE; and we extend VSE models to novel applications, leveraging embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning a differentiable approximation of a ranking-based metric.
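For readers new to VSE objectives, the ranking loss alluded to above can be illustrated with a minimal sketch (a generic illustration assuming pre-computed embeddings, not code from the thesis; all names and dimensions are hypothetical):

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two embedding vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def triplet_ranking_loss(img, pos_txt, neg_txt, margin=0.2):
    # hinge-based ranking: the matching caption must beat a
    # non-matching one by at least `margin` in similarity
    return max(0.0, margin - cosine(img, pos_txt) + cosine(img, neg_txt))

rng = np.random.default_rng(0)
img = rng.normal(size=128)               # embedded image
pos = img + 0.1 * rng.normal(size=128)   # matching caption, close to the image
neg = rng.normal(size=128)               # non-matching caption
print(triplet_ranking_loss(img, pos, neg))
```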
Emmott, Stephen J. "The visual processing of text." Thesis, University of Stirling, 1993. http://hdl.handle.net/1893/1837.
Full textMi, Jinpeng Verfasser], and Jianwei [Akademischer Betreuer] [Zhang. "Natural Language Visual Grounding via Multimodal Learning / Jinpeng Mi ; Betreuer: Jianwei Zhang." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2020. http://d-nb.info/1205070885/34.
Mi, Jinpeng [Verfasser], and Jianwei [Akademischer Betreuer] Zhang. "Natural Language Visual Grounding via Multimodal Learning / Jinpeng Mi ; Betreuer: Jianwei Zhang." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2020. http://d-nb.info/1205070885/34.
Prince, Md Enamul Hoque. "Visual text analytics for online conversations." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/61772.
Full textScience, Faculty of
Computer Science, Department of
Graduate
Chauhan, Aneesh. "Grounding human vocabulary in robot perception through interaction." Doctoral thesis, Universidade de Aveiro, 2014. http://hdl.handle.net/10773/12841.
This thesis addresses the problem of word learning in computational agents. The motivation behind this work lies in the need to support language-based communication between service robots and their human users, as well as grounded reasoning using symbols relevant for the assigned tasks. The research focuses on the problem of grounding human vocabulary in a robotic agent's sensori-motor perception. Words have to be grounded in bodily experiences, which emphasizes the role of appropriate embodiments. On the other hand, language is a cultural product created and acquired through social interactions. This emphasizes the role of society as a source of linguistic input. Taking these aspects into account, an experimental scenario is set up where a human instructor teaches a robotic agent the names of the objects present in a visually shared environment. The agent grounds the names of these objects in visual perception. Word learning is an open-ended problem. Therefore, the learning architecture of the agent will have to be able to acquire words and categories in an open-ended manner. In this work, four learning architectures were designed that can be used by robotic agents for long-term and open-ended word and category acquisition. The learning methods used in these architectures are designed for incrementally scaling up to larger sets of words and categories. A novel experimental evaluation methodology, which takes into account the open-ended nature of word learning, is proposed and applied. This methodology is based on the realization that a robot's vocabulary will be limited by its discriminatory capacity which, in turn, depends on its sensors and perceptual capabilities. An extensive set of systematic experiments, in multiple experimental settings, was carried out to thoroughly evaluate the described learning approaches. The results indicate that all approaches were able to incrementally acquire new words and categories. Although some of the approaches could not scale up to larger vocabularies, one approach was shown to learn up to 293 categories, with potential for learning many more.
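As a generic illustration of the open-ended setting described in this abstract (not one of the thesis's four architectures; the class and feature vectors below are hypothetical), an incremental prototype learner can acquire new words at any time:

```python
import numpy as np

class OpenEndedLearner:
    def __init__(self):
        self.protos = {}  # word -> (mean feature vector, sample count)

    def teach(self, word, features):
        # incremental mean update; unseen words are added on the fly
        mean, n = self.protos.get(word, (np.zeros_like(features), 0))
        self.protos[word] = ((mean * n + features) / (n + 1), n + 1)

    def predict(self, features):
        # nearest prototype by Euclidean distance
        return min(self.protos, key=lambda w: np.linalg.norm(self.protos[w][0] - features))

rng = np.random.default_rng(1)
agent = OpenEndedLearner()
for _ in range(5):
    agent.teach("ball", rng.normal(loc=0.0, size=16))
    agent.teach("mug", rng.normal(loc=3.0, size=16))
print(agent.predict(rng.normal(loc=3.0, size=16)))  # -> 'mug'
```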
Sabir, Ahmed. "Enhancing scene text recognition with visual context information." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/670286.
This thesis addresses the problem of improving text recognition systems, which detect and recognize text in unrestricted images (for example, a street sign, an advertisement, a bus destination, etc.). The goal is to improve the performance of existing vision systems by exploiting the semantic information derived from the image itself. The main idea is that knowing the content of the image, or the visual context in which a text appears, can help decide which words are correct. For example, the fact that an image shows a coffee shop makes it more likely that a word on a sign reads Dunkin rather than unkind. We address this problem by drawing on advances in natural language processing and machine learning, in particular learning re-rankers and neural networks, to present post-processing solutions that improve state-of-the-art text recognition systems without costly retraining or fine-tuning procedures that require large amounts of data. Discovering the degree of semantic relatedness between candidate words and their image context is a task related to assessing the semantic similarity between words or text fragments. However, determining the existence of a semantic relation is a more general task than assessing similarity (for example, car, road, and traffic light are related but not similar), so existing methods require certain adaptations. To meet the requirements of these broader notions of semantic relatedness, we develop two approaches for learning the semantic relatedness of a recognized word and its context: word-to-word (with the objects in the image) or word-to-sentence (with the image caption). The word-to-word approach uses re-rankers based on word embeddings: the re-ranker takes the words proposed by the baseline system and re-orders them according to the visual context provided by an object classifier. For the second case, an end-to-end neural approach is designed to exploit the image description (caption) at both the sentence and word levels, re-ranking candidate words based on both the visual context and their co-occurrences with the caption. As an additional contribution, to meet the requirements of data-driven approaches such as neural networks, we present a visual-context dataset for this task, in which the publicly available COCO-text dataset [Veit et al. 2016] has been extended with information about the scene (including the objects and places appearing in the image), allowing researchers to include text-scene semantic relations in their text recognition systems and offering a common evaluation baseline for these approaches.
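The word-to-word re-ranking idea from this abstract can be sketched in a few lines (a toy illustration; the embeddings and the confidence weighting below are hypothetical, not the thesis's actual model):

```python
import numpy as np

# toy word embeddings (in practice, pre-trained vectors of a few hundred dimensions)
emb = {
    "dunkin": np.array([0.9, 0.8, 0.1]),
    "unkind": np.array([0.1, 0.2, 0.9]),
    "coffee": np.array([0.8, 0.9, 0.2]),
    "cafe":   np.array([0.9, 0.7, 0.3]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rerank(candidates, context_objects, alpha=0.5):
    # blend recognizer confidence with max relatedness to the visual context
    def score(word, conf):
        rel = max(cosine(emb[word], emb[obj]) for obj in context_objects)
        return alpha * conf + (1 - alpha) * rel
    return sorted(candidates, key=lambda wc: score(*wc), reverse=True)

# recognizer output: (word, confidence); objects come from an image classifier
print(rerank([("unkind", 0.55), ("dunkin", 0.45)], ["coffee", "cafe"]))
# -> [('dunkin', 0.45), ('unkind', 0.55)]: visual context flips the ranking
```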
Willems, Heather Marie. "Writing the written: text as a visual image." The Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=osu1382952227.
Kan, Jichao. "Visual-Text Translation with Deep Graph Neural Networks." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/23759.
Shmueli, Yael. "Integrating speech and visual text in multimodal interfaces." Thesis, University College London (University of London), 2005. http://discovery.ucl.ac.uk/1446688/.
Books on the topic "Visual grounding of text"
Wyman, Jessica, ed. Pro forma: Language, text, visual art. Toronto, ON, Canada: YYZ Books, 2005.
Strassner, Erich. Text-Bild-Kommunikation - Bild-Text-Kommunikation. Tübingen: Niemeyer, 2001.
Harms, Wolfgang, and Deutsche Forschungsgemeinschaft, eds. Text und Bild, Bild und Text: DFG-Symposion 1988. Stuttgart: J.B. Metzler, 1990.
Text und Bild: Grundfragen der Beschreibung von Text-Bild-Kommunikationen aus sprachwissenschaftlicher Sicht. Tübingen: Narr, 1986.
Leidner, Jochen L. Toponym resolution in text: Annotation, evaluation and applications of spatial grounding of place names. Boca Raton: Dissertation.com, 2007.
Ranai, K., ed. Visual editing on UNIX. Singapore: World Scientific, 1989.
John Samuel, G., 1948-, and Institute of Asian Studies (Madras, India), eds. The Great penance at Māmallapuram: Deciphering a visual text. Chennai: Institute of Asian Studies, 2001.
The Bible as visual culture: When text becomes image. Sheffield: Sheffield Phoenix Press, 2013.
Drake, Michael V., ed. The visual fields: Text and atlas of clinical perimetry. 6th ed. St. Louis: Mosby, 1990.
Finney, Gail, ed. Visual culture in twentieth-century Germany: Text as spectacle. Bloomington, Ind.: Indiana University Press, 2006.
Book chapters on the topic "Visual grounding of text"
Min, Seonwoo, Nokyung Park, Siwon Kim, Seunghyun Park, and Jinkyu Kim. "Grounding Visual Representations with Texts for Domain Generalization." In Lecture Notes in Computer Science, 37–53. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19836-6_3.
Hong, Tao, Ya Wang, Xingwu Sun, Xiaoqing Li, and Jinwen Ma. "CMMix: Cross-Modal Mix Augmentation Between Images and Texts for Visual Grounding." In Communications in Computer and Information Science, 471–82. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8148-9_37.
Hendricks, Lisa Anne, Ronghang Hu, Trevor Darrell, and Zeynep Akata. "Grounding Visual Explanations." In Computer Vision – ECCV 2018, 269–86. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01216-8_17.
Johari, Kritika, Christopher Tay Zi Tong, Vigneshwaran Subbaraju, Jung-Jae Kim, and U.-Xuan Tan. "Gaze Assisted Visual Grounding." In Social Robotics, 191–202. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-90525-5_17.
Xiao, Junbin, Xindi Shang, Xun Yang, Sheng Tang, and Tat-Seng Chua. "Visual Relation Grounding in Videos." In Computer Vision – ECCV 2020, 447–64. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58539-6_27.
Goy, Anna. "Grounding Meaning in Visual Knowledge." In Spatial Language, 121–45. Dordrecht: Springer Netherlands, 2002. http://dx.doi.org/10.1007/978-94-015-9928-3_7.
Silberer, Carina. "Grounding the Meaning of Words with Visual Attributes." In Visual Attributes, 331–62. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-50077-5_13.
Mazaheri, Amir, and Mubarak Shah. "Visual Text Correction." In Computer Vision – ECCV 2018, 159–75. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01261-8_10.
Wainer, Howard. "Integrating Figures and Text." In Visual Revelations, 143–45. New York, NY: Springer New York, 1997. http://dx.doi.org/10.1007/978-1-4612-2282-8_18.
Kittler, Josef, Mikhail Shevchenko, and David Windridge. "Visual Bootstrapping for Unsupervised Symbol Grounding." In Advanced Concepts for Intelligent Vision Systems, 1037–46. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11864349_94.
Conference papers on the topic "Visual grounding of text"
Zhang, Yimeng, Xin Chen, Jinghan Jia, Sijia Liu, and Ke Ding. "Text-Visual Prompting for Efficient 2D Temporal Video Grounding." In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023. http://dx.doi.org/10.1109/cvpr52729.2023.01421.
Wu, Yanmin, Xinhua Cheng, Renrui Zhang, Zesen Cheng, and Jian Zhang. "EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding." In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023. http://dx.doi.org/10.1109/cvpr52729.2023.01843.
Endo, Ko, Masaki Aono, Eric Nichols, and Kotaro Funakoshi. "An Attention-based Regression Model for Grounding Textual Phrases in Images." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/558.
Conser, Erik, Kennedy Hahn, Chandler Watson, and Melanie Mitchell. "Revisiting Visual Grounding." In Proceedings of the Second Workshop on Shortcomings in Vision and Language. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/w19-1804.
Kim, Yongmin, Chenhui Chu, and Sadao Kurohashi. "Flexible Visual Grounding." In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.acl-srw.22.
Du, Ye, Zehua Fu, Qingjie Liu, and Yunhong Wang. "Visual Grounding with Transformers." In 2022 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2022. http://dx.doi.org/10.1109/icme52920.2022.9859880.
Jing, Chenchen, Yuwei Wu, Mingtao Pei, Yao Hu, Yunde Jia, and Qi Wu. "Visual-Semantic Graph Matching for Visual Grounding." In MM '20: The 28th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3394171.3413902.
Deng, Chaorui, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, and Mingkui Tan. "Visual Grounding via Accumulated Attention." In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018. http://dx.doi.org/10.1109/cvpr.2018.00808.
Lee, Jason, Kyunghyun Cho, and Douwe Kiela. "Countering Language Drift via Visual Grounding." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/d19-1447.
Sun, Yuxi, Shanshan Feng, Xutao Li, Yunming Ye, Jian Kang, and Xu Huang. "Visual Grounding in Remote Sensing Images." In MM '22: The 30th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503161.3548316.
Reports on the topic "Visual grounding of text"
Steed, Chad A., Christopher T. Symons, James K. Senter, and Frank A. DeNap. Guided Text Search Using Adaptive Visual Analytics. Office of Scientific and Technical Information (OSTI), October 2012. http://dx.doi.org/10.2172/1055105.
Beiker, Sven, ed. Unsettled Issues Regarding Visual Communication Between Automated Vehicles and Other Road Users. SAE International, July 2021. http://dx.doi.org/10.4271/epr2021016.
Дирда, І. А., and З. П. Бакум. Linguodidactic fundamentals of the development of foreign students' polycultural competence during the Ukrainian language training. Association 1901 "SEPIKE", 2016. http://dx.doi.org/10.31812/123456789/2994.
Бакум, З. П., and І. А. Дирда. Linguodidactic Fundamentals of the Development of Foreign Students' Polycultural Competence During the Ukrainian Language Training. Криворізький державний педагогічний університет, 2016. http://dx.doi.org/10.31812/0564/398.
Figueredo, Luisa, Liliana Martinez, and Joao Paulo Almeida. Current role of Endoscopic Endonasal Approach for Craniopharyngiomas. A 10-year Systematic review and Meta-Analysis Comparison with the Open Transcranial Approach. INPLASY - International Platform of Registered Systematic Review and Meta-analysis Protocols, January 2023. http://dx.doi.org/10.37766/inplasy2023.1.0045.
Yatsymirska, Mariya. Мова війни і «контрнаступальна» лексика у стислих медійних текстах [The language of war and "counteroffensive" vocabulary in concise media texts]. Ivan Franko National University of Lviv, March 2023. http://dx.doi.org/10.30970/vjo.2023.52-53.11742.
Baluk, Nadia, Natalia Basij, Larysa Buk, and Olha Vovchanska. VR/AR-Technologies – New Content of the New Media. Ivan Franko National University of Lviv, February 2021. http://dx.doi.org/10.30970/vjo.2021.49.11074.
Makhachashvili, Rusudan K., Svetlana I. Kovpik, Anna O. Bakhtina, and Ekaterina O. Shmeltser. Technology of presentation of literature on the Emoji Maker platform: pedagogical function of graphic mimesis. [N.p.], July 2020. http://dx.doi.org/10.31812/123456789/3864.
Yatsymirska, Mariya. Social Expression in Multimedia Texts. Ivan Franko National University of Lviv, February 2021. http://dx.doi.org/10.30970/vjo.2021.49.11072.