Academic literature on the topic 'Visual question generation'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Visual question generation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Visual question generation"

1

Patil, Charulata, and Manasi Patwardhan. "Visual Question Generation." ACM Computing Surveys 53, no. 3 (July 5, 2020): 1–22. http://dx.doi.org/10.1145/3383465.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Liu, Hongfei, Jiali Chen, Wenhao Fang, Jiayuan Xie, and Yi Cai. "Category-Guided Visual Question Generation (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16262–63. http://dx.doi.org/10.1609/aaai.v37i13.26991.

Full text
Abstract:
Visual question generation aims to generate high-quality questions related to images. Generating questions from images alone reduces labor costs and can therefore be applied easily. However, existing methods tend to generate similar, generic questions that fail to ask about the specific content of each image scene. In this paper, we propose a category-guided visual question generation model that can generate questions with multiple categories that focus on different objects in an image. Specifically, our model first selects the appropriate question category based on the objects in the image and the relationships among them. Then, we generate corresponding questions based on the selected question categories. Experiments conducted on the TDIUC dataset show that our proposed model outperforms existing models in terms of diversity and quality.
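To make the two-stage idea in this abstract concrete, here is a minimal sketch, assuming hypothetical module names, feature sizes, and a simple mean-pooling stand-in for relation modelling; it is not the authors' code, only an illustration of selecting a question category from detected objects and then decoding a category-conditioned question.

```python
# Hypothetical two-stage, category-guided VQG pipeline (illustrative sketch only).
import torch
import torch.nn as nn

class CategorySelector(nn.Module):
    """Scores question categories from pooled object features."""
    def __init__(self, obj_dim=2048, num_categories=12):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(obj_dim, 512), nn.ReLU(), nn.Linear(512, num_categories))

    def forward(self, object_feats):           # (batch, num_objects, obj_dim)
        pooled = object_feats.mean(dim=1)      # crude stand-in for object-relation modelling
        return self.scorer(pooled)             # (batch, num_categories) category logits

class CategoryConditionedDecoder(nn.Module):
    """Generates a question conditioned on image features and a category embedding."""
    def __init__(self, obj_dim=2048, cat_dim=64, hidden=512, vocab=10000, num_categories=12):
        super().__init__()
        self.cat_emb = nn.Embedding(num_categories, cat_dim)
        self.init_h = nn.Linear(obj_dim + cat_dim, hidden)
        self.word_emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, object_feats, category_ids, question_tokens):
        ctx = torch.cat([object_feats.mean(1), self.cat_emb(category_ids)], dim=-1)
        h0 = torch.tanh(self.init_h(ctx)).unsqueeze(0)        # (1, batch, hidden)
        out, _ = self.rnn(self.word_emb(question_tokens), h0)  # teacher forcing
        return self.out(out)                                   # (batch, seq, vocab) logits

# Usage: pick the top-scoring category per image, then decode a question for it.
selector, decoder = CategorySelector(), CategoryConditionedDecoder()
feats = torch.randn(2, 36, 2048)                               # e.g. 36 detected objects per image
cats = selector(feats).argmax(dim=-1)
logits = decoder(feats, cats, torch.randint(0, 10000, (2, 12)))
```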
APA, Harvard, Vancouver, ISO, and other styles
3

Mi, Li, Syrielle Montariol, Javiera Castillo Navarro, Xianjie Dai, Antoine Bosselut, and Devis Tuia. "ConVQG: Contrastive Visual Question Generation with Multimodal Guidance." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (March 24, 2024): 4207–15. http://dx.doi.org/10.1609/aaai.v38i5.28216.

Full text
Abstract:
Asking questions about visual environments is a crucial way for intelligent agents to understand rich multi-faceted scenes, raising the importance of Visual Question Generation (VQG) systems. Apart from being grounded to the image, existing VQG systems can use textual constraints, such as expected answers or knowledge triplets, to generate focused questions. These constraints allow VQG systems to specify the question content or leverage external commonsense knowledge that can not be obtained from the image content only. However, generating focused questions using textual constraints while enforcing a high relevance to the image content remains a challenge, as VQG systems often ignore one or both forms of grounding. In this work, we propose Contrastive Visual Question Generation (ConVQG), a method using a dual contrastive objective to discriminate questions generated using both modalities from those based on a single one. Experiments on both knowledge-aware and standard VQG benchmarks demonstrate that ConVQG outperforms the state-of-the-art methods and generates image-grounded, text-guided, and knowledge-rich questions. Our human evaluation results also show preference for ConVQG questions compared to non-contrastive baselines.
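The dual contrastive objective described above can be pictured with a short sketch. This is a hedged, generic InfoNCE-style formulation, not the authors' implementation: the embedding of a question generated from both modalities is pulled toward the reference question and pushed away from questions generated from a single modality.

```python
# Generic dual contrastive loss in the spirit of ConVQG (assumed formulation, not the paper's code).
import torch
import torch.nn.functional as F

def dual_contrastive_loss(q_pred, q_ref, q_image_only, q_text_only, temperature=0.1):
    """
    q_pred:       embedding of the question generated from image + text   (batch, dim)
    q_ref:        embedding of the reference (ground-truth) question      (batch, dim)
    q_image_only: embedding of a question generated from the image alone  (batch, dim)
    q_text_only:  embedding of a question generated from the text alone   (batch, dim)
    """
    q_pred, q_ref = F.normalize(q_pred, dim=-1), F.normalize(q_ref, dim=-1)
    negs = F.normalize(torch.stack([q_image_only, q_text_only], dim=1), dim=-1)   # (batch, 2, dim)
    pos = (q_pred * q_ref).sum(-1, keepdim=True)                                  # (batch, 1)
    neg = torch.einsum('bd,bkd->bk', q_pred, negs)                                # (batch, 2)
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(q_pred.size(0), dtype=torch.long, device=q_pred.device)  # positive at index 0
    return F.cross_entropy(logits, labels)

loss = dual_contrastive_loss(torch.randn(4, 256), torch.randn(4, 256),
                             torch.randn(4, 256), torch.randn(4, 256))
```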
APA, Harvard, Vancouver, ISO, and other styles
4

Sarrouti, Mourad, Asma Ben Abacha, and Dina Demner-Fushman. "Goal-Driven Visual Question Generation from Radiology Images." Information 12, no. 8 (August 20, 2021): 334. http://dx.doi.org/10.3390/info12080334.

Full text
Abstract:
Visual Question Generation (VQG) from images is a rising research topic in both fields of natural language processing and computer vision. Although there are some recent efforts towards generating questions from images in the open domain, the VQG task in the medical domain has not been well studied so far due to the lack of labeled data. In this paper, we introduce a goal-driven VQG approach for radiology images called VQGRaD that generates questions targeting specific image aspects such as modality and abnormality. In particular, we study generating natural language questions based on the visual content of the image and on additional information such as the image caption and the question category. VQGRaD encodes the dense vectors of different inputs into two latent spaces, which allows generating, for a specific question category, relevant questions about the images, with or without their captions. We also explore the impact of domain knowledge incorporation (e.g., medical entities and semantic types) and data augmentation techniques on visual question generation in the medical domain. Experiments performed on the VQA-RAD dataset of clinical visual questions showed that VQGRaD achieves a 61.86% BLEU score and outperforms strong baselines. We also performed a blinded human evaluation of the grammaticality, fluency, and relevance of the generated questions. The human evaluation demonstrated the better quality of VQGRaD outputs and showed that incorporating medical entities improves the quality of the generated questions. Using the test data and evaluation process of the ImageCLEF 2020 VQA-Med challenge, we found that relying on the proposed data augmentation technique to generate new training samples by applying different kinds of transformations can mitigate the lack of data, avoid overfitting, and bring a substantial improvement in medical VQG.
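The "two latent spaces" idea can be illustrated with a small sketch. This is not the released VQGRaD code; layer sizes, the variational form of the encoders, and the way the category is injected are all assumptions made for illustration.

```python
# Illustrative two-latent-space question generator: one latent space for the image, one for
# auxiliary text (e.g. a caption), combined with a question-category embedding before decoding.
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    def __init__(self, in_dim, z_dim=128):
        super().__init__()
        self.mu, self.logvar = nn.Linear(in_dim, z_dim), nn.Linear(in_dim, z_dim)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation trick
        return z, mu, logvar

class TwoLatentVQG(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, z_dim=128, cat_dim=32, hidden=512, vocab=8000, n_cats=5):
        super().__init__()
        self.img_enc = GaussianEncoder(img_dim, z_dim)
        self.txt_enc = GaussianEncoder(txt_dim, z_dim)
        self.cat_emb = nn.Embedding(n_cats, cat_dim)
        self.init_h = nn.Linear(2 * z_dim + cat_dim, hidden)
        self.word_emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feat, caption_feat, category, tokens):
        z_img, *_ = self.img_enc(img_feat)
        z_txt, *_ = self.txt_enc(caption_feat)
        h0 = torch.tanh(self.init_h(torch.cat([z_img, z_txt, self.cat_emb(category)], -1)))
        state = (h0.unsqueeze(0), torch.zeros_like(h0).unsqueeze(0))
        out, _ = self.rnn(self.word_emb(tokens), state)
        return self.out(out)

model = TwoLatentVQG()
logits = model(torch.randn(2, 2048), torch.randn(2, 768),
               torch.tensor([0, 3]), torch.randint(0, 8000, (2, 10)))
```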
APA, Harvard, Vancouver, ISO, and other styles
5

Pang, Wei, and Xiaojie Wang. "Visual Dialogue State Tracking for Question Generation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11831–38. http://dx.doi.org/10.1609/aaai.v34i07.6856.

Full text
Abstract:
GuessWhat?! is a visual dialogue task between a guesser and an oracle. The guesser aims to locate an object secretly chosen by the oracle in an image by asking a sequence of Yes/No questions. Asking proper questions as the dialogue progresses is vital for achieving a successful final guess. As a result, the progress of the dialogue should be properly represented and tracked. Previous models for question generation pay less attention to the representation and tracking of dialogue states, and are therefore prone to asking low-quality questions such as repeated questions. This paper proposes a visual dialogue state tracking (VDST) based method for question generation. A visual dialogue state is defined as the distribution over objects in the image as well as the representations of those objects. Representations of objects are updated as the distribution over objects changes. An object-difference based attention is used to decode new questions. The distribution over objects is updated by comparing the question-answer pair with the objects. Experimental results on the GuessWhat?! dataset show that our model significantly outperforms existing methods and achieves new state-of-the-art performance. It is also noticeable that our model reduces the rate of repeated questions from more than 50% to 21.9% compared with previous state-of-the-art methods.
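The dialogue-state idea (a belief distribution over candidate objects updated after every question-answer pair) can be sketched as follows. Shapes, the compatibility score, and the gated update are assumptions for illustration, not the paper's exact model.

```python
# Rough sketch of tracking a belief distribution over candidate objects across dialogue turns.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectStateTracker(nn.Module):
    def __init__(self, obj_dim=512, qa_dim=512):
        super().__init__()
        self.qa_proj = nn.Linear(qa_dim, obj_dim)
        self.obj_update = nn.GRUCell(obj_dim, obj_dim)

    def forward(self, obj_repr, belief, qa_embedding):
        """
        obj_repr:     (batch, num_objects, obj_dim)  current object representations
        belief:       (batch, num_objects)           current distribution over objects
        qa_embedding: (batch, qa_dim)                encoding of the latest question-answer pair
        """
        qa = self.qa_proj(qa_embedding)                                  # (batch, obj_dim)
        # Re-weight the belief by how compatible each object is with the new QA pair.
        compat = torch.einsum('bod,bd->bo', obj_repr, qa)                # (batch, num_objects)
        belief = F.softmax(torch.log(belief + 1e-8) + compat, dim=-1)
        # Update object representations, scaled by how much belief they currently receive.
        b, o, d = obj_repr.shape
        updated = self.obj_update(qa.repeat_interleave(o, dim=0), obj_repr.reshape(b * o, d))
        gate = belief.unsqueeze(-1)
        obj_repr = gate * updated.reshape(b, o, d) + (1 - gate) * obj_repr
        return obj_repr, belief

tracker = ObjectStateTracker()
objs, bel = torch.randn(2, 20, 512), torch.full((2, 20), 1 / 20)
objs, bel = tracker(objs, bel, torch.randn(2, 512))   # one dialogue turn
```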
APA, Harvard, Vancouver, ISO, and other styles
6

Kamala, M. "Visual Question Generation from Remote Sensing Images Using Gemini API." International Journal for Research in Applied Science and Engineering Technology 12, no. 3 (March 31, 2024): 2924–29. http://dx.doi.org/10.22214/ijraset.2024.59537.

Full text
Abstract:
Visual question generation from remote sensing images plays a vital role in understanding and extracting information from aerial and satellite imagery. The proposed approach combines Convolutional Neural Networks (CNNs), the Gemini Application Programming Interface (API), and Bidirectional Encoder Representations from Transformers (BERT). First, a CNN extracts high-level features from the remote sensing images, capturing the spatial information used to generate questions. The Gemini API then integrates contextual understanding into the question-generation process by providing relevant environmental data. Finally, BERT functions as a language model employed to enhance and refine the generated questions, taking both syntax and semantics into account. By combining these techniques, the system generates relevant questions from remote sensing images in an enhanced and efficient way.
APA, Harvard, Vancouver, ISO, and other styles
7

Kachare, Atul, Mukesh Kalla, and Ashutosh Gupta. "Visual Question Generation Answering (VQG-VQA) using Machine Learning Models." WSEAS TRANSACTIONS ON SYSTEMS 22 (June 28, 2023): 663–70. http://dx.doi.org/10.37394/23202.2023.22.67.

Full text
Abstract:
The presented automated visual question-answer system generates image-based question-answer pairs. The system consists of Visual Query Generation (VQG) and Visual Question Answer (VQA) modules. VQG generates questions based on visual cues, and VQA provides matching answers to the VQG module. The VQG system generates questions using an LSTM and the VGG19 model, training the parameters and predicting the words with the highest probability for the output. VQA uses the VGG-19 convolutional neural network for image encoding and embedding, and a multilayer perceptron for high-quality responses. The proposed system reduces the need for human annotation and thus supports the traditional education sector by significantly reducing the human intervention required to generate text queries. The system can be used in interactive interfaces to help young children learn.
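A generic VGG19 + LSTM question generator of the kind described here might look like the sketch below. Layer sizes, the frozen backbone, and the way the image initialises the LSTM state are assumptions for illustration, not the paper's configuration.

```python
# Minimal VGG19 + LSTM question generator (illustrative only).
import torch
import torch.nn as nn
from torchvision.models import vgg19

class VGG19LSTMQuestionGenerator(nn.Module):
    def __init__(self, vocab_size=8000, embed=256, hidden=512):
        super().__init__()
        backbone = vgg19(weights=None)                 # pass pretrained weights in practice
        self.cnn = backbone.features                   # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.img_fc = nn.Linear(512, hidden)
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, tokens):
        with torch.no_grad():                                   # backbone kept frozen in this sketch
            feat = self.pool(self.cnn(images)).flatten(1)       # (batch, 512)
        h0 = torch.tanh(self.img_fc(feat)).unsqueeze(0)         # image features initialise the LSTM state
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(out)                                    # per-step vocabulary logits

model = VGG19LSTMQuestionGenerator()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 8000, (2, 12)))
```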
APA, Harvard, Vancouver, ISO, and other styles
8

Zhu, He, Ren Togo, Takahiro Ogawa, and Miki Haseyama. "Diversity Learning Based on Multi-Latent Space for Medical Image Visual Question Generation." Sensors 23, no. 3 (January 17, 2023): 1057. http://dx.doi.org/10.3390/s23031057.

Full text
Abstract:
Auxiliary clinical diagnosis has been researched as a way to address unevenly and insufficiently distributed clinical resources. However, auxiliary diagnosis is still dominated by human physicians, and how to make intelligent systems more involved in the diagnosis process is gradually becoming a concern. An interactive automated clinical diagnosis with a question-answering system and a question generation system can capture a patient's conditions from multiple perspectives with less physician involvement by asking different questions to drive and guide the diagnosis. This clinical diagnosis process requires diverse information to evaluate a patient from different perspectives and obtain an accurate diagnosis. Recently proposed medical question generation systems have not considered diversity. Thus, we propose a diversity learning-based visual question generation model using a multi-latent space to generate informative question sets from medical images. The proposed method generates various questions by embedding visual and language information in different latent spaces, whose diversity is encouraged by our newly proposed loss. We have also added control over the categories of generated questions, making the generated questions directional. Furthermore, we use a new metric named similarity to accurately evaluate the proposed model's performance. The experimental results on the Slake and VQA-RAD datasets demonstrate that the proposed method can generate questions with diverse information. Our model works with an answering model for interactive automated clinical diagnosis and generates datasets to replace the annotation process, which incurs huge labor costs.
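One simple way to picture a diversity objective over multiple latent spaces is a pairwise-similarity penalty on the questions each latent space decodes. The sketch below is a hedged stand-in, not the loss proposed in the paper.

```python
# Toy diversity penalty: one question embedding per latent space, pushed apart pairwise.
import torch
import torch.nn.functional as F

def diversity_loss(question_embeddings):
    """question_embeddings: (batch, num_latents, dim), one embedding per latent space."""
    q = F.normalize(question_embeddings, dim=-1)
    sim = torch.einsum('bnd,bmd->bnm', q, q)                 # pairwise cosine similarities
    n = q.size(1)
    off_diag = sim - torch.eye(n, device=q.device)           # zero out self-similarity
    return off_diag.clamp(min=0).sum(dim=(1, 2)).mean() / (n * (n - 1))

loss = diversity_loss(torch.randn(4, 3, 256))                # e.g. three latent spaces
```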
APA, Harvard, Vancouver, ISO, and other styles
9

Boukhers, Zeyd, Timo Hartmann, and Jan Jürjens. "COIN: Counterfactual Image Generation for Visual Question Answering Interpretation." Sensors 22, no. 6 (March 14, 2022): 2245. http://dx.doi.org/10.3390/s22062245.

Full text
Abstract:
Due to the significant advancement of Natural Language Processing and Computer Vision-based models, Visual Question Answering (VQA) systems are becoming more intelligent and advanced. However, they are still error-prone when dealing with relatively complex questions. Therefore, it is important to understand the behaviour of VQA models before adopting their results. In this paper, we introduce an interpretability approach for VQA models by generating counterfactual images. Specifically, the generated image is supposed to have the minimal possible change to the original image while leading the VQA model to give a different answer. In addition, our approach ensures that the generated image is realistic. Since quantitative metrics cannot be employed to evaluate the interpretability of the model, we carried out a user study to assess different aspects of our approach. In addition to interpreting the results of VQA models on single images, the obtained results and the discussion provide an extensive explanation of VQA models' behaviour.
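The core counterfactual objective (minimal change to the image that flips the model's answer) can be conveyed with a toy optimisation loop. A tiny stand-in classifier replaces a real VQA model, and the realism term used in the paper is deliberately omitted; this is an assumption-laden sketch, not the COIN method.

```python
# Toy counterfactual search: perturb the image as little as possible while changing the answer.
import torch
import torch.nn as nn
import torch.nn.functional as F

vqa_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10))    # stand-in answer classifier
vqa_model.eval()

image = torch.rand(1, 3, 64, 64)
original_answer = vqa_model(image).argmax(dim=-1)
target_answer = (original_answer + 1) % 10                              # any different answer

delta = torch.zeros_like(image, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    logits = vqa_model(image + delta)
    # Flip the answer while keeping the perturbation small (realism term omitted here).
    loss = F.cross_entropy(logits, target_answer) + 0.1 * delta.pow(2).mean()
    loss.backward()
    opt.step()

counterfactual = (image + delta).clamp(0, 1)
print(original_answer.item(), vqa_model(counterfactual).argmax(dim=-1).item())
```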
APA, Harvard, Vancouver, ISO, and other styles
10

Guo, Zihan, Dezhi Han, and Kuan-Ching Li. "Double-layer affective visual question answering network." Computer Science and Information Systems, no. 00 (2020): 38. http://dx.doi.org/10.2298/csis200515038g.

Full text
Abstract:
Visual Question Answering (VQA) has attracted much attention recently in both the natural language processing and computer vision communities, as it offers insight into the relationships between two relevant sources of information. Tremendous advances have been seen in the field of VQA due to the success of deep learning. Building upon these advances, the Affective Visual Question Answering Network (AVQAN) enriches the understanding and analysis of VQA models by making use of the emotional information contained in images to produce sensitive answers, while maintaining the same level of accuracy as ordinary VQA baseline models. Integrating the emotional information contained in images into VQA is a fairly new task. However, it is challenging to separate question-guided attention from mood-guided attention due to the concatenation of the question words and the mood labels in AVQAN, and this type of concatenation is believed to harm the performance of the model. To mitigate this effect, we propose the Double-Layer Affective Visual Question Answering Network (DAVQAN), which divides the task of generating emotional answers in VQA into two simpler subtasks: the generation of non-emotional responses and the production of mood labels, with two independent layers utilized to tackle these subtasks. Comparative experiments conducted on a preprocessed dataset show that the overall performance of DAVQAN is 7.6% higher than that of AVQAN, demonstrating the effectiveness of the proposed model. We also introduce a more advanced word embedding method and a more fine-grained image feature extractor into AVQAN and DAVQAN to further improve their performance, obtaining better results than the original models, which shows that VQA integrated with affective computing can improve the performance of the whole model by improving these two modules, just as in general VQA.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Visual question generation"

1

Bordes, Patrick. "Deep Multimodal Learning for Joint Textual and Visual Reasoning." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS370.

Full text
Abstract:
In the last decade, the evolution of Deep Learning techniques for learning meaningful data representations of text and images, combined with an important increase in multimodal data, mainly from social networks and e-commerce websites, has triggered a growing interest in the research community in the joint understanding of language and vision. The challenge at the heart of Multimodal Machine Learning is the intrinsic difference in semantics between language and vision: while vision faithfully represents reality and conveys low-level semantics, language is a human construction carrying high-level reasoning. On the one hand, language can enhance the performance of vision models. The underlying hypothesis is that textual representations contain visual information. We apply this principle to Zero-Shot Learning (ZSL). In the first contribution on ZSL, we extend a common assumption, which states that textual representations encode information about the visual appearance of objects, by showing that they also encode information about their visual surroundings and their real-world frequency. In a second contribution, we consider the transductive setting in ZSL. We propose a solution to the limitations of current transductive approaches, which assume that the visual space is well-clustered, an assumption that does not hold when the number of unknown classes is high. On the other hand, vision can expand the capacities of language models. We demonstrate this by tackling Visual Question Generation (VQG), which extends the standard Question Generation task by using an image as complementary input, with visual representations derived from Computer Vision.
APA, Harvard, Vancouver, ISO, and other styles
2

Chowdhury, Muhammad Iqbal Hasan. "Question-answering on image/video content." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/205096/1/Muhammad%20Iqbal%20Hasan_Chowdhury_Thesis.pdf.

Full text
Abstract:
This thesis explores a computer's ability to understand multimodal data, where the correspondence between image/video content and natural language text is utilised to answer open-ended natural language questions through question-answering tasks. Static image data consisting of both indoor and outdoor scenes, where complex textual questions are arbitrarily posed to a machine to generate correct answers, was examined. Dynamic videos consisting of both single-camera and multi-camera settings were also considered for the exploration of more challenging and unconstrained question-answering tasks. In exploring these challenges, new deep learning processes were developed to improve a computer's ability to understand and consider multimodal data.
APA, Harvard, Vancouver, ISO, and other styles
3

Testoni, Alberto. "Asking Strategic and Informative Questions in Visual Dialogue Games: Strengths and Weaknesses of Neural Generative Models." Doctoral thesis, Università degli studi di Trento, 2023. https://hdl.handle.net/11572/370672.

Full text
Abstract:
Gathering information by asking questions about the surrounding world is a hallmark of human intelligence. Modelling this feature in Natural Language Generation systems represents a central challenge for effective and reliable conversational agents. The evaluation of these systems plays a crucial role in understanding the strengths and weaknesses of current neural architectures. In the scientific community, there is an open debate about what makes generated dialogues sound natural and human-like, and there is no agreement on what measures to use to track progress. In the first part of the thesis, after reviewing existing metrics, we aggregate different and complementary metrics that capture surface-level linguistic features into one single score. We take different referential tasks (both multimodal and language-only) as test-bed and wonder how the single metric we propose relates to task success across the training epochs of computational models (Chapter 3). Based on our findings, on the one hand, we present a method that intervenes on the training data to improve surface-level metrics (Chapter 4), especially repetitions in the generated dialogues. On the other hand, given the limitations of surface-level metrics to capture relevant phenomena that improve referential task success, we propose a different approach for the evaluation of computational models on a deeper level to capture the interplay between the Encoder and Decoder components. In the second part of the thesis, we take the case of entity hallucinations in multimodal dialogue systems as a case study to investigate the relationship between Natural Language Generation and Natural Language Understanding on a more fine-grained level (Chapter 5). Our results reveal that these two components are profoundly interconnected and influence one another. We find that hallucinations create a detrimental cascade effect on consecutive dialogue turns and are more likely to appear after negative answers, corroborating evidence from previous work on the deficiency of current architectures to handle negation properly. Our progressive advance towards even deeper dialogue evaluation criteria leads us to the study of the informativeness of questions asked to solve referential tasks. Current decoding strategies generate text in a word-by-word fashion, according to the probabilities of underlying language models. We advocate for the need of going beyond this paradigm and injecting high-level reasoning skills at decoding time. Inspired by cognitive studies on the question-asking strategies of children and adults, we propose a beam search re-ranking technique that implements a confirmation-driven strategy across dialogue turns, and we compare it against a wide variety of different decoding strategies and hyperparameters configurations (Chapter 6 and Chapter 7). We demonstrate that our approach effectively improves task success and dialogue quality when considering both the surface-level metrics described in the first part of the thesis and more fine-grained features such as hallucinations. To make our findings more solid and rule out the possibility that our improvements are due to biases in the model, we propose an evaluation paradigm in which human annotators receive machine-generated dialogues and have to solve the referential task. In general, we find that this paradigm confirms the results obtained with computational models and it demonstrates that machine-generated dialogues are indeed informative to solve the task. 
In the last part of the thesis (Chapter 8), we broaden the horizons on what is still missing from achieving human-like dialogue systems. We present a large-scale study on the GuessWhat dataset of human-human conversations, usually exploited only to train computational models. Instead, we present a thorough evaluation of the question-asking strategies of human players in this problem-solving task to unveil the pragmatic phenomena that characterize their conversations. Our analyses reveal that humans are far from asking optimal questions. Instead, their efficiency arises from learning to ask uninformative questions at the right moment during the dialogue, i.e., to establish a common ground with the interlocutor at the start of the dialogue exchanges and ask for confirmation of their own hypotheses before deciding to end the dialogue and select the target. We believe modelling such peculiar and effective features of human conversations in dialogue systems is an essential step toward building competent systems that meet the users’ expectations and display human-like traits.
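The confirmation-driven beam re-ranking described in this abstract can be illustrated with a deliberately simplified sketch: candidates from beam search are re-scored with a bonus for asking about entities the agent already hypothesises. The scoring function, entity sets, and weighting are illustrative assumptions; the thesis' actual criterion is richer.

```python
# Simplified, hypothetical re-ranking of beam-search candidates with a confirmation bonus.
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    logprob: float        # sequence log-probability from the decoder

def confirmation_bonus(question: str, hypothesis_entities: set[str]) -> float:
    """Reward questions that ask for confirmation about entities already hypothesised."""
    tokens = set(question.lower().split())
    return len(tokens & hypothesis_entities) / max(len(hypothesis_entities), 1)

def rerank(beam: list[Candidate], hypothesis_entities: set[str], alpha: float = 0.5) -> list[Candidate]:
    return sorted(beam,
                  key=lambda c: c.logprob + alpha * confirmation_bonus(c.text, hypothesis_entities),
                  reverse=True)

beam = [Candidate("is it the red mug on the table", -4.1),
        Candidate("is it an animal", -3.8),
        Candidate("is it the mug", -4.0)]
best = rerank(beam, hypothesis_entities={"mug", "red"})[0]
print(best.text)
```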
APA, Harvard, Vancouver, ISO, and other styles
4

Wei, Min-Chia, and 魏敏家. "Evaluation of Visual Question Generation With Captions." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/65t4uu.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
Academic year 106 (2017)
Over the last few years, there has been a great deal of research in the vision-and-language community on popular topics such as image captioning, video transcription, question answering about images or videos, Image-Grounded Conversation (IGC), and Visual Question Generation (VQG). In this thesis, we focus on question generation about images. Because of the popularity of images on social media, where people usually upload an image together with some description, we hypothesize that image captions can help Artificial Intelligence (AI) learn to ask more natural questions. We propose new pipeline models for fusing visual and textual features, conduct experiments on different models, and compare the predicted questions. Our experimental results show that captions are indeed useful for visual question generation.
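A minimal sketch of the caption-augmented pipeline explored here is shown below: image features and an encoding of the caption are fused before question decoding. Architecture details (encoder type, dimensions, fusion by concatenation) are assumptions, not the thesis' exact models.

```python
# Illustrative caption + image fusion for visual question generation.
import torch
import torch.nn as nn

class CaptionFusedVQG(nn.Module):
    def __init__(self, img_dim=2048, vocab=8000, embed=300, hidden=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, embed)
        self.caption_enc = nn.GRU(embed, hidden, batch_first=True)
        self.fuse = nn.Linear(img_dim + hidden, hidden)
        self.decoder = nn.GRU(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feat, caption_tokens, question_tokens):
        _, cap_h = self.caption_enc(self.word_emb(caption_tokens))        # (1, batch, hidden)
        fused = torch.tanh(self.fuse(torch.cat([img_feat, cap_h.squeeze(0)], -1)))
        out, _ = self.decoder(self.word_emb(question_tokens), fused.unsqueeze(0))
        return self.out(out)

model = CaptionFusedVQG()
logits = model(torch.randn(2, 2048),
               torch.randint(0, 8000, (2, 15)),     # caption tokens
               torch.randint(0, 8000, (2, 10)))     # question tokens (teacher forcing)
```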
APA, Harvard, Vancouver, ISO, and other styles
5

Anderson, Peter James. "Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents." Phd thesis, 2018. http://hdl.handle.net/1885/164018.

Full text
Abstract:
Each time we ask for an object, describe a scene, follow directions or read a document containing images or figures, we are converting information between visual and linguistic representations. Indeed, for many tasks it is essential to reason jointly over visual and linguistic information. People do this with ease, typically without even noticing. Intelligent systems that perform useful tasks in unstructured situations, and interact with people, will also require this ability. In this thesis, we focus on the joint modelling of visual and linguistic information using deep neural networks. We begin by considering the challenging problem of automatically describing the content of an image in natural language, i.e., image captioning. Although there is considerable interest in this task, progress is hindered by the difficulty of evaluating the generated captions. Our first contribution is a new automatic image caption evaluation metric that measures the quality of generated captions by analysing their semantic content. Extensive evaluations across a range of models and datasets indicate that our metric, dubbed SPICE, shows high correlation with human judgements. Armed with a more effective evaluation metric, we address the challenge of image captioning. Visual attention mechanisms have been widely adopted in image captioning and visual question answering (VQA) architectures to facilitate fine-grained visual processing. We extend existing approaches by proposing a bottom-up and top-down attention mechanism that enables attention to be focused at the level of objects and other salient image regions, which is the natural basis for attention to be considered. Applying this approach to image captioning we achieve state of the art results on the COCO test server. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge. Despite these advances, recurrent neural network (RNN) image captioning models typically do not generalise well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real applications. To address this problem, we propose constrained beam search, an approximate search algorithm that enforces constraints over RNN output sequences. Using this approach, we show that existing RNN captioning architectures can take advantage of side information such as object detector outputs and ground-truth image annotations at test time, without retraining. Our results significantly outperform previous approaches that incorporate the same information into the learning algorithm, achieving state of the art results for out-of-domain captioning on COCO. Last, to enable and encourage the application of vision and language methods to problems involving embodied agents, we present the Matterport3D Simulator, a large-scale interactive reinforcement learning environment constructed from densely-sampled panoramic RGB-D images of 90 real buildings. Using this simulator, which can in future support a range of embodied vision and language tasks, we collect the first benchmark dataset for visually-grounded natural language navigation in real buildings. We investigate the difficulty of this task, and particularly the difficulty of operating in unseen environments, using several baselines and a sequence-to-sequence model based on methods successfully applied to other vision and language tasks.
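The bottom-up and top-down attention mechanism mentioned in this abstract can be summarised with a schematic stand-in: a decoder state (top-down query) attends over pre-extracted region features (bottom-up proposals). This is a simplified illustration, not the thesis implementation.

```python
# Schematic top-down attention over bottom-up region features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    def __init__(self, region_dim=2048, query_dim=512, hidden=512):
        super().__init__()
        self.proj = nn.Linear(region_dim + query_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, regions, query):
        """regions: (batch, k, region_dim) bottom-up features; query: (batch, query_dim) decoder state."""
        k = regions.size(1)
        joint = torch.cat([regions, query.unsqueeze(1).expand(-1, k, -1)], dim=-1)
        weights = F.softmax(self.score(torch.tanh(self.proj(joint))).squeeze(-1), dim=-1)
        attended = torch.einsum('bk,bkd->bd', weights, regions)   # weighted sum of region features
        return attended, weights

att = TopDownAttention()
ctx, w = att(torch.randn(2, 36, 2048), torch.randn(2, 512))       # e.g. 36 detected regions
```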
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Visual question generation"

1

Dadyan, Eduard. Modern programming technologies. The C#language. Volume 1. For novice users. ru: INFRA-M Academic Publishing LLC., 2021. http://dx.doi.org/10.12737/1196552.

Full text
Abstract:
Volume 1 of the textbook is addressed to novice users who want to learn the popular object-oriented programming language C#. The tutorial provides complete information about the C# language and the .NET platform. Basic data types, variables, functions, and arrays are considered, and working with dates and enumerations is shown. The elements and constructs of the language are described: classes, interfaces, assemblies, manifests, namespaces, collections, generics, delegates, events, etc. It provides information about Windows processes and threads, as well as examples of organizing work in multithreaded mode. The creation of console applications, Windows Forms applications, and database applications is covered, together with material for deeper and more advanced study. Visual Studio .NET is used as the development environment, and all sample programs are given in C#. The book meets the requirements of the latest generation of federal state educational standards of higher education. It is intended for students in the degree programme 09.03.03 "Applied Informatics", undergraduate and graduate students of all specialties, as well as postgraduate students and students of advanced training (IPC) courses.
APA, Harvard, Vancouver, ISO, and other styles
2

Nowell Smith, David. W. S. Graham. Oxford University Press, 2022. http://dx.doi.org/10.1093/oso/9780192842909.001.0001.

Full text
Abstract:
Only in recent years has W. S. Graham come to be recognised as one of the great poets of the twentieth century. On the peripheries of UK poetry culture during his lifetime, he in many ways appears to us today as exemplary of the poetics of the mid-century: his extension of modernist explorations of rhythm and diction; his interweaving of linguistic and geographic places; his dialogue with the plastic arts; and the tensions that run through his work, between philosophical seriousness and play, between solitude and sociality, regionalism and cosmopolitanism, between the heft and evanescence of poetry’s medium. In the first concerted study of Graham’s poetics in a generation, David Nowell Smith draws on newly unearthed archival materials - poems, manuscripts, and visual/mixed-media work - to orient Graham’s poetics around the question of the ‘art object’. Graham sought throughout his work to craft his poems into honed, finished ‘objects’; yet he was also intensely aware that poems only live when released into their afterlives: the poem’s ‘finished object’ is never wholly finished. Nowell Smith situates this tension with broader debates around literary objecthood and builds up a broader reflection on language as a medium for art-making.
APA, Harvard, Vancouver, ISO, and other styles
3

Buchner, Helmut. Evoked potentials. Oxford University Press, 2016. http://dx.doi.org/10.1093/med/9780199688395.003.0015.

Full text
Abstract:
Evoked potentials (EPs) occur in the peripheral and the central nervous system. The low-amplitude signals are extracted from noise by averaging multiple time epochs time-locked to a sensory stimulus. The mechanisms of generation and the techniques for stimulation and recording are well established, and clinical applications provide robust answers to a variety of questions. The importance of EPs lies in precisely measuring the conduction times within the stimulated sensory system. Visual evoked potentials to a pattern-reversal checkerboard stimulus are commonly used to evaluate the optic nerve. Auditory evoked potentials following 'click' stimuli delivered by a headset are most often used to test the auditory nerve and for prognostication in comatose patients. Somatosensory evoked potentials to electrical stimulation of distal nerves evaluate the peripheral nerve and the lemniscal system, and have various indications, from demyelinating diseases to the monitoring of operations and the prognosis of comatose patients.
APA, Harvard, Vancouver, ISO, and other styles
4

Fox, Kieran C. R. Neural Origins of Self-Generated Thought. Edited by Kalina Christoff and Kieran C. R. Fox. Oxford University Press, 2018. http://dx.doi.org/10.1093/oxfordhb/9780190464745.013.1.

Full text
Abstract:
Functional magnetic resonance imaging (fMRI) has begun to narrow down the neural correlates of self-generated forms of thought, with current evidence pointing toward central roles for the default, frontoparietal, and visual networks. Recent work has linked the arising of thoughts more specifically to default network activity, but the limited temporal resolution of fMRI has precluded more detailed conclusions about where in the brain self-created mental content is generated and how this is achieved. This chapter argues that the unparalleled spatiotemporal resolution of intracranial electrophysiology (iEEG) in human epilepsy patients can begin to provide answers to questions about the specific neural origins of self-generated thought. The chapter reviews the extensive body of literature from iEEG studies over the past few decades and shows that many studies involving passive recording or direct electrical stimulation throughout the brain point to the medial temporal lobe as a key site of thought-generation.
APA, Harvard, Vancouver, ISO, and other styles
5

Brantingham, Patricia L., Paul J. Brantingham, Justin Song, and Valerie Spicer. Advances in Visualization for Theory Testing in Environmental Criminology. Edited by Gerben J. N. Bruinsma and Shane D. Johnson. Oxford University Press, 2018. http://dx.doi.org/10.1093/oxfordhb/9780190279707.013.37.

Full text
Abstract:
This chapter discusses advances in visualization for environmental criminology. The environment within which people move has many dimensions that influence or constrain decisions and actions by individuals and by groups. This complexity creates a challenge for theoreticians and researchers in presenting their research results in a way that conveys the dynamic spatiotemporal aspects of crime and actions by offenders in a clearly understandable way. There is an increasing need in environmental criminology to use scientific visualization to convey research results. A visual image can describe underlying patterns in a way that is intuitively more understandable than text and numeric tables. The advent of modern information systems generating large and deep data sets (Big Data) provides researchers unparalleled possibilities for asking and answering questions about crime and the environment. This will require new techniques and methods for presenting findings and visualization will be key.
APA, Harvard, Vancouver, ISO, and other styles
6

Gover, K. E. Art and Authority. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198768692.001.0001.

Full text
Abstract:
Art and Authority is a philosophical essay on artistic authority and freedom: its sources, nature, and limits. It draws upon real-world cases and controversies in contemporary visual art and connects them to significant theories in the philosophical literature on art and aesthetics. Artworks, it is widely agreed, are the products of intentional human activity. And yet they are different from other kinds of artifacts; for one thing, they are meaningful. It is often presumed that artworks are an extension of their makers’ personality in ways that other kinds of artifacts are not. This is clear from our recognition that an artist continues to own his or her creation even once the art object, in which the artwork inheres, belongs to another. But it is far from clear how or why artists acquire this authority, and whether it originates from a special, intimate bond between artist and artwork. In response to these questions, the book argues for a ‘dual-intention theory’ of artistic authorship, in which it is claimed that authorship entails two orders of intention. The first, ‘generative’ moment, names the intentions that lead to the production of an artwork. The second, ‘evaluative’ moment, names the decision in which the artist decides whether or not to accept the artwork as part of their corpus.
APA, Harvard, Vancouver, ISO, and other styles
7

Campbell, Kenneth L. Western Civilization in a Global Context: Prehistory to the Enlightenment. Bloomsbury Publishing Plc, 2015. http://dx.doi.org/10.5040/9781474275491.

Full text
Abstract:
Western Civilization in a Global Context is a source collection that introduces a comparative element to the study of Western civilization, offering students an opportunity to explore non-Western perspectives. An interesting and provocative set of readings are included, from a range of primary sources, including official documents, historical writings, literary sources, letters, speeches, interviews as well as visual sources. These different sources are carefully selected with a view to generating class discussion and providing students with a sense of the different approaches historians might take to understanding the past. Volume I covers prehistory to the Enlightenment, including sources that offer insight into the political, social, religious, cultural and intellectual history of this period. Topics covered include: - The Rise of Rome - Byzantine Civilization - The Renaissance in Europe and China - Religious Reformation - European Expansion - The Scientific Revolution To aid student engagement and understanding, the book begins with a guide to using primary sources, includes questions for discussion throughout and concludes with a glossary of key terms. Western Civilization in a Global Context is the ideal companion for students who want to explore the contribution of non-Western cultures, and gain a more thorough understanding of the complex history of the world as a result.
APA, Harvard, Vancouver, ISO, and other styles
8

Contreras, Ayana. Energy Never Dies. University of Illinois Press, 2021. http://dx.doi.org/10.5622/illinois/9780252044069.001.0001.

Full text
Abstract:
Black Chicago in the post–civil rights era was constantly refreshed by an influx of newcomers from the American South via the Great Migration. Chicago was a beacon, disseminating a fresh, powerful definition of Black identity primarily through music, art, and entrepreneurship and mass media. This book uses ruminations on oft-undervalued found ephemeral materials (like a fan club pamphlet or a creamy-white Curtis Mayfield record) and a variety of in-depth original and archival interviews to unearth tales of the aspiration, will, courage, and imagination born in Black Chicago. It also questions what vestiges of our past we choose to value in this digital age. These stories serve as homespun folktales of hope to counter darker popular narratives about the South and West Sides of the city. They also express the ongoing quest for identity and self-determination, a quest that fueled the earlier Black Arts Movement, and is again at the heart of the Black Arts renaissance currently blossoming in Black Chicago, from genre-spanning musicians like Chance the Rapper, Noname, the Juju Exchange, and Makaya McCraven, and from visual artists like Theaster Gates and Kerry James Marshall, and up-and-comers like Brandon Breaux. Meanwhile, many of the creative giants of previous generations are struggling (Ebony magazine and the groundbreaking DuSable Museum among them). But this text asserts that energy never dies, and creativity will live on beyond this juncture, regardless of the outcome.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Visual question generation"

1

Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. "Visual Question Generation." In Visual Question Answering, 189–97. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1_13.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Chen, Feng, Jiayuan Xie, Yi Cai, Tao Wang, and Qing Li. "Difficulty-Controllable Visual Question Generation." In Web and Big Data, 332–47. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-85896-4_26.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Xu, Feifei, Yingchen Zhou, Zheng Zhong, and Guangzhen Li. "Object Category-Based Visual Dialog for Effective Question Generation." In Computational Visual Media, 316–31. Singapore: Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-2092-7_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Zhang, Junjie, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, and Anton van den Hengel. "Goal-Oriented Visual Question Generation via Intermediate Rewards." In Computer Vision – ECCV 2018, 189–204. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01228-1_12.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Nahar, Shrey, Shreya Naik, Niti Shah, Saumya Shah, and Lakshmi Kurup. "Automated Question Generation and Answer Verification Using Visual Data." In Studies in Computational Intelligence, 99–114. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-38445-6_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Uehara, Kohei, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, and Tatsuya Harada. "Visual Question Generation for Class Acquisition of Unknown Objects." In Computer Vision – ECCV 2018, 492–507. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01258-8_30.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chai, Zi, Xiaojun Wan, Soyeon Caren Han, and Josiah Poon. "Visual Question Generation Under Multi-granularity Cross-Modal Interaction." In MultiMedia Modeling, 255–66. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-27077-2_20.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Salewski, Leonard, A. Sophia Koepke, Hendrik P. A. Lensch, and Zeynep Akata. "CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations." In xxAI - Beyond Explainable AI, 69–88. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-04083-2_5.

Full text
Abstract:
Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at https://github.com/ExplainableML/CLEVR-X.
APA, Harvard, Vancouver, ISO, and other styles
9

Koeva, Svetla. "Multilingual Image Corpus." In European Language Grid, 313–18. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-17258-8_22.

Full text
Abstract:
The ELG pilot project Multilingual Image Corpus (MIC 21) provides a large image dataset with annotated objects and multilingual descriptions in 25 languages. Our main contributions are: the provision of a large collection of high-quality, copyright-free images; the formulation of an ontology of visual objects based on WordNet noun hierarchies; precise manual correction of automatic image segmentation and annotation of object classes; and the association of objects and images with extended multilingual descriptions. The dataset is designed for image classification, object detection and semantic segmentation. It can also be used for multilingual image caption generation, image-to-text alignment and automatic question answering for images and videos.
APA, Harvard, Vancouver, ISO, and other styles
10

Shi, Yanan, Yanxin Tan, Fangxiang Feng, Chunping Zheng, and Xiaojie Wang. "Category-Based Strategy-Driven Question Generator for Visual Dialogue." In Lecture Notes in Computer Science, 177–92. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-84186-7_12.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Visual question generation"

1

Vedd, Nihir, Zixu Wang, Marek Rei, Yishu Miao, and Lucia Specia. "Guiding Visual Question Generation." In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.naacl-main.118.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bi, Chao, Shuhui Wang, Zhe Xue, Shengbo Chen, and Qingming Huang. "Inferential Visual Question Generation." In MM '22: The 30th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503161.3548055.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhang, Shijie, Lizhen Qu, Shaodi You, Zhenglu Yang, and Jiawan Zhang. "Automatic Generation of Grounded Visual Questions." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/592.

Full text
Abstract:
In this paper, we propose the first model able to generate visually grounded questions with diverse types for a single image. Visual question generation is an emerging topic which aims to ask questions in natural language based on visual input. To the best of our knowledge, there is a lack of automatic methods to generate meaningful questions with various types for the same visual input. To address this problem, we propose a model that automatically generates visually grounded questions with varying types. Our model takes as input both images and the captions generated by a dense captioning model, samples the most probable question types, and generates the questions in sequence. The experimental results on two real-world datasets show that our model outperforms the strongest baseline in terms of both correctness and diversity by a wide margin.
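A hypothetical sketch of the pipeline described in this abstract: dense-caption text and image features jointly score question types, the most probable types are selected, and one question is decoded per type. Module names, dimensions, and the top-k selection are assumptions for illustration.

```python
# Illustrative type-aware visual question generation pipeline.
import torch
import torch.nn as nn

class TypeAwareVQG(nn.Module):
    def __init__(self, img_dim=2048, cap_dim=768, n_types=7, type_dim=32, hidden=512, vocab=8000):
        super().__init__()
        self.type_scorer = nn.Linear(img_dim + cap_dim, n_types)
        self.type_emb = nn.Embedding(n_types, type_dim)
        self.init_h = nn.Linear(img_dim + cap_dim + type_dim, hidden)
        self.word_emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feat, caption_feat, question_tokens, num_types=2):
        ctx = torch.cat([img_feat, caption_feat], dim=-1)
        type_ids = self.type_scorer(ctx).topk(num_types, dim=-1).indices   # most probable types
        logits = []
        for t in range(num_types):                                          # one question per selected type
            h0 = torch.tanh(self.init_h(torch.cat([ctx, self.type_emb(type_ids[:, t])], -1)))
            out, _ = self.rnn(self.word_emb(question_tokens), h0.unsqueeze(0))
            logits.append(self.out(out))
        return torch.stack(logits, dim=1)            # (batch, num_types, seq, vocab)

model = TypeAwareVQG()
out = model(torch.randn(2, 2048), torch.randn(2, 768), torch.randint(0, 8000, (2, 10)))
```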
APA, Harvard, Vancouver, ISO, and other styles
4

Fan, Zhihao, Zhongyu Wei, Piji Li, Yanyan Lan, and Xuanjing Huang. "A Question Type Driven Framework to Diversify Visual Question Generation." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/563.

Full text
Abstract:
Visual question generation aims at asking questions about an image automatically. Existing research works on this topic usually generate a single question for each given image without considering the issue of diversity. In this paper, we propose a question type driven framework to produce multiple questions for a given image with different focuses. In our framework, each question is constructed following the guidance of a sampled question type in a sequence-to-sequence fashion. To diversify the generated questions, a novel conditional variational auto-encoder is introduced to generate multiple questions with a specific question type. Moreover, we design a strategy to conduct the question type distribution learning for each image to select the final questions. Experimental results on three benchmark datasets show that our framework outperforms the state-of-the-art approaches in terms of both relevance and diversity.
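The conditional variational idea in this abstract can be pictured with a toy sketch: the latent code is conditioned on the image and a sampled question type, and drawing several codes yields several distinct questions. Only the prior/sampling path is shown; a full CVAE would also train a recognition network over the reference question. Dimensions and module names are illustrative assumptions.

```python
# Toy conditional-VAE question generator for type-driven diversity.
import torch
import torch.nn as nn

class ConditionalVAEQuestioner(nn.Module):
    def __init__(self, img_dim=2048, n_types=7, type_dim=32, z_dim=64, hidden=512, vocab=8000):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, type_dim)
        cond = img_dim + type_dim
        self.prior_mu = nn.Linear(cond, z_dim)
        self.prior_logvar = nn.Linear(cond, z_dim)
        self.init_h = nn.Linear(cond + z_dim, hidden)
        self.word_emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feat, type_id, question_tokens):
        cond = torch.cat([img_feat, self.type_emb(type_id)], dim=-1)
        mu, logvar = self.prior_mu(cond), self.prior_logvar(cond)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # sample a latent code
        h0 = torch.tanh(self.init_h(torch.cat([cond, z], dim=-1))).unsqueeze(0)
        out, _ = self.rnn(self.word_emb(question_tokens), h0)
        return self.out(out)

model = ConditionalVAEQuestioner()
# Sampling twice with the same image and type draws two latent codes, hence two different questions.
for _ in range(2):
    logits = model(torch.randn(1, 2048), torch.tensor([3]), torch.randint(0, 8000, (1, 10)))
```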
APA, Harvard, Vancouver, ISO, and other styles
5

Li, Yikang, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang, and Ming Zhou. "Visual Question Generation as Dual Task of Visual Question Answering." In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018. http://dx.doi.org/10.1109/cvpr.2018.00640.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Krishna, Ranjay, Michael Bernstein, and Li Fei-Fei. "Information Maximizing Visual Question Generation." In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019. http://dx.doi.org/10.1109/cvpr.2019.00211.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Patil, Charulata, and Anagha Kulkarni. "Attention-based Visual Question Generation." In 2021 International Conference on Emerging Smart Computing and Informatics (ESCI). IEEE, 2021. http://dx.doi.org/10.1109/esci50559.2021.9396956.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Xie, Jiayuan, Yi Cai, Qingbao Huang, and Tao Wang. "Multiple Objects-Aware Visual Question Generation." In MM '21: ACM Multimedia Conference. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3474085.3476969.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Xu, Xing, Jingkuan Song, Huimin Lu, Li He, Yang Yang, and Fumin Shen. "Dual Learning for Visual Question Generation." In 2018 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2018. http://dx.doi.org/10.1109/icme.2018.8486475.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Rathi, Snehal, Atharv Raje, Gauri Ghule, Shruti Sankpal, Soham Shitole, and Priyanka More. "Visual Question Generation Using Deep Learning." In 2023 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS). IEEE, 2023. http://dx.doi.org/10.1109/icccis60361.2023.10425302.

Full text
APA, Harvard, Vancouver, ISO, and other styles