Academic literature on the topic 'Video question answering'
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Video question answering.' For each source, a bibliographic reference can be generated in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc. Whenever the metadata makes them available, you can also download the full text of a publication as a PDF and read its abstract online.
Journal articles on the topic "Video question answering"
Lei, Chenyi, Lei Wu, Dong Liu, Zhao Li, Guoxin Wang, Haihong Tang, and Houqiang Li. "Multi-Question Learning for Visual Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11328–35. http://dx.doi.org/10.1609/aaai.v34i07.6794.
Ruwa, Nelson, Qirong Mao, Liangjun Wang, and Jianping Gou. "Affective Question Answering on Video." Neurocomputing 363 (October 2019): 125–39. http://dx.doi.org/10.1016/j.neucom.2019.06.046.
Wang, Yueqian, Yuxuan Wang, Kai Chen, and Dongyan Zhao. "STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19215–23. http://dx.doi.org/10.1609/aaai.v38i17.29890.
Zong, Linlin, Jiahui Wan, Xianchao Zhang, Xinyue Liu, Wenxin Liang, and Bo Xu. "Video-Context Aligned Transformer for Video Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19795–803. http://dx.doi.org/10.1609/aaai.v38i17.29954.
Huang, Deng, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, and Chuang Gan. "Location-Aware Graph Convolutional Networks for Video Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11021–28. http://dx.doi.org/10.1609/aaai.v34i07.6737.
Gao, Lianli, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, and Heng Tao Shen. "Structured Two-Stream Attention Network for Video Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6391–98. http://dx.doi.org/10.1609/aaai.v33i01.33016391.
Kumar, Krishnamoorthi Magesh, and P. Valarmathie. "Domain and Intelligence Based Multimedia Question Answering System." International Journal of Evaluation and Research in Education (IJERE) 5, no. 3 (September 1, 2016): 227. http://dx.doi.org/10.11591/ijere.v5i3.4544.
Xue, Hongyang, Zhou Zhao, and Deng Cai. "Unifying the Video and Question Attentions for Open-Ended Video Question Answering." IEEE Transactions on Image Processing 26, no. 12 (December 2017): 5656–66. http://dx.doi.org/10.1109/tip.2017.2746267.
Jang, Yunseok, Yale Song, Chris Dongjoo Kim, Youngjae Yu, Youngjin Kim, and Gunhee Kim. "Video Question Answering with Spatio-Temporal Reasoning." International Journal of Computer Vision 127, no. 10 (June 18, 2019): 1385–412. http://dx.doi.org/10.1007/s11263-019-01189-x.
Zhuang, Yueting, Dejing Xu, Xin Yan, Wenzhuo Cheng, Zhou Zhao, Shiliang Pu, and Jun Xiao. "Multichannel Attention Refinement for Video Question Answering." ACM Transactions on Multimedia Computing, Communications, and Applications 16, no. 1s (April 28, 2020): 1–23. http://dx.doi.org/10.1145/3366710.
Dissertations / Theses on the topic "Video question answering"
Engin, Deniz. "Video Question Answering with Limited Supervision." PhD diss., Université de Rennes, 2024. http://www.theses.fr/2024URENS016.
Video content has grown significantly in volume and diversity in the digital era, and this expansion has highlighted the need for advanced video-understanding technologies. Driven by this need, this thesis explores the semantic understanding of videos, leveraging multiple perceptual modalities, as in human cognition, and learning efficiently from limited supervision, as humans can. The thesis focuses specifically on video question answering as one of the main video-understanding tasks. Our first contribution addresses long-range video question answering, which requires understanding extended video content: whereas recent approaches rely on human-generated external sources, we process the raw data to generate video summaries. Our second contribution explores zero-shot and few-shot video question answering, aiming at efficient learning from limited data. We leverage the knowledge of existing large-scale models while removing the obstacles to adapting pre-trained models to limited data. We demonstrate that these contributions significantly enhance the capabilities of multimodal video question-answering systems, particularly where human-annotated labeled data is limited or unavailable.
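As a rough illustration of the few-shot recipe this abstract describes (reuse a large pre-trained model, learn from only a handful of labeled examples), one common pattern is to freeze the pre-trained video-language backbone and train only a small head on top. The sketch below is a generic PyTorch illustration, not the thesis implementation; the `backbone` callable, which is assumed to fuse a video and a question into a feature vector, and the `AdapterHead` sizes are hypothetical.

```python
import torch
import torch.nn as nn

class AdapterHead(nn.Module):
    """Small trainable head on top of a frozen pre-trained backbone.

    Only this module is updated, so a handful of labeled QA examples
    can suffice; the backbone's pre-trained knowledge is reused as-is.
    """
    def __init__(self, feat_dim: int, num_answers: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, num_answers),
        )

    def forward(self, fused_features: torch.Tensor) -> torch.Tensor:
        return self.proj(fused_features)

def few_shot_step(backbone, head, optimizer, video, question, answer_idx):
    """One gradient step on a few-shot batch; the backbone stays frozen."""
    with torch.no_grad():                  # no gradients flow into the backbone
        feats = backbone(video, question)  # assumed: (batch, feat_dim) fused features
    logits = head(feats)
    loss = nn.functional.cross_entropy(logits, answer_idx)
    optimizer.zero_grad()
    loss.backward()                        # updates only the head's parameters
    optimizer.step()
    return loss.item()
```

In this setup the optimizer is built over `head.parameters()` alone, which is what keeps the adaptation cheap enough for zero- and few-shot regimes.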
Chowdhury, Muhammad Iqbal Hasan. "Question-Answering on Image/Video Content." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/205096/1/Muhammad%20Iqbal%20Hasan_Chowdhury_Thesis.pdf.
Zeng, Kuo-Hao (曾國豪). "Video Titling and Question-Answering." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/a3a6sw.
Full text國立清華大學
電機工程學系所
105
Video titling and question answering are two important tasks toward high-level understanding of visual data. To address these two tasks, we propose a large-scale dataset and demonstrate several models on it in this work.

A great video title describes the most salient event compactly and captures the viewer's attention, whereas video captioning tends to generate sentences that describe the video as a whole. Although generating a video title automatically is a very useful task, it has received much less attention than video captioning. We address video title generation for the first time by proposing two methods that extend state-of-the-art video captioners to this new task. First, we make video captioners highlight-sensitive by priming them with a highlight detector; our framework allows a model to be trained jointly for title generation and video highlight localization. Second, we induce high sentence diversity in video captioners, so that the generated titles are also diverse and catchy. Because a large number of sentences may be required to learn the sentence structure of titles, we propose a novel sentence-augmentation method that trains a captioner with additional sentence-only examples that come without corresponding videos.

For the video question-answering task, we propose to learn a deep model that answers a free-form natural-language question about the contents of a video. A program automatically harvests a large number of videos and descriptions freely available online, and a large number of candidate QA pairs are then generated automatically from the descriptions rather than manually annotated. We use these candidate QA pairs to train a number of video-based QA methods extended from MN, VQA, SA, and SS. To handle imperfect candidate QA pairs, we propose a self-paced learning procedure that iteratively identifies them and mitigates their effect during training.

To demonstrate our idea, we collected a large-scale Video Titles in the Wild (VTW) dataset of 18,100 automatically crawled user-generated videos and titles. We then used an automatic QA generator to produce a large number of QA pairs for training and collected manually generated QA pairs from Amazon Mechanical Turk. On VTW, our methods consistently improve title-prediction accuracy and achieve the best performance in both automatic and human evaluation. Our sentence-augmentation method also outperforms the baselines on the M-VAD dataset. Finally, the video question-answering results show that our self-paced learning procedure is effective and that the extended SS model outperforms various baselines.
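The self-paced learning procedure mentioned in this abstract (train first on easy, likely correct QA pairs, then gradually admit harder ones) can be sketched as follows. This is a minimal generic sketch, not the thesis code: the per-example `model.loss` API and the threshold schedule `lam` are assumptions.

```python
import torch

def self_paced_training(model, optimizer, qa_pairs, epochs=10,
                        lam=1.0, lam_growth=1.3):
    """Self-paced learning over noisy, automatically generated QA pairs.

    A candidate pair whose loss exceeds the current threshold `lam`
    receives zero weight, so training starts from "easy" (likely
    correct) pairs and admits harder ones as `lam` grows each epoch.
    """
    for _ in range(epochs):
        for video_feats, question, answer in qa_pairs:
            # Assumed API: model.loss returns a scalar loss tensor per example.
            loss = model.loss(video_feats, question, answer)
            if loss.item() < lam:          # hard binary self-paced weight
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        lam *= lam_growth                  # relax the threshold each epoch
```

Examples above the threshold are simply skipped rather than down-weighted, which is the simplest (hard-weight) variant of self-paced learning; soft weighting schemes are a common alternative.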
Xu, Huijuan. "Vision and Language Understanding with Localized Evidence." Thesis, 2018. https://hdl.handle.net/2144/34790.
Books on the topic "Video question answering"
McCallum, Richard. Evangelical Christian Responses to Islam. Bloomsbury Publishing Plc, 2024. http://dx.doi.org/10.5040/9781350418240.
Walker, Stephen. Digital Mediation. Bloomsbury Publishing Plc, 2024. http://dx.doi.org/10.5040/9781526525772.
Book chapters on the topic "Video question answering"
Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. "Video Question Answering." In Visual Question Answering, 119–33. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1_8.
Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. "Video Representation Learning." In Visual Question Answering, 111–17. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1_7.
Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. "Advanced Models for Video Question Answering." In Visual Question Answering, 135–43. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1_9.
Gao, Lei, Guangda Li, Yan-Tao Zheng, Richang Hong, and Tat-Seng Chua. "Video Reference: A Video Question Answering Engine." In Lecture Notes in Computer Science, 799–801. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-11301-7_92.
Xiao, Junbin, Pan Zhou, Tat-Seng Chua, and Shuicheng Yan. "Video Graph Transformer for Video Question Answering." In Lecture Notes in Computer Science, 39–58. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20059-5_3.
Piergiovanni, AJ, Kairo Morton, Weicheng Kuo, Michael S. Ryoo, and Anelia Angelova. "Video Question Answering with Iterative Video-Text Co-tokenization." In Lecture Notes in Computer Science, 76–94. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20059-5_5.
Chen, Xuanwei, Rui Liu, Xiaomeng Song, and Yahong Han. "Locating Visual Explanations for Video Question Answering." In MultiMedia Modeling, 290–302. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-67832-6_24.
Gupta, Pranay, and Manish Gupta. "NewsKVQA: Knowledge-Aware News Video Question Answering." In Advances in Knowledge Discovery and Data Mining, 3–15. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-05981-0_1.
Ge, Yuanyuan, Youjiang Xu, and Yahong Han. "Video Question Answering Using a Forget Memory Network." In Communications in Computer and Information Science, 404–15. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-7299-4_33.
Gao, Kun, Xianglei Zhu, and Yahong Han. "Initialized Frame Attention Networks for Video Question Answering." In Communications in Computer and Information Science, 349–59. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-8530-7_34.
Conference papers on the topic "Video question answering"
Zhao, Wentian, Seokhwan Kim, Ning Xu, and Hailin Jin. "Video Question Answering on Screencast Tutorials." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/148.
Jenni, Kommineni, M. Srinivas, Roshni Sannapu, and Murukessan Perumal. "CSA-BERT: Video Question Answering." In 2023 IEEE Statistical Signal Processing Workshop (SSP). IEEE, 2023. http://dx.doi.org/10.1109/ssp53291.2023.10207954.
Li, Hao, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, and Jie Chen. "TG-VQA: Ternary Game of Video Question Answering." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/116.
Zhao, Zhou, Qifan Yang, Deng Cai, Xiaofei He, and Yueting Zhuang. "Video Question Answering via Hierarchical Spatio-Temporal Attention Networks." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/492.
Chao, Guan-Lin, Abhinav Rastogi, Semih Yavuz, Dilek Hakkani-Tur, Jindong Chen, and Ian Lane. "Learning Question-Guided Video Representation for Multi-Turn Video Question Answering." In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/w19-5926.
Bhalerao, Mandar, Shlok Gujar, Aditya Bhave, and Anant V. Nimkar. "Visual Question Answering Using Video Clips." In 2019 IEEE Bombay Section Signature Conference (IBSSC). IEEE, 2019. http://dx.doi.org/10.1109/ibssc47189.2019.8973090.
Yang, Zekun, Noa Garcia, Chenhui Chu, Mayu Otani, Yuta Nakashima, and Haruo Takemura. "BERT Representations for Video Question Answering." In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2020. http://dx.doi.org/10.1109/wacv45572.2020.9093596.
Li, Yicong, Xiang Wang, Junbin Xiao, Wei Ji, and Tat-Seng Chua. "Invariant Grounding for Video Question Answering." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00294.
Fang, Jiannan, Lingling Sun, and Yaqi Wang. "Video Question Answering by Frame Attention." In Eleventh International Conference on Digital Image Processing, edited by Xudong Jiang and Jenq-Neng Hwang. SPIE, 2019. http://dx.doi.org/10.1117/12.2539615.
Lei, Jie, Licheng Yu, Mohit Bansal, and Tamara Berg. "TVQA: Localized, Compositional Video Question Answering." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/d18-1167.