Academic literature on the topic 'Visual grounding of text'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Visual grounding of text.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Visual grounding of text"
Wang, Chao, Wei Luo, Jia-Rui Zhu, Ying-Chun Xia, Jin He, and Li-Chuan Gu. "End-to-end Visual Grounding Based on Query Text Guidance and Multi-stage Reasoning." 電腦學刊 35, no. 1 (February 2024): 83–95. http://dx.doi.org/10.53106/199115992024023501006.
Regneri, Michaela, Marcus Rohrbach, Dominikus Wetzel, Stefan Thater, Bernt Schiele, and Manfred Pinkal. "Grounding Action Descriptions in Videos." Transactions of the Association for Computational Linguistics 1 (December 2013): 25–36. http://dx.doi.org/10.1162/tacl_a_00207.
Zhan, Yang, Yuan Yuan, and Zhitong Xiong. "Mono3DVG: 3D Visual Grounding in Monocular Images." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 6988–96. http://dx.doi.org/10.1609/aaai.v38i7.28525.
Zhang, Qianjun, and Jin Yuan. "Semantic-Aligned Cross-Modal Visual Grounding Network with Transformers." Applied Sciences 13, no. 9 (May 4, 2023): 5649. http://dx.doi.org/10.3390/app13095649.
Shen, Haozhan, Tiancheng Zhao, Mingwei Zhu, and Jianwei Yin. "GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (March 24, 2024): 4766–75. http://dx.doi.org/10.1609/aaai.v38i5.28278.
Liu, Shilong, Shijia Huang, Feng Li, Hao Zhang, Yaoyuan Liang, Hang Su, Jun Zhu, and Lei Zhang. "DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (June 26, 2023): 1728–36. http://dx.doi.org/10.1609/aaai.v37i2.25261.
Cheng, Zesen, Kehan Li, Peng Jin, Siheng Li, Xiangyang Ji, Li Yuan, Chang Liu, and Jie Chen. "Parallel Vertex Diffusion for Unified Visual Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 2 (March 24, 2024): 1326–34. http://dx.doi.org/10.1609/aaai.v38i2.27896.
Feng, Steven Y., Kevin Lu, Zhuofu Tao, Malihe Alikhani, Teruko Mitamura, Eduard Hovy, and Varun Gangal. "Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10618–26. http://dx.doi.org/10.1609/aaai.v36i10.21306.
Jia, Meihuizi, Lei Shen, Xin Shen, Lejian Liao, Meng Chen, Xiaodong He, Zhendong Chen, and Jiaqi Li. "MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8032–40. http://dx.doi.org/10.1609/aaai.v37i7.25971.
Shi, Zhan, Yilin Shen, Hongxia Jin, and Xiaodan Zhu. "Improving Zero-Shot Phrase Grounding via Reasoning on External Knowledge and Spatial Relations." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2253–61. http://dx.doi.org/10.1609/aaai.v36i2.20123.
Dissertations / Theses on the topic "Visual grounding of text"
Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.
Nowadays, Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has led to a significant improvement in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore different directions: we present relevant background covering image and text representation and existing multimodal approaches; we propose novel architectures further improving the retrieval capability of VSE; and we extend VSE models to novel applications, leveraging embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning a differentiable approximation of a ranking-based metric.
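For readers new to VSE objectives, the ranking loss alluded to above can be illustrated with a minimal sketch (a generic illustration assuming pre-computed embeddings, not code from the thesis; all names and dimensions are hypothetical):

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two embedding vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def triplet_ranking_loss(img, pos_txt, neg_txt, margin=0.2):
    # hinge-based ranking: the matching caption must beat a
    # non-matching one by at least `margin` in similarity
    return max(0.0, margin - cosine(img, pos_txt) + cosine(img, neg_txt))

rng = np.random.default_rng(0)
img = rng.normal(size=128)               # embedded image
pos = img + 0.1 * rng.normal(size=128)   # matching caption, close to the image
neg = rng.normal(size=128)               # non-matching caption
print(triplet_ranking_loss(img, pos, neg))
```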
Emmott, Stephen J. "The visual processing of text." Thesis, University of Stirling, 1993. http://hdl.handle.net/1893/1837.
Full textMi, Jinpeng Verfasser], and Jianwei [Akademischer Betreuer] [Zhang. "Natural Language Visual Grounding via Multimodal Learning / Jinpeng Mi ; Betreuer: Jianwei Zhang." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2020. http://d-nb.info/1205070885/34.
Mi, Jinpeng [Verfasser], and Jianwei [Akademischer Betreuer] Zhang. "Natural Language Visual Grounding via Multimodal Learning / Jinpeng Mi ; Betreuer: Jianwei Zhang." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2020. http://d-nb.info/1205070885/34.
Prince, Md Enamul Hoque. "Visual text analytics for online conversations." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/61772.
Full textScience, Faculty of
Computer Science, Department of
Graduate
Chauhan, Aneesh. "Grounding human vocabulary in robot perception through interaction." Doctoral thesis, Universidade de Aveiro, 2014. http://hdl.handle.net/10773/12841.
This thesis addresses the problem of word learning in computational agents. The motivation behind this work lies in the need to support language-based communication between service robots and their human users, as well as grounded reasoning using symbols relevant for the assigned tasks. The research focuses on the problem of grounding human vocabulary in a robotic agent's sensori-motor perception. Words have to be grounded in bodily experiences, which emphasizes the role of appropriate embodiments. On the other hand, language is a cultural product created and acquired through social interactions. This emphasizes the role of society as a source of linguistic input. Taking these aspects into account, an experimental scenario is set up where a human instructor teaches a robotic agent the names of the objects present in a visually shared environment. The agent grounds the names of these objects in visual perception. Word learning is an open-ended problem. Therefore, the learning architecture of the agent will have to be able to acquire words and categories in an open-ended manner. In this work, four learning architectures were designed that can be used by robotic agents for long-term and open-ended word and category acquisition. The learning methods used in these architectures are designed for incrementally scaling up to larger sets of words and categories. A novel experimental evaluation methodology, which takes into account the open-ended nature of word learning, is proposed and applied. This methodology is based on the realization that a robot's vocabulary will be limited by its discriminatory capacity which, in turn, depends on its sensors and perceptual capabilities. An extensive set of systematic experiments, in multiple experimental settings, was carried out to thoroughly evaluate the described learning approaches. The results indicate that all approaches were able to incrementally acquire new words and categories. Although some of the approaches could not scale up to larger vocabularies, one approach was shown to learn up to 293 categories, with potential for learning many more.
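As a generic illustration of the open-ended setting described in this abstract (not one of the thesis's four architectures; the class and feature vectors below are hypothetical), an incremental prototype learner can acquire new words at any time:

```python
import numpy as np

class OpenEndedLearner:
    def __init__(self):
        self.protos = {}  # word -> (mean feature vector, sample count)

    def teach(self, word, features):
        # incremental mean update; unseen words are added on the fly
        mean, n = self.protos.get(word, (np.zeros_like(features), 0))
        self.protos[word] = ((mean * n + features) / (n + 1), n + 1)

    def predict(self, features):
        # nearest prototype by Euclidean distance
        return min(self.protos, key=lambda w: np.linalg.norm(self.protos[w][0] - features))

rng = np.random.default_rng(1)
agent = OpenEndedLearner()
for _ in range(5):
    agent.teach("ball", rng.normal(loc=0.0, size=16))
    agent.teach("mug", rng.normal(loc=3.0, size=16))
print(agent.predict(rng.normal(loc=3.0, size=16)))  # -> 'mug'
```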
Sabir, Ahmed. "Enhancing scene text recognition with visual context information." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/670286.
This thesis addresses the problem of improving text recognition systems, which detect and recognize text in unrestricted images (for example, a street sign, an advertisement, a bus destination, etc.). The goal is to improve the performance of existing vision systems by exploiting the semantic information derived from the image itself. The main idea is that knowing the content of the image, or the visual context in which a text appears, can help decide which words are correct. For example, the fact that an image shows a coffee shop makes it more likely that a word on a sign reads Dunkin rather than unkind. We address this problem by drawing on advances in natural language processing and machine learning, in particular learning re-rankers and neural networks, to present post-processing solutions that improve state-of-the-art text recognition systems without costly retraining or fine-tuning procedures that require large amounts of data. Discovering the degree of semantic relatedness between candidate words and their image context is a task related to assessing the semantic similarity between words or text fragments. However, determining the existence of a semantic relation is a more general task than assessing similarity (for example, car, road, and traffic light are related but not similar), so existing methods require certain adaptations. To meet the requirements of these broader notions of semantic relatedness, we develop two approaches for learning the semantic relatedness of a recognized word and its context: word-to-word (with the objects in the image) or word-to-sentence (with the image caption). The word-to-word approach uses re-rankers based on word embeddings: the re-ranker takes the words proposed by the baseline system and re-orders them according to the visual context provided by an object classifier. For the second case, an end-to-end neural approach is designed to exploit the image description (caption) at both the sentence and word levels, re-ranking candidate words based on both the visual context and their co-occurrences with the caption. As an additional contribution, to meet the requirements of data-driven approaches such as neural networks, we present a visual-context dataset for this task, in which the publicly available COCO-text dataset [Veit et al. 2016] has been extended with information about the scene (including the objects and places appearing in the image), allowing researchers to include text-scene semantic relations in their text recognition systems and offering a common evaluation baseline for these approaches.
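The word-to-word re-ranking idea from this abstract can be sketched in a few lines (a toy illustration; the embeddings and the confidence weighting below are hypothetical, not the thesis's actual model):

```python
import numpy as np

# toy word embeddings (in practice, pre-trained vectors of a few hundred dimensions)
emb = {
    "dunkin": np.array([0.9, 0.8, 0.1]),
    "unkind": np.array([0.1, 0.2, 0.9]),
    "coffee": np.array([0.8, 0.9, 0.2]),
    "cafe":   np.array([0.9, 0.7, 0.3]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rerank(candidates, context_objects, alpha=0.5):
    # blend recognizer confidence with max relatedness to the visual context
    def score(word, conf):
        rel = max(cosine(emb[word], emb[obj]) for obj in context_objects)
        return alpha * conf + (1 - alpha) * rel
    return sorted(candidates, key=lambda wc: score(*wc), reverse=True)

# recognizer output: (word, confidence); objects come from an image classifier
print(rerank([("unkind", 0.55), ("dunkin", 0.45)], ["coffee", "cafe"]))
# -> [('dunkin', 0.45), ('unkind', 0.55)]: visual context flips the ranking
```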
Willems, Heather Marie. "Writing the written: text as a visual image." The Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=osu1382952227.
Kan, Jichao. "Visual-Text Translation with Deep Graph Neural Networks." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/23759.
Shmueli, Yael. "Integrating speech and visual text in multimodal interfaces." Thesis, University College London (University of London), 2005. http://discovery.ucl.ac.uk/1446688/.
Books on the topic "Visual grounding of text"
Wyman, Jessica, ed. Pro forma: Language, text, visual art. Toronto, ON, Canada: YYZ Books, 2005.
Strassner, Erich. Text-Bild-Kommunikation - Bild-Text-Kommunikation. Tübingen: Niemeyer, 2001.
Harms, Wolfgang, and Deutsche Forschungsgemeinschaft, eds. Text und Bild, Bild und Text: DFG-Symposion 1988. Stuttgart: J.B. Metzler, 1990.
Text und Bild: Grundfragen der Beschreibung von Text-Bild-Kommunikationen aus sprachwissenschaftlicher Sicht. Tübingen: Narr, 1986.
Leidner, Jochen L. Toponym resolution in text: Annotation, evaluation and applications of spatial grounding of place names. Boca Raton: Dissertation.com, 2007.
Ranai, K., ed. Visual editing on UNIX. Singapore: World Scientific, 1989.
John Samuel, G., 1948-, and Institute of Asian Studies (Madras, India), eds. The Great penance at Māmallapuram: Deciphering a visual text. Chennai: Institute of Asian Studies, 2001.
The Bible as visual culture: When text becomes image. Sheffield: Sheffield Phoenix Press, 2013.
Drake, Michael V., ed. The visual fields: Text and atlas of clinical perimetry. 6th ed. St. Louis: Mosby, 1990.
Finney, Gail, ed. Visual culture in twentieth-century Germany: Text as spectacle. Bloomington, Ind.: Indiana University Press, 2006.
Book chapters on the topic "Visual grounding of text"
Min, Seonwoo, Nokyung Park, Siwon Kim, Seunghyun Park, and Jinkyu Kim. "Grounding Visual Representations with Texts for Domain Generalization." In Lecture Notes in Computer Science, 37–53. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19836-6_3.
Hong, Tao, Ya Wang, Xingwu Sun, Xiaoqing Li, and Jinwen Ma. "CMMix: Cross-Modal Mix Augmentation Between Images and Texts for Visual Grounding." In Communications in Computer and Information Science, 471–82. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8148-9_37.
Hendricks, Lisa Anne, Ronghang Hu, Trevor Darrell, and Zeynep Akata. "Grounding Visual Explanations." In Computer Vision – ECCV 2018, 269–86. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01216-8_17.
Johari, Kritika, Christopher Tay Zi Tong, Vigneshwaran Subbaraju, Jung-Jae Kim, and U.-Xuan Tan. "Gaze Assisted Visual Grounding." In Social Robotics, 191–202. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-90525-5_17.
Xiao, Junbin, Xindi Shang, Xun Yang, Sheng Tang, and Tat-Seng Chua. "Visual Relation Grounding in Videos." In Computer Vision – ECCV 2020, 447–64. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58539-6_27.
Goy, Anna. "Grounding Meaning in Visual Knowledge." In Spatial Language, 121–45. Dordrecht: Springer Netherlands, 2002. http://dx.doi.org/10.1007/978-94-015-9928-3_7.
Silberer, Carina. "Grounding the Meaning of Words with Visual Attributes." In Visual Attributes, 331–62. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-50077-5_13.
Mazaheri, Amir, and Mubarak Shah. "Visual Text Correction." In Computer Vision – ECCV 2018, 159–75. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01261-8_10.
Wainer, Howard. "Integrating Figures and Text." In Visual Revelations, 143–45. New York, NY: Springer New York, 1997. http://dx.doi.org/10.1007/978-1-4612-2282-8_18.
Kittler, Josef, Mikhail Shevchenko, and David Windridge. "Visual Bootstrapping for Unsupervised Symbol Grounding." In Advanced Concepts for Intelligent Vision Systems, 1037–46. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11864349_94.
Conference papers on the topic "Visual grounding of text"
Zhang, Yimeng, Xin Chen, Jinghan Jia, Sijia Liu, and Ke Ding. "Text-Visual Prompting for Efficient 2D Temporal Video Grounding." In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023. http://dx.doi.org/10.1109/cvpr52729.2023.01421.
Wu, Yanmin, Xinhua Cheng, Renrui Zhang, Zesen Cheng, and Jian Zhang. "EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding." In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023. http://dx.doi.org/10.1109/cvpr52729.2023.01843.
Endo, Ko, Masaki Aono, Eric Nichols, and Kotaro Funakoshi. "An Attention-based Regression Model for Grounding Textual Phrases in Images." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/558.
Conser, Erik, Kennedy Hahn, Chandler Watson, and Melanie Mitchell. "Revisiting Visual Grounding." In Proceedings of the Second Workshop on Shortcomings in Vision and Language. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/w19-1804.
Kim, Yongmin, Chenhui Chu, and Sadao Kurohashi. "Flexible Visual Grounding." In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.acl-srw.22.
Du, Ye, Zehua Fu, Qingjie Liu, and Yunhong Wang. "Visual Grounding with Transformers." In 2022 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2022. http://dx.doi.org/10.1109/icme52920.2022.9859880.
Jing, Chenchen, Yuwei Wu, Mingtao Pei, Yao Hu, Yunde Jia, and Qi Wu. "Visual-Semantic Graph Matching for Visual Grounding." In MM '20: The 28th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3394171.3413902.
Deng, Chaorui, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, and Mingkui Tan. "Visual Grounding via Accumulated Attention." In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018. http://dx.doi.org/10.1109/cvpr.2018.00808.
Lee, Jason, Kyunghyun Cho, and Douwe Kiela. "Countering Language Drift via Visual Grounding." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/d19-1447.
Sun, Yuxi, Shanshan Feng, Xutao Li, Yunming Ye, Jian Kang, and Xu Huang. "Visual Grounding in Remote Sensing Images." In MM '22: The 30th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503161.3548316.
Reports on the topic "Visual grounding of text"
Steed, Chad A., Christopher T. Symons, James K. Senter, and Frank A. DeNap. Guided Text Search Using Adaptive Visual Analytics. Office of Scientific and Technical Information (OSTI), October 2012. http://dx.doi.org/10.2172/1055105.
Beiker, Sven, ed. Unsettled Issues Regarding Visual Communication Between Automated Vehicles and Other Road Users. SAE International, July 2021. http://dx.doi.org/10.4271/epr2021016.
Дирда, І. А., and З. П. Бакум. Linguodidactic fundamentals of the development of foreign students' polycultural competence during the Ukrainian language training. Association 1901 "SEPIKE", 2016. http://dx.doi.org/10.31812/123456789/2994.
Бакум, З. П., and І. А. Дирда. Linguodidactic Fundamentals of the Development of Foreign Students' Polycultural Competence During the Ukrainian Language Training. Криворізький державний педагогічний університет, 2016. http://dx.doi.org/10.31812/0564/398.
Figueredo, Luisa, Liliana Martinez, and Joao Paulo Almeida. Current role of Endoscopic Endonasal Approach for Craniopharyngiomas. A 10-year Systematic review and Meta-Analysis Comparison with the Open Transcranial Approach. INPLASY - International Platform of Registered Systematic Review and Meta-analysis Protocols, January 2023. http://dx.doi.org/10.37766/inplasy2023.1.0045.
Yatsymirska, Mariya. Мова війни і «контрнаступальна» лексика у стислих медійних текстах [The language of war and "counteroffensive" vocabulary in concise media texts]. Ivan Franko National University of Lviv, March 2023. http://dx.doi.org/10.30970/vjo.2023.52-53.11742.
Baluk, Nadia, Natalia Basij, Larysa Buk, and Olha Vovchanska. VR/AR-Technologies – New Content of the New Media. Ivan Franko National University of Lviv, February 2021. http://dx.doi.org/10.30970/vjo.2021.49.11074.
Makhachashvili, Rusudan K., Svetlana I. Kovpik, Anna O. Bakhtina, and Ekaterina O. Shmeltser. Technology of presentation of literature on the Emoji Maker platform: pedagogical function of graphic mimesis. [N.p.], July 2020. http://dx.doi.org/10.31812/123456789/3864.
Yatsymirska, Mariya. Social Expression in Multimedia Texts. Ivan Franko National University of Lviv, February 2021. http://dx.doi.org/10.30970/vjo.2021.49.11072.