Academic literature on the topic 'Multimodal document understanding'
Journal articles on the topic "Multimodal document understanding"
Cho, Seongkuk, Jihoon Moon, Junhyeok Bae, Jiwon Kang, and Sangwook Lee. "A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach." Electronics 12, no. 4 (February 13, 2023): 939. http://dx.doi.org/10.3390/electronics12040939.
Meskill, Carla, Jennifer Nilsen, and Alan Oliveira. "Intersections of Language, Content, and Multimodalities: Instructional Conversations in Mrs. B’s Sheltered English Biology Classroom." AERA Open 5, no. 2 (April 2019): 233285841985048. http://dx.doi.org/10.1177/2332858419850488.
Nugrahawati, Ana Wiyasa. "Teaching Religious Tolerance Through Critical and Evaluative Reading Course for English Language Education Students." ELE Reviews: English Language Education Reviews 3, no. 1 (May 31, 2023): 33–45. http://dx.doi.org/10.22515/elereviews.v3i1.6611.
Halverson, Erica Rosenfeld. "Film as Identity Exploration: A Multimodal Analysis of Youth-Produced Films." Teachers College Record: The Voice of Scholarship in Education 112, no. 9 (September 2010): 2352–78. http://dx.doi.org/10.1177/016146811011200903.
Troshchenkova, E. V., and E. A. Rudneva. "The Concept of Legal Document in the Professional Sphere." Voprosy Kognitivnoy Lingvistiki, no. 1 (2023): 32–42. http://dx.doi.org/10.20916/1812-3228-2023-1-32-42.
Wang, Jiapeng, Chongyu Liu, Lianwen Jin, Guozhi Tang, Jiaxin Zhang, Shuaitao Zhang, Qianying Wang, Yaqiang Wu, and Mingxiang Cai. "Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 2738–45. http://dx.doi.org/10.1609/aaai.v35i4.16378.
Maja, Inke Choirun Nisa’ Il, and Salim Nabhan. "Literacy in EFL Classroom: In-Service English Teachers’ Perceptions and Practices from Multiliteracies Perspective." JET ADI BUANA 7, no. 02 (October 31, 2022): 207–17. http://dx.doi.org/10.36456/jet.v7.n02.2022.7124.
Liu, Susan I., Morgan Shikar, Emily Gante, Patricia Prufeta, Kaylee Ho, Philip S. Barie, Robert J. Winchell, and Jennifer I. Lee. "Improving Communication and Response to Clinical Deterioration to Increase Patient Safety in the Intensive Care Unit." Critical Care Nurse 42, no. 5 (October 1, 2022): 33–43. http://dx.doi.org/10.4037/ccn2022295.
Sarti, Aimee J., Stephanie Sutherland, Andrew Healey, Sonny Dhanani, Angele Landriault, Frances Fothergill-Bourbonnais, Michael Hartwick, Janice Beitel, Simon Oczkowski, and Pierre Cardinal. "A Multicenter Qualitative Investigation of the Experiences and Perspectives of Substitute Decision Makers Who Underwent Organ Donation Decisions." Progress in Transplantation 28, no. 4 (September 16, 2018): 343–48. http://dx.doi.org/10.1177/1526924818800046.
Rind, Esther, Klaus Kimpel, Christine Preiser, Falko Papenfuss, Anke Wagner, Karina Alsyte, Achim Siegel, et al. "Adjusting working conditions and evaluating the risk of infection during the COVID-19 pandemic in different workplace settings in Germany: a study protocol for an explorative modular mixed methods approach." BMJ Open 10, no. 11 (November 2020): e043908. http://dx.doi.org/10.1136/bmjopen-2020-043908.
Dissertations / Theses on the topic "Multimodal document understanding"
Bakkali, Souhail. "Multimodal Document Understanding with Unified Vision and Language Cross-Modal Learning." Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS046.
The frameworks developed in this thesis were the outcome of an iterative process of analysis and synthesis between existing theories and our own studies. More specifically, we study cross-modality learning for the contextualized comprehension of document components across language and vision. The main idea is to map multimodal information from document images into a common semantic space. This thesis focuses on advancing research on cross-modality learning and makes contributions on four fronts: (i) proposing a cross-modal approach with deep networks that jointly leverages visual and textual information in a common semantic representation space to make predictions about multimodal documents (i.e., the subject matter they are about); (ii) investigating competitive strategies for the tasks of cross-modal document classification, content-based retrieval, and few-shot document classification; (iii) addressing data-related issues, such as learning when data is not annotated, by proposing a network that learns generic representations from a collection of unlabeled documents; and (iv) exploiting few-shot learning settings when data contains only a few examples.
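The common-semantic-space idea at the core of this thesis can be illustrated with a minimal sketch. Everything here is invented for illustration (the projection matrices stand in for trained deep encoders, and the feature dimensions and documents are arbitrary); it is not the thesis's actual model, only the general pattern of projecting two modalities into one space and scoring by cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for modality-specific encoders: in practice these would be deep
# networks; here they are fixed random linear maps (illustrative only).
D_VIS, D_TXT, D_COMMON = 64, 32, 16
W_vis = rng.normal(size=(D_VIS, D_COMMON))
W_txt = rng.normal(size=(D_TXT, D_COMMON))

def embed(features, W):
    """Project modality-specific features into the common semantic space
    and L2-normalize so that dot products equal cosine similarities."""
    z = features @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Five "documents": each has a visual feature vector and a text feature vector.
vis_feats = rng.normal(size=(5, D_VIS))
txt_feats = rng.normal(size=(5, D_TXT))

vis_emb = embed(vis_feats, W_vis)   # (5, 16)
txt_emb = embed(txt_feats, W_txt)   # (5, 16)

# Cross-modal retrieval: for a text query, rank document images by similarity.
query = txt_emb[0]
scores = vis_emb @ query
ranking = np.argsort(-scores)
print("retrieval order for query 0:", ranking)
```

With trained encoders, matching image/text pairs would land close together in the shared space, so the same ranking step performs content-based cross-modal retrieval.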
Delecraz, Sébastien. "Approches jointes texte/image pour la compréhension multimodale de documents." Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0634/document.
The human faculties of understanding are essentially multimodal. To understand the world around them, human beings fuse the information coming from all of their sensory receptors. Most of the documents used in automatic information processing contain multimodal information, for example text and image in textual documents or image and sound in video documents; however, the processing applied is most often monomodal. The aim of this thesis is to propose joint processes, applying mainly to text and image, for the processing of multimodal documents, through two studies: one on multimodal fusion for speaker role recognition in television broadcasts, the other on the complementarity of modalities for a linguistic analysis task on corpora of images with captions. In the first part, we analyze audiovisual documents from news television channels and propose an approach that relies in particular on deep neural networks for the representation and fusion of modalities. In the second part, we investigate approaches that use several sources of multimodal information for a monomodal natural language processing task in order to study their complementarity, and propose a complete system for correcting prepositional attachments using visual information, trained on a multimodal corpus of images with captions.
Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
In this dissertation, we defend the thesis that deep neural networks are well suited for the analysis of visual, textual, and fused visual-textual content. This work evaluates the ability of deep neural networks to learn multimodal representations automatically, in either unsupervised or supervised settings, and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image; it is evaluated on videos, using only one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and back, and offers an improved multimodal representation space in which the initially disjoint representations can be translated and fused, enabling improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing such representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
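The bidirectional "translate between modalities, then fuse" idea can be sketched very roughly. This is not the thesis's crossed deep architecture: the sketch replaces the deep encoders with linear least-squares maps, and the paired toy data is invented, purely to show the shape of the idea (learn a map in each direction, then build a joint representation usable even when one modality is missing):

```python
import numpy as np

rng = np.random.default_rng(1)

# Paired observations of the same 100 items in two modalities (toy data):
# 20-d "visual" features and 12-d "textual" features, correlated by construction.
X_vis = rng.normal(size=(100, 20))
X_txt = X_vis[:, :12] + 0.1 * rng.normal(size=(100, 12))

# Learn linear cross-modal translations in both directions by least squares.
# (The thesis uses crossed deep networks; linear maps keep the sketch minimal.)
W_v2t, *_ = np.linalg.lstsq(X_vis, X_txt, rcond=None)  # visual -> textual
W_t2v, *_ = np.linalg.lstsq(X_txt, X_vis, rcond=None)  # textual -> visual

def fuse(vis, txt):
    """Fused representation: concatenate the translation of one modality
    with the other, so items live in a single comparable space."""
    return np.concatenate([vis @ W_v2t, txt], axis=-1)

fused = fuse(X_vis, X_txt)                                # (100, 24)
recon_err = np.mean((X_vis @ W_v2t - X_txt) ** 2)
print("fused shape:", fused.shape, "translation MSE:", round(recon_err, 4))
```

When one modality is absent at test time (a common case in video hyperlinking), its translation from the other modality can stand in for it in the fused space.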
Mangin, Olivier. "Emergence de concepts multimodaux : de la perception de mouvements primitifs à l'ancrage de mots acoustiques." Thesis, Bordeaux, 2014. http://www.theses.fr/2014BORD0002/document.
This thesis focuses on learning recurring patterns in multimodal perception. For that purpose, it develops cognitive systems that model the mechanisms providing such capabilities to infants, a methodology that fits into the field of developmental robotics. More precisely, this thesis revolves around two main topics: on the one hand, the ability of infants or robots to imitate and understand human behaviors, and on the other, the acquisition of language. At the crossing of these topics, we study the question of how a developmental cognitive agent can discover a dictionary of primitive patterns from its multimodal perceptual flow. We specify this problem and formulate its links with Quine's indeterminacy of translation and with blind source separation, as studied in acoustics. We sequentially study four sub-problems and provide an experimental formulation of each of them. We then describe and test computational models of agents solving these problems; they are based in particular on bag-of-words techniques, matrix factorization algorithms, and inverse reinforcement learning approaches. We first go in depth into the three separate problems of learning primitive sounds, such as phonemes or words, learning primitive dance motions, and learning the primitive objectives that compose complex tasks. Finally, we study the problem of learning multimodal primitive patterns, which corresponds to solving several of the aforementioned problems simultaneously. We also detail how this last problem models the grounding of acoustic words.
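The matrix-factorization route to discovering a dictionary of primitives can be sketched with plain nonnegative matrix factorization (Lee-Seung multiplicative updates). The data, dimensions, and number of primitives below are invented for illustration; the thesis's actual experiments use real acoustic and motion recordings, not this toy setup:

```python
import numpy as np

rng = np.random.default_rng(2)

def nmf(V, k, iters=500, eps=1e-9):
    """Factor nonnegative V (items x features) as W @ H using Lee-Seung
    multiplicative updates; the k rows of H act as learned primitives."""
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy multimodal data: concatenate a "sound" histogram (10 bins) and a
# "motion" histogram (8 bins) per observation, generated from 3 primitives.
true_H = rng.random((3, 10 + 8))   # 3 ground-truth primitives over 18 features
coeffs = rng.random((50, 3))       # mixing coefficients for 50 observations
V = coeffs @ true_H                # nonnegative by construction

W, H = nmf(V, k=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", round(err, 4))
```

Because sound and motion features are concatenated into one matrix, each recovered primitive in H spans both modalities at once, which is the sense in which the factorization learns multimodal primitive patterns.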
Book chapters on the topic "Multimodal document understanding"
Cooney, Ciaran, Rachel Heyburn, Liam Madigan, Mairead O’Cuinn, Chloe Thompson, and Joana Cavadas. "Unimodal and Multimodal Representation Training for Relation Extraction." In Communications in Computer and Information Science, 450–61. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_35.
Harris, Teresa, and Miemsie Steyn. "Understanding Students’ Perspectives as Learners through Photovoice." In Academic Knowledge Construction and Multimodal Curriculum Development, 357–75. IGI Global, 2014. http://dx.doi.org/10.4018/978-1-4666-4797-8.ch022.
Edge, Christi. "A Teacher Educator's Meaning-Making From a Hybrid “Online Teaching Fellows” Professional Learning Experience." In Handbook of Research on Virtual Training and Mentoring of Online Instructors, 76–109. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-6322-8.ch005.
Edge, Christi. "A Teacher Educator's Meaning-Making From a Hybrid “Online Teaching Fellows” Professional Learning Experience." In Research Anthology on Facilitating New Educational Practices Through Communities of Learning, 422–55. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-7294-8.ch023.
"Understanding page-based media." In The Structure of Multimodal Documents, 10–34. Routledge, 2015. http://dx.doi.org/10.4324/9781315740454-2.
Full textG. Almeida-Návar, Saúl, Nexaí Reyes-Sampieri, Jose T. Morelos-Garcia, Jorge M. Antolinez-Motta, and Gabriel I. Herrejón-Galaviz. "Chronic Postoperative Pain." In Topics in Postoperative Pain [Working Title]. IntechOpen, 2023. http://dx.doi.org/10.5772/intechopen.111878.
Hai-Jew, Shalin. "Exploiting Enriched Knowledge of Web Network Structures." In Enhancing Qualitative and Mixed Methods Research with Technology, 255–86. IGI Global, 2015. http://dx.doi.org/10.4018/978-1-4666-6493-7.ch011.
Full textConference papers on the topic "Multimodal document understanding"
Wang, Wenjin, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, Yinxu Pan, Weichong Yin, et al. "mmLayout: Multi-grained MultiModal Transformer for Document Understanding." In MM '22: The 30th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503161.3548406.
Gu, Zhangxuan, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, and Liqing Zhang. "XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00454.
Wang, Zilong, Mingjie Zhan, Xuebo Liu, and Ding Liang. "DocStruct: A Multimodal Method to Extract Hierarchy Structure in Document for General Form Understanding." In Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.findings-emnlp.80.
Dang, Xuan-Hong, Syed Yousaf Shah, and Petros Zerfos. "'The Squawk Bot': Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/634.