Scientific literature on the topic "Multimodal document understanding"
Consult thematic lists of journal articles, books, theses, conference proceedings, and other academic sources on the topic "Multimodal document understanding".
Journal articles on the topic "Multimodal document understanding"
Cho, Seongkuk, Jihoon Moon, Junhyeok Bae, Jiwon Kang, and Sangwook Lee. "A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach". Electronics 12, no. 4 (February 13, 2023): 939. http://dx.doi.org/10.3390/electronics12040939.
Meskill, Carla, Jennifer Nilsen, and Alan Oliveira. "Intersections of Language, Content, and Multimodalities: Instructional Conversations in Mrs. B’s Sheltered English Biology Classroom". AERA Open 5, no. 2 (April 2019): 233285841985048. http://dx.doi.org/10.1177/2332858419850488.
Nugrahawati, Ana Wiyasa. "Teaching Religious Tolerance Through Critical and Evaluative Reading Course for English Language Education Students". ELE Reviews: English Language Education Reviews 3, no. 1 (May 31, 2023): 33–45. http://dx.doi.org/10.22515/elereviews.v3i1.6611.
Halverson, Erica Rosenfeld. "Film as Identity Exploration: A Multimodal Analysis of Youth-Produced Films". Teachers College Record: The Voice of Scholarship in Education 112, no. 9 (September 2010): 2352–78. http://dx.doi.org/10.1177/016146811011200903.
Troshchenkova, E. V., and E. A. Rudneva. "The Concept of Legal Document in the Professional Sphere". Voprosy Kognitivnoy Lingvistiki, no. 1 (2023): 32–42. http://dx.doi.org/10.20916/1812-3228-2023-1-32-42.
Wang, Jiapeng, Chongyu Liu, Lianwen Jin, Guozhi Tang, Jiaxin Zhang, Shuaitao Zhang, Qianying Wang, Yaqiang Wu, and Mingxiang Cai. "Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 2738–45. http://dx.doi.org/10.1609/aaai.v35i4.16378.
Maja, Inke Choirun Nisa’ Il, and Salim Nabhan. "Literacy in EFL Classroom: In-Service English Teachers’ Perceptions and Practices from Multiliteracies Perspective". JET ADI BUANA 7, no. 02 (October 31, 2022): 207–17. http://dx.doi.org/10.36456/jet.v7.n02.2022.7124.
Liu, Susan I., Morgan Shikar, Emily Gante, Patricia Prufeta, Kaylee Ho, Philip S. Barie, Robert J. Winchell, and Jennifer I. Lee. "Improving Communication and Response to Clinical Deterioration to Increase Patient Safety in the Intensive Care Unit". Critical Care Nurse 42, no. 5 (October 1, 2022): 33–43. http://dx.doi.org/10.4037/ccn2022295.
Sarti, Aimee J., Stephanie Sutherland, Andrew Healey, Sonny Dhanani, Angele Landriault, Frances Fothergill-Bourbonnais, Michael Hartwick, Janice Beitel, Simon Oczkowski, and Pierre Cardinal. "A Multicenter Qualitative Investigation of the Experiences and Perspectives of Substitute Decision Makers Who Underwent Organ Donation Decisions". Progress in Transplantation 28, no. 4 (September 16, 2018): 343–48. http://dx.doi.org/10.1177/1526924818800046.
Rind, Esther, Klaus Kimpel, Christine Preiser, Falko Papenfuss, Anke Wagner, Karina Alsyte, Achim Siegel, et al. "Adjusting working conditions and evaluating the risk of infection during the COVID-19 pandemic in different workplace settings in Germany: a study protocol for an explorative modular mixed methods approach". BMJ Open 10, no. 11 (November 2020): e043908. http://dx.doi.org/10.1136/bmjopen-2020-043908.
Theses on the topic "Multimodal document understanding"
Bakkali, Souhail. "Multimodal Document Understanding with Unified Vision and Language Cross-Modal Learning". Electronic thesis or dissertation, La Rochelle, 2022. http://www.theses.fr/2022LAROS046.
The frameworks developed in this thesis are the outcome of an iterative process of analysis and synthesis between existing theories and the studies we performed. More specifically, we study cross-modality learning for the contextualized comprehension of document components across language and vision. The main idea is to project multimodal information from document images into a common semantic space. This thesis focuses on advancing research on cross-modality learning and makes contributions on four fronts: (i) proposing a cross-modal approach with deep networks that jointly leverages visual and textual information in a common semantic representation space to make predictions about multimodal documents (i.e., the subject matter they are about); (ii) investigating competitive strategies for the tasks of cross-modal document classification, content-based retrieval, and few-shot document classification; (iii) addressing data-related issues, such as learning when data is not annotated, by proposing a network that learns generic representations from a collection of unlabeled documents; and (iv) exploiting few-shot learning settings when data contains only a few examples.
Delecraz, Sébastien. "Approches jointes texte/image pour la compréhension multimodale de documents". Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0634/document.
The human faculties of understanding are essentially multimodal. To understand the world around them, human beings fuse the information coming from all of their sensory receptors. Most documents used in automatic information processing contain multimodal information, for example text and image in textual documents or image and sound in video documents; however, the processing applied is most often monomodal. The aim of this thesis is to propose joint processes, applied mainly to text and image, for the processing of multimodal documents, through two studies: one on multimodal fusion for speaker role recognition in television broadcasts, the other on the complementarity of modalities for a linguistic analysis task on corpora of captioned images. In the first part, we are interested in the analysis of audiovisual documents from news television channels and propose an approach that uses deep neural networks for the representation and fusion of modalities. In the second part, we are interested in approaches that use several sources of multimodal information for a monomodal natural language processing task, in order to study their complementarity. We propose a complete system for correcting prepositional attachments using visual information, trained on a multimodal corpus of captioned images.
Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data". Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
This dissertation defends the thesis that deep neural networks are suited for the analysis of visual, textual, and fused visual and textual content. The work evaluates the ability of deep neural networks to learn multimodal representations in either unsupervised or supervised manners and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows human actions to be predicted from a single image; the architecture is evaluated on videos, using solely one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and conversely, and offers an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks allow the learned model to be visualized directly in the image domain.
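The idea behind contribution 3 above (bidirectional multimodal encoders) can be illustrated at toy scale: learn a translation from each modality into the other over paired data, then combine each sample's two views into one joint space. The sketch below is only a hedged illustration using linear least-squares mappings on synthetic paired features, not the neural architecture from the dissertation; all dimensions and variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: 200 samples with an 8-d "video" view and a 6-d "text"
# view, generated from a shared 4-d latent so a crossmodal mapping exists.
latent = rng.normal(size=(200, 4))
video = latent @ rng.normal(size=(4, 8))
text = latent @ rng.normal(size=(4, 6))

# Bidirectional translation: least-squares maps video -> text and back.
W_vt, *_ = np.linalg.lstsq(video, text, rcond=None)
W_tv, *_ = np.linalg.lstsq(text, video, rcond=None)

def fuse(v, t):
    """Joint representation: each modality concatenated with its
    translation into the other space, then summed with its counterpart."""
    return np.hstack([v, v @ W_vt]) + np.hstack([t @ W_tv, t])

fused = fuse(video, text)
rel_err = np.linalg.norm(video @ W_vt - text) / np.linalg.norm(text)
print(f"video->text relative translation error: {rel_err:.2e}")
```

Because both directions are learned, a sample with only one modality present can still be embedded in the joint space by translating it into the missing modality first.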
Mangin, Olivier. "Emergence de concepts multimodaux : de la perception de mouvements primitifs à l'ancrage de mots acoustiques". Thesis, Bordeaux, 2014. http://www.theses.fr/2014BORD0002/document.
This thesis focuses on learning recurring patterns in multimodal perception. For that purpose, it develops cognitive systems that model the mechanisms providing such capabilities to infants, a methodology that fits into the field of developmental robotics. More precisely, this thesis revolves around two main topics: on the one hand, the ability of infants or robots to imitate and understand human behaviors, and on the other, the acquisition of language. At the crossing of these topics, we study the question of how a developmental cognitive agent can discover a dictionary of primitive patterns from its multimodal perceptual flow. We specify this problem and formulate its links with Quine's indeterminacy of translation and with blind source separation, as studied in acoustics. We sequentially study four sub-problems and provide an experimental formulation of each of them. We then describe and test computational models of agents solving these problems, based in particular on bag-of-words techniques, matrix factorization algorithms, and inverse reinforcement learning approaches. We first go in depth into the three separate problems of learning primitive sounds, such as phonemes or words, learning primitive dance motions, and learning the primitive objectives that compose complex tasks. Finally, we study the problem of learning multimodal primitive patterns, which corresponds to solving several of the aforementioned problems simultaneously. We also detail how the last problem models the grounding of acoustic words.
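The matrix-factorization ingredient mentioned in the abstract above can be sketched in a few lines: stack nonnegative multimodal feature histograms as rows of a matrix and factorize it so that each dictionary atom spans all modalities at once. The code below is a minimal illustrative sketch on synthetic data with a textbook Lee-Seung multiplicative update, not the models from the thesis; every name and dimension is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy multimodal data: each row concatenates a "sound" histogram (10 bins)
# and a "motion" histogram (8 bins); 3 ground-truth primitive patterns mix
# to produce 150 nonnegative observations.
W_true = rng.random((150, 3))
H_true = rng.random((3, 18))
V = W_true @ H_true

def nmf(V, k, iters=500, eps=1e-9):
    """Nonnegative matrix factorization V ~ W @ H via multiplicative
    updates; the k rows of H are multimodal primitive patterns."""
    n, m = V.shape
    local_rng = np.random.default_rng(0)
    W = local_rng.random((n, k)) + eps
    H = local_rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(V, k=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because each learned atom in H covers both the sound bins and the motion bins, activating one coefficient in W activates a pattern jointly across modalities, which is the sense in which the discovered primitives are multimodal.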
Book chapters on the topic "Multimodal document understanding"
Cooney, Ciaran, Rachel Heyburn, Liam Madigan, Mairead O’Cuinn, Chloe Thompson, and Joana Cavadas. "Unimodal and Multimodal Representation Training for Relation Extraction". In Communications in Computer and Information Science, 450–61. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_35.
Harris, Teresa, and Miemsie Steyn. "Understanding Students’ Perspectives as Learners through Photovoice". In Academic Knowledge Construction and Multimodal Curriculum Development, 357–75. IGI Global, 2014. http://dx.doi.org/10.4018/978-1-4666-4797-8.ch022.
Edge, Christi. "A Teacher Educator's Meaning-Making From a Hybrid “Online Teaching Fellows” Professional Learning Experience". In Handbook of Research on Virtual Training and Mentoring of Online Instructors, 76–109. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-6322-8.ch005.
Edge, Christi. "A Teacher Educator's Meaning-Making From a Hybrid “Online Teaching Fellows” Professional Learning Experience". In Research Anthology on Facilitating New Educational Practices Through Communities of Learning, 422–55. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-7294-8.ch023.
"Understanding page-based media". In The Structure of Multimodal Documents, 10–34. Routledge, 2015. http://dx.doi.org/10.4324/9781315740454-2.
G. Almeida-Návar, Saúl, Nexaí Reyes-Sampieri, Jose T. Morelos-Garcia, Jorge M. Antolinez-Motta, and Gabriel I. Herrejón-Galaviz. "Chronic Postoperative Pain". In Topics in Postoperative Pain [Working Title]. IntechOpen, 2023. http://dx.doi.org/10.5772/intechopen.111878.
Hai-Jew, Shalin. "Exploiting Enriched Knowledge of Web Network Structures". In Enhancing Qualitative and Mixed Methods Research with Technology, 255–86. IGI Global, 2015. http://dx.doi.org/10.4018/978-1-4666-6493-7.ch011.
Texte intégralActes de conférences sur le sujet "Multimodal document understanding"
Wang, Wenjin, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, Yinxu Pan, Weichong Yin, et al. "mmLayout: Multi-grained MultiModal Transformer for Document Understanding". In MM '22: The 30th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503161.3548406.
Gu, Zhangxuan, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, and Liqing Zhang. "XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding". In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00454.
Wang, Zilong, Mingjie Zhan, Xuebo Liu, and Ding Liang. "DocStruct: A Multimodal Method to Extract Hierarchy Structure in Document for General Form Understanding". In Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.findings-emnlp.80.
Dang, Xuan-Hong, Syed Yousaf Shah, and Petros Zerfos. "“The Squawk Bot”: Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering". In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/634.