A selection of scholarly literature on the topic "Multimodal document understanding"
Cite a source in APA, MLA, Chicago, Harvard, and other styles
Browse lists of current articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Multimodal document understanding".
Next to every entry in the bibliography you will find an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of a publication as a .pdf file and read its abstract online, when these are available in the metadata.
Journal articles on the topic "Multimodal document understanding"
Cho, Seongkuk, Jihoon Moon, Junhyeok Bae, Jiwon Kang, and Sangwook Lee. "A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach." Electronics 12, no. 4 (February 13, 2023): 939. http://dx.doi.org/10.3390/electronics12040939.
Meskill, Carla, Jennifer Nilsen, and Alan Oliveira. "Intersections of Language, Content, and Multimodalities: Instructional Conversations in Mrs. B’s Sheltered English Biology Classroom." AERA Open 5, no. 2 (April 2019): 233285841985048. http://dx.doi.org/10.1177/2332858419850488.
Nugrahawati, Ana Wiyasa. "Teaching Religious Tolerance Through Critical and Evaluative Reading Course for English Language Education Students." ELE Reviews: English Language Education Reviews 3, no. 1 (May 31, 2023): 33–45. http://dx.doi.org/10.22515/elereviews.v3i1.6611.
Halverson, Erica Rosenfeld. "Film as Identity Exploration: A Multimodal Analysis of Youth-Produced Films." Teachers College Record: The Voice of Scholarship in Education 112, no. 9 (September 2010): 2352–78. http://dx.doi.org/10.1177/016146811011200903.
Troshchenkova, E. V., and E. A. Rudneva. "THE CONCEPT OF LEGAL DOCUMENT IN THE PROFESSIONAL SPHERE." Voprosy Kognitivnoy Lingvistiki, no. 1 (2023): 32–42. http://dx.doi.org/10.20916/1812-3228-2023-1-32-42.
Wang, Jiapeng, Chongyu Liu, Lianwen Jin, Guozhi Tang, Jiaxin Zhang, Shuaitao Zhang, Qianying Wang, Yaqiang Wu, and Mingxiang Cai. "Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 2738–45. http://dx.doi.org/10.1609/aaai.v35i4.16378.
Maja, Inke Choirun Nisa’ Il, and Salim Nabhan. "Literacy in EFL Classroom: In-Service English Teachers’ Perceptions and Practices from Multiliteracies Perspective." JET ADI BUANA 7, no. 02 (October 31, 2022): 207–17. http://dx.doi.org/10.36456/jet.v7.n02.2022.7124.
Liu, Susan I., Morgan Shikar, Emily Gante, Patricia Prufeta, Kaylee Ho, Philip S. Barie, Robert J. Winchell, and Jennifer I. Lee. "Improving Communication and Response to Clinical Deterioration to Increase Patient Safety in the Intensive Care Unit." Critical Care Nurse 42, no. 5 (October 1, 2022): 33–43. http://dx.doi.org/10.4037/ccn2022295.
Sarti, Aimee J., Stephanie Sutherland, Andrew Healey, Sonny Dhanani, Angele Landriault, Frances Fothergill-Bourbonnais, Michael Hartwick, Janice Beitel, Simon Oczkowski, and Pierre Cardinal. "A Multicenter Qualitative Investigation of the Experiences and Perspectives of Substitute Decision Makers Who Underwent Organ Donation Decisions." Progress in Transplantation 28, no. 4 (September 16, 2018): 343–48. http://dx.doi.org/10.1177/1526924818800046.
Rind, Esther, Klaus Kimpel, Christine Preiser, Falko Papenfuss, Anke Wagner, Karina Alsyte, Achim Siegel, et al. "Adjusting working conditions and evaluating the risk of infection during the COVID-19 pandemic in different workplace settings in Germany: a study protocol for an explorative modular mixed methods approach." BMJ Open 10, no. 11 (November 2020): e043908. http://dx.doi.org/10.1136/bmjopen-2020-043908.
Повний текст джерелаДисертації з теми "Multimodal document understanding"
Bakkali, Souhail. "Multimodal Document Understanding with Unified Vision and Language Cross-Modal Learning." Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS046.
The frameworks developed in this thesis are the outcome of an iterative process of analysis and synthesis between existing theories and our own studies. More specifically, we study cross-modality learning for the contextualized comprehension of document components across language and vision. The main idea is to project multimodal information from document images into a common semantic space. This thesis advances research on cross-modality learning and makes contributions on four fronts: (i) proposing a cross-modal approach with deep networks that jointly maps visual and textual information into a common semantic representation space in order to make predictions about multimodal documents (i.e., the subject matter they are about); (ii) investigating competitive strategies for the tasks of cross-modal document classification, content-based retrieval, and few-shot document classification; (iii) addressing data-related issues, such as learning when data is not annotated, by proposing a network that learns generic representations from a collection of unlabeled documents; and (iv) exploiting few-shot learning settings when data contains only a few examples.
Delecraz, Sébastien. "Approches jointes texte/image pour la compréhension multimodale de documents." Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0634/document.
The human faculties of understanding are essentially multimodal. To understand the world around them, human beings fuse the information coming from all of their sensory receptors. Most of the documents used in automatic information processing contain multimodal information, for example text and image in textual documents or image and sound in video documents; however, the processing applied is most often monomodal. The aim of this thesis is to propose joint processes, applying mainly to text and image, for handling multimodal documents through two studies: one on multimodal fusion for speaker role recognition in television broadcasts, the other on the complementarity of modalities in a linguistic analysis task on corpora of captioned images. In the first part of this work, we analyze audiovisual documents from news television channels and propose an approach that relies in particular on deep neural networks for the representation and fusion of modalities. In the second part, we investigate approaches that use several sources of multimodal information for a monomodal natural language processing task in order to study their complementarity, and we propose a complete system for correcting prepositional attachments using visual information, trained on a multimodal corpus of captioned images.
Vukotic, Vedran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
In this dissertation, we defend the thesis that deep neural networks are well suited to the analysis of visual, textual, and fused visual-textual content. This work evaluates the ability of deep neural networks to learn multimodal representations, in either an unsupervised or a supervised manner, and makes the following main contributions: 1) recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies; 2) action prediction from single images: we propose an architecture that predicts human actions from a single image and evaluate it on videos, using only one frame as input; 3) bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and back, offering an improved multimodal representation space in which the initially disjoint representations can be translated and fused; this enables better fusion of multiple modalities, and the architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art; 4) generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate conditional generative adversarial networks for learning multimodal representations; in addition to providing such representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
Mangin, Olivier. "Emergence de concepts multimodaux : de la perception de mouvements primitifs à l'ancrage de mots acoustiques." Thesis, Bordeaux, 2014. http://www.theses.fr/2014BORD0002/document.
This thesis focuses on learning recurring patterns in multimodal perception. For that purpose it develops cognitive systems that model the mechanisms providing such capabilities to infants, a methodology that fits into the field of developmental robotics. More precisely, this thesis revolves around two main topics: on the one hand, the ability of infants or robots to imitate and understand human behaviors, and on the other, the acquisition of language. At the crossing of these topics, we study the question of how a developmental cognitive agent can discover a dictionary of primitive patterns from its multimodal perceptual flow. We specify this problem and formulate its links with Quine's indeterminacy of translation and with blind source separation as studied in acoustics. We sequentially study four sub-problems and provide an experimental formulation of each; we then describe and test computational models of agents solving these problems, based in particular on bag-of-words techniques, matrix factorization algorithms, and inverse reinforcement learning approaches. We first examine in depth the three separate problems of learning primitive sounds (such as phonemes or words), learning primitive dance motions, and learning the primitive objectives that compose complex tasks. Finally, we study the problem of learning multimodal primitive patterns, which corresponds to solving several of the aforementioned problems simultaneously, and detail how this last problem models the grounding of acoustic words.
Book chapters on the topic "Multimodal document understanding"
Cooney, Ciaran, Rachel Heyburn, Liam Madigan, Mairead O’Cuinn, Chloe Thompson, and Joana Cavadas. "Unimodal and Multimodal Representation Training for Relation Extraction." In Communications in Computer and Information Science, 450–61. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_35.
Harris, Teresa, and Miemsie Steyn. "Understanding Students’ Perspectives as Learners through Photovoice." In Academic Knowledge Construction and Multimodal Curriculum Development, 357–75. IGI Global, 2014. http://dx.doi.org/10.4018/978-1-4666-4797-8.ch022.
Edge, Christi. "A Teacher Educator's Meaning-Making From a Hybrid “Online Teaching Fellows” Professional Learning Experience." In Handbook of Research on Virtual Training and Mentoring of Online Instructors, 76–109. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-6322-8.ch005.
Edge, Christi. "A Teacher Educator's Meaning-Making From a Hybrid “Online Teaching Fellows” Professional Learning Experience." In Research Anthology on Facilitating New Educational Practices Through Communities of Learning, 422–55. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-7294-8.ch023.
Повний текст джерела"Understanding page-based media." In The Structure of Multimodal Documents, 10–34. Routledge, 2015. http://dx.doi.org/10.4324/9781315740454-2.
Повний текст джерелаG. Almeida-Návar, Saúl, Nexaí Reyes-Sampieri, Jose T. Morelos-Garcia, Jorge M. Antolinez-Motta, and Gabriel I. Herrejón-Galaviz. "Chronic Postoperative Pain." In Topics in Postoperative Pain [Working Title]. IntechOpen, 2023. http://dx.doi.org/10.5772/intechopen.111878.
Hai-Jew, Shalin. "Exploiting Enriched Knowledge of Web Network Structures." In Enhancing Qualitative and Mixed Methods Research with Technology, 255–86. IGI Global, 2015. http://dx.doi.org/10.4018/978-1-4666-6493-7.ch011.
Повний текст джерелаТези доповідей конференцій з теми "Multimodal document understanding"
Wang, Wenjin, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, Yinxu Pan, Weichong Yin, et al. "mmLayout: Multi-grained MultiModal Transformer for Document Understanding." In MM '22: The 30th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503161.3548406.
Gu, Zhangxuan, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, and Liqing Zhang. "XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00454.
Wang, Zilong, Mingjie Zhan, Xuebo Liu, and Ding Liang. "DocStruct: A Multimodal Method to Extract Hierarchy Structure in Document for General Form Understanding." In Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.findings-emnlp.80.
Dang, Xuan-Hong, Syed Yousaf Shah, and Petros Zerfos. "‘The Squawk Bot’: Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/634.