Ready bibliography on the topic "Multimodal document understanding"
Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles
Browse lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Multimodal document understanding".
Next to each work in the bibliography there is an "Add to bibliography" button. Use it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a .pdf file and read its abstract online, whenever such details are available in the work's metadata.
Journal articles on the topic "Multimodal document understanding"
Cho, Seongkuk, Jihoon Moon, Junhyeok Bae, Jiwon Kang, and Sangwook Lee. "A Framework for Understanding Unstructured Financial Documents Using RPA and Multimodal Approach". Electronics 12, no. 4 (February 13, 2023): 939. http://dx.doi.org/10.3390/electronics12040939.
Meskill, Carla, Jennifer Nilsen, and Alan Oliveira. "Intersections of Language, Content, and Multimodalities: Instructional Conversations in Mrs. B’s Sheltered English Biology Classroom". AERA Open 5, no. 2 (April 2019): 233285841985048. http://dx.doi.org/10.1177/2332858419850488.
Nugrahawati, Ana Wiyasa. "Teaching Religious Tolerance Through Critical and Evaluative Reading Course for English Language Education Students". ELE Reviews: English Language Education Reviews 3, no. 1 (May 31, 2023): 33–45. http://dx.doi.org/10.22515/elereviews.v3i1.6611.
Halverson, Erica Rosenfeld. "Film as Identity Exploration: A Multimodal Analysis of Youth-Produced Films". Teachers College Record: The Voice of Scholarship in Education 112, no. 9 (September 2010): 2352–78. http://dx.doi.org/10.1177/016146811011200903.
Troshchenkova, E. V., and E. A. Rudneva. "THE CONCEPT OF LEGAL DOCUMENT IN THE PROFESSIONAL SPHERE". Voprosy Kognitivnoy Lingvistiki, no. 1 (2023): 32–42. http://dx.doi.org/10.20916/1812-3228-2023-1-32-42.
Wang, Jiapeng, Chongyu Liu, Lianwen Jin, Guozhi Tang, Jiaxin Zhang, Shuaitao Zhang, Qianying Wang, Yaqiang Wu, and Mingxiang Cai. "Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 2738–45. http://dx.doi.org/10.1609/aaai.v35i4.16378.
Maja, Inke Choirun Nisa’ Il, and Salim Nabhan. "Literacy in EFL Classroom: In-Service English Teachers’ Perceptions and Practices from Multiliteracies Perspective". JET ADI BUANA 7, no. 02 (October 31, 2022): 207–17. http://dx.doi.org/10.36456/jet.v7.n02.2022.7124.
Liu, Susan I., Morgan Shikar, Emily Gante, Patricia Prufeta, Kaylee Ho, Philip S. Barie, Robert J. Winchell, and Jennifer I. Lee. "Improving Communication and Response to Clinical Deterioration to Increase Patient Safety in the Intensive Care Unit". Critical Care Nurse 42, no. 5 (October 1, 2022): 33–43. http://dx.doi.org/10.4037/ccn2022295.
Sarti, Aimee J., Stephanie Sutherland, Andrew Healey, Sonny Dhanani, Angele Landriault, Frances Fothergill-Bourbonnais, Michael Hartwick, Janice Beitel, Simon Oczkowski, and Pierre Cardinal. "A Multicenter Qualitative Investigation of the Experiences and Perspectives of Substitute Decision Makers Who Underwent Organ Donation Decisions". Progress in Transplantation 28, no. 4 (September 16, 2018): 343–48. http://dx.doi.org/10.1177/1526924818800046.
Rind, Esther, Klaus Kimpel, Christine Preiser, Falko Papenfuss, Anke Wagner, Karina Alsyte, Achim Siegel, et al. "Adjusting working conditions and evaluating the risk of infection during the COVID-19 pandemic in different workplace settings in Germany: a study protocol for an explorative modular mixed methods approach". BMJ Open 10, no. 11 (November 2020): e043908. http://dx.doi.org/10.1136/bmjopen-2020-043908.
Doctoral dissertations on the topic "Multimodal document understanding"
Bakkali, Souhail. "Multimodal Document Understanding with Unified Vision and Language Cross-Modal Learning". Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS046.
The frameworks developed in this thesis are the outcome of an iterative process of analysis and synthesis between existing theories and the studies we performed. More specifically, we study cross-modality learning for the contextualized comprehension of document components across language and vision. The main idea is to map multimodal information from document images into a common semantic space. This thesis focuses on advancing research on cross-modality learning and makes contributions on four fronts: (i) proposing a cross-modal approach with deep networks that jointly leverages visual and textual information in a common semantic representation space to make predictions about multimodal documents (i.e., the subject matter they are about); (ii) investigating competitive strategies for the tasks of cross-modal document classification, content-based retrieval, and few-shot document classification; (iii) addressing data-related issues, such as learning when data is not annotated, by proposing a network that learns generic representations from a collection of unlabeled documents; and (iv) exploiting few-shot learning settings when data contains only a few examples.
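To make the common-semantic-space idea concrete, here is a minimal sketch of a dual encoder that projects precomputed visual and textual document features into one shared space and aligns them with a symmetric contrastive loss. This is an illustration under our own assumptions, not the architecture from the thesis; all module names and dimensions are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalEncoder(nn.Module):
    """Projects a visual and a textual feature vector into one shared
    semantic space; both outputs are L2-normalized."""
    def __init__(self, visual_dim=2048, text_dim=768, shared_dim=512):
        super().__init__()
        self.visual_proj = nn.Sequential(
            nn.Linear(visual_dim, shared_dim), nn.ReLU(), nn.Linear(shared_dim, shared_dim))
        self.text_proj = nn.Sequential(
            nn.Linear(text_dim, shared_dim), nn.ReLU(), nn.Linear(shared_dim, shared_dim))

    def forward(self, visual_feats, text_feats):
        v = F.normalize(self.visual_proj(visual_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return v, t

def contrastive_loss(v, t, temperature=0.07):
    # Symmetric InfoNCE: matching image/text pairs lie on the diagonal.
    logits = v @ t.T / temperature
    targets = torch.arange(v.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Toy batch: 8 documents with precomputed image and text features.
v_feats, t_feats = torch.randn(8, 2048), torch.randn(8, 768)
model = CrossModalEncoder()
v, t = model(v_feats, t_feats)
print(contrastive_loss(v, t))
```

Once trained this way, either projection can be used alone, which is what makes cross-modal retrieval and classification possible in a shared space.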
Delecraz, Sébastien. "Approches jointes texte/image pour la compréhension multimodale de documents". Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0634/document.
Human understanding is essentially multimodal: to make sense of the world around them, human beings fuse the information coming from all of their sensory receptors. Most documents used in automatic information processing contain multimodal information, for example text and image in textual documents or image and sound in video documents, yet the processing applied to them is most often monomodal. The aim of this thesis is to propose joint processes, applying mainly to text and image, for handling multimodal documents, through two studies: one on multimodal fusion for speaker role recognition in television broadcasts, the other on the complementarity of modalities in a linguistic analysis task on corpora of captioned images. In the first part, we analyze audiovisual documents from news television channels and propose an approach that relies in particular on deep neural networks for representing and fusing the modalities. In the second part, we investigate approaches that use several sources of multimodal information for a monomodal natural language processing task, in order to study their complementarity, and we propose a complete system for correcting prepositional attachments with the help of visual information, trained on a multimodal corpus of captioned images.
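As one plausible reading of "deep neural networks for representation and fusion of modalities" (a sketch under our own assumptions, not the system described in the thesis), the snippet below fuses per-modality embeddings by concatenation before a small classification head; the dimensions and the number of speaker roles are invented for the example.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Concatenates audio and visual embeddings, then classifies the speaker role."""
    def __init__(self, audio_dim=128, visual_dim=256, n_roles=5):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_roles),
        )

    def forward(self, audio_emb, visual_emb):
        # Fusion by concatenation: the head sees both modalities at once.
        fused = torch.cat([audio_emb, visual_emb], dim=-1)
        return self.head(fused)

clf = LateFusionClassifier()
logits = clf(torch.randn(4, 128), torch.randn(4, 256))  # batch of 4 speakers
print(logits.shape)  # torch.Size([4, 5])
```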
Vukotic, Vedran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data". Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
In this dissertation, the thesis that deep neural networks are suited for the analysis of visual, textual, and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn multimodal representations in either an unsupervised or a supervised manner, and brings the following main contributions:
1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task, with the aim of modeling both the input context and the output label dependencies.
2) Action prediction from single images: we propose an architecture that predicts human actions from a single image; it is evaluated on videos by using only one frame as input.
3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other, and conversely, and offers an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art.
4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
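A much-simplified sketch of the bidirectional translation idea in point 3: two crossmodal translators are trained jointly, one per direction, with reconstruction losses. This omits details of the actual architecture (such as any weight sharing between the two directions); every dimension and name here is an assumption made for illustration.

```python
import torch
import torch.nn as nn

class BidirectionalTranslator(nn.Module):
    """Two crossmodal translators trained jointly: text -> visual and visual -> text."""
    def __init__(self, text_dim=300, visual_dim=512, hidden_dim=256):
        super().__init__()
        self.t2v = nn.Sequential(
            nn.Linear(text_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, visual_dim))
        self.v2t = nn.Sequential(
            nn.Linear(visual_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, text_dim))

    def forward(self, text_emb, visual_emb):
        return self.t2v(text_emb), self.v2t(visual_emb)

model = BidirectionalTranslator()
text, visual = torch.randn(16, 300), torch.randn(16, 512)
pred_visual, pred_text = model(text, visual)
# Reconstruction losses in both translation directions.
loss = nn.functional.mse_loss(pred_visual, visual) + nn.functional.mse_loss(pred_text, text)
loss.backward()
print(loss.item())
```

In a setup like this, the hidden activations of the two translators can be concatenated to obtain a fused multimodal representation for downstream tasks such as hyperlinking.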
Mangin, Olivier. "Emergence de concepts multimodaux : de la perception de mouvements primitifs à l'ancrage de mots acoustiques". Thesis, Bordeaux, 2014. http://www.theses.fr/2014BORD0002/document.
This thesis focuses on learning recurring patterns in multimodal perception. For that purpose, it develops cognitive systems that model the mechanisms providing such capabilities to infants, a methodology that fits into the field of developmental robotics. More precisely, this thesis revolves around two main topics: on the one hand, the ability of infants or robots to imitate and understand human behaviors, and on the other, the acquisition of language. At the crossing of these topics, we study the question of how a developmental cognitive agent can discover a dictionary of primitive patterns in its multimodal perceptual flow. We specify this problem and formulate its links with Quine's indeterminacy of translation and with blind source separation, as studied in acoustics. We sequentially study four sub-problems and provide an experimental formulation of each of them; we then describe and test computational models of agents solving these problems. They are based in particular on bag-of-words techniques, matrix factorization algorithms, and inverse reinforcement learning approaches. We first go in depth into the three separate problems of learning primitive sounds, such as phonemes or words, learning primitive dance motions, and learning the primitive objectives that compose complex tasks. Finally, we study the problem of learning multimodal primitive patterns, which corresponds to solving several of the aforementioned problems simultaneously. We also detail how the last problem models the grounding of acoustic words.
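The matrix-factorization ingredient mentioned above can be illustrated as follows (a toy sketch, not the thesis code): bag-of-words histograms from two modalities are stacked feature-wise, and non-negative matrix factorization then yields dictionary atoms that span both modalities, linking recurring patterns across them. All sizes and the random data are arbitrary.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_samples, sound_dim, motion_dim, k = 200, 50, 30, 10

# Non-negative bag-of-words histograms for each modality (toy data).
sound = rng.random((n_samples, sound_dim))
motion = rng.random((n_samples, motion_dim))

# Stack modalities feature-wise so each dictionary atom spans both.
X = np.hstack([sound, motion])            # shape: (200, 80)
nmf = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)                  # per-sample activations of the k atoms
H = nmf.components_                       # k multimodal dictionary atoms

# Each atom splits back into a sound part and a motion part,
# so co-occurring patterns across modalities share an atom.
sound_atoms, motion_atoms = H[:, :sound_dim], H[:, sound_dim:]
print(W.shape, sound_atoms.shape, motion_atoms.shape)
```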
Book chapters on the topic "Multimodal document understanding"
Cooney, Ciaran, Rachel Heyburn, Liam Madigan, Mairead O’Cuinn, Chloe Thompson, and Joana Cavadas. "Unimodal and Multimodal Representation Training for Relation Extraction". In Communications in Computer and Information Science, 450–61. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_35.
Harris, Teresa, and Miemsie Steyn. "Understanding Students’ Perspectives as Learners through Photovoice". In Academic Knowledge Construction and Multimodal Curriculum Development, 357–75. IGI Global, 2014. http://dx.doi.org/10.4018/978-1-4666-4797-8.ch022.
Edge, Christi. "A Teacher Educator's Meaning-Making From a Hybrid “Online Teaching Fellows” Professional Learning Experience". In Handbook of Research on Virtual Training and Mentoring of Online Instructors, 76–109. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-6322-8.ch005.
Edge, Christi. "A Teacher Educator's Meaning-Making From a Hybrid “Online Teaching Fellows” Professional Learning Experience". In Research Anthology on Facilitating New Educational Practices Through Communities of Learning, 422–55. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-7294-8.ch023.
"Understanding page-based media". In The Structure of Multimodal Documents, 10–34. Routledge, 2015. http://dx.doi.org/10.4324/9781315740454-2.
G. Almeida-Návar, Saúl, Nexaí Reyes-Sampieri, Jose T. Morelos-Garcia, Jorge M. Antolinez-Motta, and Gabriel I. Herrejón-Galaviz. "Chronic Postoperative Pain". In Topics in Postoperative Pain [Working Title]. IntechOpen, 2023. http://dx.doi.org/10.5772/intechopen.111878.
Hai-Jew, Shalin. "Exploiting Enriched Knowledge of Web Network Structures". In Enhancing Qualitative and Mixed Methods Research with Technology, 255–86. IGI Global, 2015. http://dx.doi.org/10.4018/978-1-4666-6493-7.ch011.
Pełny tekst źródłaStreszczenia konferencji na temat "Multimodal document understanding"
Wang, Wenjin, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, Yinxu Pan, Weichong Yin, et al. "mmLayout: Multi-grained MultiModal Transformer for Document Understanding". In MM '22: The 30th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503161.3548406.
Gu, Zhangxuan, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, and Liqing Zhang. "XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding". In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00454.
Wang, Zilong, Mingjie Zhan, Xuebo Liu, and Ding Liang. "DocStruct: A Multimodal Method to Extract Hierarchy Structure in Document for General Form Understanding". In Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.findings-emnlp.80.
Dang, Xuan-Hong, Syed Yousaf Shah, and Petros Zerfos. "“The Squawk Bot”: Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering". In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/634.