A selection of scholarly literature on the topic "Multi-Modal representations"
Browse lists of current articles, books, dissertations, conference papers, and other scholarly sources on the topic "Multi-Modal representations".
Journal articles on the topic "Multi-Modal representations":
Wu, Lianlong, Seewon Choi, Daniel Raggi, Aaron Stockdill, Grecia Garcia Garcia, Fiorenzo Colarusso, Peter C. H. Cheng, and Mateja Jamnik. "Generation of Visual Representations for Multi-Modal Mathematical Knowledge." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (March 24, 2024): 23850–52. http://dx.doi.org/10.1609/aaai.v38i21.30586.
Zhang, Yi, Mingyuan Chen, Jundong Shen, and Chongjun Wang. "Tailor Versatile Multi-Modal Learning for Multi-Label Emotion Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 9100–9108. http://dx.doi.org/10.1609/aaai.v36i8.20895.
Zhang, Dong, Suzhong Wei, Shoushan Li, Hanqian Wu, Qiaoming Zhu, and Guodong Zhou. "Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (May 18, 2021): 14347–55. http://dx.doi.org/10.1609/aaai.v35i16.17687.
Liu, Hao, Jindong Han, Yanjie Fu, Jingbo Zhou, Xinjiang Lu, and Hui Xiong. "Multi-modal transportation recommendation with unified route representation learning." Proceedings of the VLDB Endowment 14, no. 3 (November 2020): 342–50. http://dx.doi.org/10.14778/3430915.3430924.
Wang, Huansha, Qinrang Liu, Ruiyang Huang, and Jianpeng Zhang. "Multi-Modal Entity Alignment Method Based on Feature Enhancement." Applied Sciences 13, no. 11 (June 1, 2023): 6747. http://dx.doi.org/10.3390/app13116747.
Wu, Tianxing, Chaoyu Gao, Lin Li, and Yuxiang Wang. "Leveraging Multi-Modal Information for Cross-Lingual Entity Matching across Knowledge Graphs." Applied Sciences 12, no. 19 (October 8, 2022): 10107. http://dx.doi.org/10.3390/app121910107.
Han, Ning, Jingjing Chen, Hao Zhang, Huanwen Wang, and Hao Chen. "Adversarial Multi-Grained Embedding Network for Cross-Modal Text-Video Retrieval." ACM Transactions on Multimedia Computing, Communications, and Applications 18, no. 2 (May 31, 2022): 1–23. http://dx.doi.org/10.1145/3483381.
Ying, Qichao, Xiaoxiao Hu, Yangming Zhou, Zhenxing Qian, Dan Zeng, and Shiming Ge. "Bootstrapping Multi-View Representations for Fake News Detection." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 4 (June 26, 2023): 5384–92. http://dx.doi.org/10.1609/aaai.v37i4.25670.
Huang, Yufeng, Jiji Tang, Zhuo Chen, Rongsheng Zhang, Xinfeng Zhang, Weijie Chen, Zeng Zhao, et al. "Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 2417–25. http://dx.doi.org/10.1609/aaai.v38i3.28017.
van Tulder, Gijs, and Marleen de Bruijne. "Learning Cross-Modality Representations From Multi-Modal Images." IEEE Transactions on Medical Imaging 38, no. 2 (February 2019): 638–48. http://dx.doi.org/10.1109/tmi.2018.2868977.
Dissertations on the topic "Multi-Modal representations":
Gu, Jian. "Multi-modal Neural Representations for Semantic Code Search." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279101.
Over the past decades, various software systems have gradually become the backbone of our society. Programmers search existing code snippets from time to time in their daily work. It would be beneficial and meaningful to have better solutions for the task of semantic code search, which is to find the most semantically relevant code snippets for a given query. Our approach is to introduce tree representations through multi-modal learning. The basic idea is to enrich the semantic information of code snippets by preparing data of different modalities while ignoring syntactic information. We design a new tree structure named Simplified Semantic Tree and then extract RootPath representations from it. We use the RootPath representation to complement the conventional sequence representation, namely the token sequence of the code snippet. Our multi-modal model takes code-query pairs as input and computes similarity scores as output, following the pseudo-siamese architecture. For each pair, in addition to the ready-made code sequence and query sequence, we extract an extra tree sequence from the Simplified Semantic Tree. There are three encoders in our model, and they encode the three sequences respectively into vectors of the same length. We then combine the code vector with the tree vector into a joint vector, still of the same length, as the multi-modal representation of the code snippet. We introduce a triplet loss to ensure that the code and query vectors of the same pair stay close to the shared joint vector. We conduct experiments on a large-scale multilingual corpus, comparing against strong baseline models on specified performance metrics. Among the baseline models, the simplest, a Neural Bag-of-Words model, achieves the most satisfactory performance. This indicates that syntactic information is likely to distract complex models from critical semantic information. The results show that our multi-modal representation method works better, as it outperforms the baseline models in most cases. The key to our multi-modal model is that it deals entirely with semantic information and learns from data of multiple modalities.
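A minimal sketch of the pseudo-siamese, tri-encoder setup with a triplet loss described in this abstract; the encoder internals, the embedding size, the vocabulary, and the margin are illustrative assumptions, not the thesis's actual configuration:

```python
# Sketch of the setup described above: three encoders map code-token,
# tree, and query sequences to same-length vectors; the code and tree
# vectors are fused into one joint code representation, and a triplet
# loss pulls matching code/query pairs together. All hyperparameters
# here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqEncoder(nn.Module):
    """Bag-of-embeddings encoder (a placeholder for the thesis's encoders)."""
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.embed(tokens).mean(dim=1)  # (batch, dim)

class MultiModalCodeSearch(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.code_enc = SeqEncoder(vocab_size, dim)   # code token sequence
        self.tree_enc = SeqEncoder(vocab_size, dim)   # RootPath tree sequence
        self.query_enc = SeqEncoder(vocab_size, dim)  # natural-language query
        self.fuse = nn.Linear(2 * dim, dim)           # joint code+tree vector

    def forward(self, code, tree, query):
        joint = self.fuse(torch.cat([self.code_enc(code), self.tree_enc(tree)], dim=-1))
        return joint, self.query_enc(query)

model = MultiModalCodeSearch(vocab_size=10_000)
code, tree, pos_q, neg_q = (torch.randint(0, 10_000, (4, 32)) for _ in range(4))
joint, positive = model(code, tree, pos_q)
_, negative = model(code, tree, neg_q)
# Triplet loss: the matching query stays closer to the joint code vector
# than a mismatched query does, by at least the margin.
loss = F.triplet_margin_loss(joint, positive, negative, margin=0.5)
loss.backward()
```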
Liu, Yahui. "Exploring Multi-Domain and Multi-Modal Representations for Unsupervised Image-to-Image Translation." Doctoral thesis, Università degli studi di Trento, 2022. http://hdl.handle.net/11572/342634.
Song, Pingfan. "Multi-modal image processing via joint sparse representations induced by coupled dictionaries." Thesis, University College London (University of London), 2018. http://discovery.ucl.ac.uk/10061963/.
Suthana, Nanthia Ananda. "Investigating human medial temporal representations of episodic information: a multi-modal approach." Diss., Restricted to subscribing institutions, 2009. http://proquest.umi.com/pqdweb?did=1905692921&sid=1&Fmt=2&clientId=1564&RQT=309&VName=PQD.
Tran, Thi Quynh Nhi. "Robust and comprehensive joint image-text representations." Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1096/document.
This thesis investigates the joint modeling of the visual and textual content of multimedia documents to address cross-modal problems. Such tasks require the ability to match information across modalities. A common representation space, obtained by e.g. Kernel Canonical Correlation Analysis, on which images and text can both be represented and directly compared, is the generally adopted solution. Nevertheless, such a joint space still suffers from several deficiencies that may hinder the performance of cross-modal tasks. An important contribution of this thesis is therefore to identify two major limitations of such a space. The first limitation concerns information that is poorly represented in the common space yet very significant for a retrieval task. The second limitation consists in a separation between modalities in the common space, which leads to coarse cross-modal matching. To deal with the first limitation, concerning poorly represented data, we put forward a model which first identifies such information and then finds ways to combine it with data that is relatively well represented in the joint space. Evaluations on text-illustration tasks show that by appropriately identifying and taking such information into account, the results of cross-modal retrieval can be strongly improved. The major work in this thesis aims to cope with the separation between modalities in the joint space to enhance the performance of cross-modal tasks. We propose two representation methods for bi-modal or uni-modal documents that aggregate information from both the visual and textual modalities projected on the joint space. Specifically, for uni-modal documents we suggest a completion process relying on an auxiliary dataset to find the corresponding information in the absent modality, and then use this information to build a final bi-modal representation for the uni-modal document. Evaluations show that our approaches achieve state-of-the-art results on several standard and challenging datasets for cross-modal retrieval and for bi-modal and cross-modal classification.
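A minimal sketch of the kind of joint image-text space this thesis starts from, using plain linear CCA from scikit-learn rather than the kernelized KCCA variant; the feature dimensions and the random data are illustrative stand-ins:

```python
# Project image and text features into a common space with CCA so the
# two modalities can be compared directly (linear CCA stands in here
# for the kernelized variant; dimensions and data are assumptions).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_pairs = 500
img_feats = rng.normal(size=(n_pairs, 256))   # e.g. CNN image descriptors
txt_feats = rng.normal(size=(n_pairs, 100))   # e.g. tf-idf text descriptors

cca = CCA(n_components=32)
cca.fit(img_feats, txt_feats)

# Both modalities now live in the same 32-d space and can be compared
# directly, e.g. with cosine similarity for cross-modal retrieval.
img_z, txt_z = cca.transform(img_feats, txt_feats)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(img_z[0], txt_z[0]))  # score for a paired image and text
print(cosine(img_z[0], txt_z[1]))  # score for a mismatched pair
```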
Ben-Younes, Hedi. "Multi-modal representation learning towards visual reasoning." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS173.
The quantity of images that populate the Internet is increasing dramatically. It is becoming critically important to develop the technology for precise and automatic understanding of visual content. As image recognition systems become more and more capable, researchers in artificial intelligence now seek the next generation of vision systems, able to perform high-level scene understanding. In this thesis, we are interested in Visual Question Answering (VQA), which consists in building models that answer any natural-language question about any image. Because of its nature and complexity, VQA is often considered a proxy for visual reasoning. Classically, VQA architectures are designed as trainable systems that are provided with images, questions about them, and their answers. To tackle this problem, typical approaches involve modern Deep Learning (DL) techniques. In the first part, we focus on developing multi-modal fusion strategies to model the interactions between image and question representations. More specifically, we explore bilinear fusion models and exploit concepts from tensor analysis to provide tractable and expressive factorizations of the parameters. These fusion mechanisms are studied under the widely used visual attention framework: the answer to the question is produced by focusing only on the relevant image regions. In the last part, we move away from the attention mechanism and build a more advanced scene-understanding architecture in which we consider objects and their spatial and semantic relations. All models are thoroughly evaluated experimentally on standard datasets, and the results are competitive with the literature.
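The bilinear-fusion idea with factorized parameters can be illustrated with a generic low-rank sketch, in the spirit of, but not identical to, the factorizations studied in the thesis; all dimensions and the rank are assumptions:

```python
# Illustrative low-rank bilinear fusion of a question vector and an
# image vector: projecting each modality to a shared rank-r space and
# taking an elementwise product gives a rank-constrained bilinear
# interaction with far fewer parameters than a full 3-d tensor.
import torch
import torch.nn as nn

class LowRankBilinearFusion(nn.Module):
    def __init__(self, q_dim: int, v_dim: int, rank: int, out_dim: int):
        super().__init__()
        self.proj_q = nn.Linear(q_dim, rank)  # factor for the question
        self.proj_v = nn.Linear(v_dim, rank)  # factor for the image
        self.out = nn.Linear(rank, out_dim)   # scores over candidate answers

    def forward(self, q: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return self.out(torch.tanh(self.proj_q(q) * self.proj_v(v)))

fusion = LowRankBilinearFusion(q_dim=300, v_dim=2048, rank=512, out_dim=3000)
scores = fusion(torch.randn(8, 300), torch.randn(8, 2048))
print(scores.shape)  # torch.Size([8, 3000]) -- one score per candidate answer
```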
Li, Lin. "Multi-scale spectral embedding representation registration (MSERg) for multi-modal imaging registration." Case Western Reserve University School of Graduate Studies / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=case1467902012.
Gay, Joanna. "Structural representation models for multi-modal image registration in biomedical applications." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-410820.
Aissa, Wafa. "Réseaux de modules neuronaux pour un raisonnement visuel compositionnel." Electronic Thesis or Diss., Paris, HESAM, 2023. http://www.theses.fr/2023HESAC033.
The context of this PhD thesis is compositional visual reasoning. When presented with an image-question pair, our objective is to have neural network models answer the question by following a reasoning chain defined by a program. We assess the model's reasoning ability through a Visual Question Answering (VQA) setup. Compositional VQA breaks down complex questions into easier, modular sub-problems. These sub-problems involve reasoning skills such as object and attribute detection, relation detection, logical operations, counting, and comparisons. Each sub-problem is assigned to a different module. This approach discourages shortcuts, demanding an explicit understanding of the problem; it also promotes transparency and explainability. Neural module networks (NMN) are used to enable compositional reasoning. The framework is based on a generator-executor design: the generator learns to translate the question into its functional program, and the executor instantiates a neural module network in which each function is assigned to a specific module. We also design a catalog of neural modules and define the function and structure of each module. Training and evaluation are conducted on the pre-processed GQA dataset [3], which includes natural-language questions, functional programs representing the reasoning chain, images, and corresponding answers. The research contributions revolve around the establishment of an NMN framework for the VQA task. One primary contribution involves the integration of vision-and-language pre-trained (VLP) representations into modular VQA. This integration serves as a "warm-start" mechanism for initializing the reasoning process. The experiments demonstrate that cross-modal vision-and-language representations outperform uni-modal ones. Their use captures intricate relationships within each individual modality while also facilitating alignment between the modalities, consequently enhancing the overall accuracy of our NMN. Moreover, we explore various training techniques to enhance the learning process and improve cost-efficiency. In addition to optimizing the modules within the reasoning chain to collaboratively produce accurate answers, we introduce a teacher-guidance approach to optimize the intermediate modules of the reasoning chain. This ensures that these modules perform their specific reasoning sub-tasks without taking shortcuts or compromising the integrity of the reasoning process. We propose and implement several teacher-guidance techniques, one of which draws inspiration from the teacher-forcing method commonly used in sequential models. Comparative analyses demonstrate the advantages of our teacher-guidance approach for NMNs, as detailed in our paper [2]. We also introduce a novel Curriculum Learning (CL) strategy tailored to NMNs, which reorganizes the training examples into a start-small training strategy: we begin by learning simpler programs and progressively increase the complexity of the training programs, using several difficulty criteria to define the CL approach. Our findings demonstrate that by selecting the appropriate CL method, we can significantly reduce the training cost and the required training data, with only a limited impact on the final VQA accuracy. This contribution forms the core of our paper [1].
[1] W. Aissa, M. Ferecatu, and M. Crucianu. Curriculum learning for compositional visual reasoning. In Proceedings of VISIGRAPP 2023, Volume 5: VISAPP, 2023.
[2] W. Aissa, M. Ferecatu, and M. Crucianu. Multimodal representations for teacher-guided compositional visual reasoning. In Advanced Concepts for Intelligent Vision Systems, 21st International Conference (ACIVS 2023). Springer International Publishing, 2023.
[3] D. A. Hudson and C. D. Manning. GQA: A new dataset for real-world visual reasoning and compositional question answering. 2019.
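A toy sketch of the generator-executor idea behind neural module networks: a functional program derived from the question selects modules from a catalog and chains them over the image features. Module names, internals, and the program format are illustrative assumptions, not the thesis's actual catalog:

```python
# Toy generator-executor sketch: each module handles one reasoning
# sub-task, and the executor chains the modules named by the program.
import torch
import torch.nn as nn

DIM = 64  # shared width of all module inputs/outputs (assumption)

class NeuralModule(nn.Module):
    """One reasoning step: updates the state given the image features."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * DIM, DIM), nn.ReLU())

    def forward(self, state: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([state, image], dim=-1))

# Catalog of neural modules, one per reasoning skill (names are assumptions).
catalog = {name: NeuralModule() for name in ["select", "filter", "relate", "answer"]}

def execute(program: list[str], image: torch.Tensor) -> torch.Tensor:
    """Run the module chain named by the program over the image features."""
    state = torch.zeros(image.size(0), DIM)
    for fn in program:
        state = catalog[fn](state, image)
    return state

image_feats = torch.randn(1, DIM)
# A program the generator might emit for "What is next to the red cube?"
answer_repr = execute(["select", "filter", "relate", "answer"], image_feats)
print(answer_repr.shape)  # torch.Size([1, 64])
```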
Books on the topic "Multi-Modal representations":
Po, Ming Jack. Multi-scale Representations for Classification of Protein Crystal Images and Multi-Modal Registration of the Lung. [New York, N.Y.?]: [publisher not identified], 2015.
Ali, Syed A., and Susan McRoy, eds. Representations for Multi-Modal Human-Computer Interaction: Papers from the AAAI Workshop (Technical Report WS-98-09). AAAI Press, 1998.
Case, Julialicia, Eric Freeze, and Salvatore Pane. Story Mode. Bloomsbury Publishing Plc, 2024. http://dx.doi.org/10.5040/9781350301405.
Book chapters on the topic "Multi-Modal representations":
Wiesen, Aryeh, and Yaakov HaCohen-Kerner. "Overview of Uni-modal and Multi-modal Representations for Classification Tasks." In Natural Language Processing and Information Systems, 397–404. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91947-8_41.
Li, Cheng, Hui Sun, Zaiyi Liu, Meiyun Wang, Hairong Zheng, and Shanshan Wang. "Learning Cross-Modal Deep Representations for Multi-Modal MR Image Segmentation." In Lecture Notes in Computer Science, 57–65. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32245-8_7.
Luo, Xi, Chunjie Cao, and Longjuan Wang. "Multi-modal Universal Embedding Representations for Language Understanding." In Communications in Computer and Information Science, 103–19. Singapore: Springer Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0523-0_7.
Zhao, Xiang, Weixin Zeng, and Jiuyang Tang. "Multimodal Entity Alignment." In Entity Alignment, 229–47. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4250-3_9.
Bae, Inhwan, Jin-Hwi Park, and Hae-Gon Jeon. "Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction." In Lecture Notes in Computer Science, 270–89. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20047-2_16.
Florea, Filip, Alexandrina Rogozan, Eugen Barbu, Abdelaziz Bensrhair, and Stefan Darmoni. "MedIC at ImageCLEF 2006: Automatic Image Categorization and Annotation Using Combined Visual Representations." In Evaluation of Multilingual and Multi-modal Information Retrieval, 670–77. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. http://dx.doi.org/10.1007/978-3-540-74999-8_82.
Qin, Chen, Bibo Shi, Rui Liao, Tommaso Mansi, Daniel Rueckert, and Ali Kamen. "Unsupervised Deformable Registration for Multi-modal Images via Disentangled Representations." In Lecture Notes in Computer Science, 249–61. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-20351-1_19.
Ge, Hongkun, Guorong Wu, Li Wang, Yaozong Gao, and Dinggang Shen. "Hierarchical Multi-modal Image Registration by Learning Common Feature Representations." In Machine Learning in Medical Imaging, 203–11. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-24888-2_25.
Dorent, Reuben, Nazim Haouchine, Fryderyk Kogl, Samuel Joutard, Parikshit Juvekar, Erickson Torio, Alexandra J. Golby, et al. "Unified Brain MR-Ultrasound Synthesis Using Multi-modal Hierarchical Representations." In Lecture Notes in Computer Science, 448–58. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-43999-5_43.
Kasiri, Keyvan, Paul Fieguth, and David A. Clausi. "Structural Representations for Multi-modal Image Registration Based on Modified Entropy." In Lecture Notes in Computer Science, 82–89. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-20801-5_9.
Conference papers on the topic "Multi-Modal representations":
Zolfaghari, Mohammadreza, Yi Zhu, Peter Gehler, and Thomas Brox. "CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations." In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021. http://dx.doi.org/10.1109/iccv48922.2021.00148.
Lee, O.-Joun, and Jin-Taek Kim. "Learning Multi-modal Representations of Narrative Multimedia." In RACS '20: International Conference on Research in Adaptive and Convergent Systems. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3400286.3418216.
Zhou, Xin, Hongyu Zhou, Yong Liu, Zhiwei Zeng, Chunyan Miao, Pengwei Wang, Yuan You, and Feijun Jiang. "Bootstrap Latent Representations for Multi-modal Recommendation." In WWW '23: The ACM Web Conference 2023. New York, NY, USA: ACM, 2023. http://dx.doi.org/10.1145/3543507.3583251.
Vulić, Ivan, Douwe Kiela, Stephen Clark, and Marie-Francine Moens. "Multi-Modal Representations for Improved Bilingual Lexicon Learning." In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2016. http://dx.doi.org/10.18653/v1/p16-2031.
Wang, Kaiye, Wei Wang, and Liang Wang. "Learning unified sparse representations for multi-modal data." In 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 2015. http://dx.doi.org/10.1109/icip.2015.7351464.
Liu, Xinyi, Wanxian Guan, Lianyun Li, Hui Li, Chen Lin, Xubin Li, Si Chen, Jian Xu, Hongbo Deng, and Bo Zheng. "Pretraining Representations of Multi-modal Multi-query E-commerce Search." In KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3534678.3539200.
Parfenova, Iuliia, Desmond Elliott, Raquel Fernández, and Sandro Pezzelle. "Probing Cross-Modal Representations in Multi-Step Relational Reasoning." In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021). Stroudsburg, PA, USA: Association for Computational Linguistics, 2021. http://dx.doi.org/10.18653/v1/2021.repl4nlp-1.16.
Huang, Jia-Hong, Ting-Wei Wu, and Marcel Worring. "Contextualized Keyword Representations for Multi-modal Retinal Image Captioning." In ICMR '21: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3460426.3463667.
Grossiord, Eloise, Laurent Risser, Salim Kanoun, Soleakhena Ken, and Francois Malgouyres. "Learning Optimal Shape Representations for Multi-Modal Image Registration." In 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020. http://dx.doi.org/10.1109/isbi45749.2020.9098631.
Lara, Bruno, and Juan M. Rendon. "Prediction of Undesired Situations Based on Multi-Modal Representations." In 2006 Electronics, Robotics and Automotive Mechanics Conference. IEEE, 2006. http://dx.doi.org/10.1109/cerma.2006.75.