Academic literature on the topic 'Multimodal embedding space'
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multimodal embedding space.'
Journal articles on the topic "Multimodal embedding space"
Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (July 10, 2023): 392. http://dx.doi.org/10.3390/info14070392.
Mai, Sijie, Haifeng Hu, and Songlong Xing. "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 164–72. http://dx.doi.org/10.1609/aaai.v34i01.5347.
Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (May 18, 2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.
Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.
Moon, Jucheol, Nhat Anh Le, Nelson Hebert Minaya, and Sang-Il Choi. "Multimodal Few-Shot Learning for Gait Recognition." Applied Sciences 10, no. 21 (October 29, 2020): 7619. http://dx.doi.org/10.3390/app10217619.
Zhang, Rongchao, Yiwei Lou, Dexuan Xu, Yongzhi Cao, Hanpin Wang, and Yu Huang. "A Learnable Discrete-Prior Fusion Autoencoder with Contrastive Learning for Tabular Data Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16803–11. http://dx.doi.org/10.1609/aaai.v38i15.29621.
Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (July 2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.
Fan, Yunpeng, Wenyou Du, Yingwei Zhang, and Xiaogang Wang. "Fault Detection for Multimodal Process Using Quality-Relevant Kernel Neighborhood Preserving Embedding." Mathematical Problems in Engineering 2015 (2015): 1–15. http://dx.doi.org/10.1155/2015/210125.
Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao, and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings." Journal of Advanced Computational Intelligence and Intelligent Informatics 26, no. 6 (November 20, 2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.
Kim, Jongseok, Youngjae Yu, Hoeseong Kim, and Gunhee Kim. "Dual Compositional Learning in Interactive Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1771–79. http://dx.doi.org/10.1609/aaai.v35i2.16271.
Dissertations / Theses on the topic "Multimodal embedding space"
Couairon, Guillaume. "Text-Based Semantic Image Editing." Electronic Thesis or Diss., Sorbonne Université, 2023. http://www.theses.fr/2023SORUS248.
The aim of this thesis is to propose algorithms for the task of Text-based Image Editing (TIE), which consists of editing digital images according to an instruction formulated in natural language. For instance, given an image of a dog and the query "Change the dog into a cat", we want to produce a novel image where the dog has been replaced by a cat, keeping all other aspects of the image unchanged (animal color and pose, background). The north-star goal is to enable anyone to edit their images using only natural-language queries. One specificity of text-based image editing is that there is practically no training data available to train a supervised algorithm. In this thesis, we propose different solutions for editing images, based on the adaptation of large multimodal models trained on huge datasets. We first study a simplified editing setup, named Retrieval-based image editing, which does not require directly modifying the input image. Instead, given the image and the modification query, we search a large database for an image that corresponds to the requested edit. We leverage multimodal image/text alignment models trained on web-scale datasets (like CLIP) to perform such transformations without any examples. We also propose the SIMAT framework for evaluating retrieval-based image editing. We then study how to directly modify the input image. We propose FlexIT, a method which iteratively changes the input image until it satisfies an abstract "editing objective" defined in a multimodal embedding space. We introduce a variety of regularization terms to enforce realistic transformations. Next, we focus on diffusion models, which are powerful generative models able to synthesize novel images conditioned on a wide variety of textual prompts. We demonstrate their versatility by proposing DiffEdit, an algorithm which adapts diffusion models for image editing without fine-tuning. We propose a zero-shot strategy for automatically finding where the initial image should be changed to satisfy the text transformation query. Finally, we study a specific challenge useful in the context of image editing: how to synthesize a novel image given, as a constraint, a spatial layout of objects with textual descriptions, a task known as Semantic Image Synthesis. We adopt the same strategy of adapting diffusion models to solve the task without any examples. We propose the ZestGuide algorithm, which leverages the spatio-semantic information encoded in the attention layers of diffusion models.
Book chapters on the topic "Multimodal embedding space"
Zhang, Chao, and Jiawei Han. "Data Mining and Knowledge Discovery." In Urban Informatics, 797–814. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-15-8983-6_42.
Zhao, Xiang, Weixin Zeng, and Jiuyang Tang. "Multimodal Entity Alignment." In Entity Alignment, 229–47. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4250-3_9.
Valles-Perez, Ivan, Grzegorz Beringer, Piotr Bilinski, Gary Cook, and Roberto Barra-Chicote. "SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2023. http://dx.doi.org/10.3233/faia230540.
Conference papers on the topic "Multimodal embedding space"
Bhattacharya, Indrani, Arkabandhu Chowdhury, and Vikas C. Raykar. "Multimodal Dialog for Browsing Large Visual Catalogs using Exploration-Exploitation Paradigm in a Joint Embedding Space." In ICMR '19: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3323873.3325036.
Rostami, Mohammad, and Aram Galstyan. "Cognitively Inspired Learning of Incremental Drifting Concepts." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/341.
Gopalakrishnan, Sabarish, Premkumar Udaiyar, Shagan Sah, and Raymond Ptucha. "Multi Stage Common Vector Space for Multimodal Embeddings." In 2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). IEEE, 2019. http://dx.doi.org/10.1109/aipr47015.2019.9174583.
Feng, LiWei, Hao Ai, and Yuan Li. "Multimode Process Monitoring Based on Density Space Clustering Locally Linear Embedding Technique." In 2023 2nd Conference on Fully Actuated System Theory and Applications (CFASTA). IEEE, 2023. http://dx.doi.org/10.1109/cfasta57821.2023.10243375.
Pasi, Piyush Singh, Karthikeya Battepati, Preethi Jyothi, Ganesh Ramakrishnan, Tanmay Mahapatra, and Manoj Singh. "Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/683.