Selected scholarly literature on the topic "Multimodal embedding space"
Browse the lists of current articles, books, dissertations, reports, and other scholarly sources on the topic "Multimodal embedding space".
Journal articles on the topic "Multimodal embedding space"
Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (July 10, 2023): 392. http://dx.doi.org/10.3390/info14070392.
Mai, Sijie, Haifeng Hu, and Songlong Xing. "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 164–72. http://dx.doi.org/10.1609/aaai.v34i01.5347.
Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (May 18, 2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.
Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.
Moon, Jucheol, Nhat Anh Le, Nelson Hebert Minaya, and Sang-Il Choi. "Multimodal Few-Shot Learning for Gait Recognition." Applied Sciences 10, no. 21 (October 29, 2020): 7619. http://dx.doi.org/10.3390/app10217619.
Zhang, Rongchao, Yiwei Lou, Dexuan Xu, Yongzhi Cao, Hanpin Wang, and Yu Huang. "A Learnable Discrete-Prior Fusion Autoencoder with Contrastive Learning for Tabular Data Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16803–11. http://dx.doi.org/10.1609/aaai.v38i15.29621.
Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (July 2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.
Fan, Yunpeng, Wenyou Du, Yingwei Zhang, and Xiaogang Wang. "Fault Detection for Multimodal Process Using Quality-Relevant Kernel Neighborhood Preserving Embedding." Mathematical Problems in Engineering 2015 (2015): 1–15. http://dx.doi.org/10.1155/2015/210125.
Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao, and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings." Journal of Advanced Computational Intelligence and Intelligent Informatics 26, no. 6 (November 20, 2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.
Kim, Jongseok, Youngjae Yu, Hoeseong Kim, and Gunhee Kim. "Dual Compositional Learning in Interactive Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1771–79. http://dx.doi.org/10.1609/aaai.v35i2.16271.
Dissertations on the topic "Multimodal embedding space"
Couairon, Guillaume. "Text-Based Semantic Image Editing." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS248.
The aim of this thesis is to propose algorithms for the task of Text-based Image Editing (TIE), which consists of editing digital images according to an instruction formulated in natural language. For instance, given an image of a dog and the query "Change the dog into a cat", we want to produce a novel image where the dog has been replaced by a cat, keeping all other aspects of the image unchanged (animal color and pose, background). The north-star goal is to enable anyone to edit their images using only natural-language queries. One specificity of text-based image editing is that there is practically no training data with which to train a supervised algorithm. In this thesis, we propose different solutions for editing images, based on the adaptation of large multimodal models trained on huge datasets. We first study a simplified editing setup, named retrieval-based image editing, which does not require directly modifying the input image. Instead, given the image and modification query, we search a large database for an image that corresponds to the requested edit. We leverage multimodal image/text alignment models trained on web-scale datasets (like CLIP) to perform such transformations without any examples. We also propose the SIMAT framework for evaluating retrieval-based image editing. We then study how to directly modify the input image. We propose FlexIT, a method which iteratively changes the input image until it satisfies an abstract "editing objective" defined in a multimodal embedding space. We introduce a variety of regularization terms to enforce realistic transformations. Next, we focus on diffusion models, which are powerful generative models able to synthesize novel images conditioned on a wide variety of textual prompts. We demonstrate their versatility by proposing DiffEdit, an algorithm which adapts diffusion models for image editing without finetuning. We propose a zero-shot strategy for automatically finding where the initial image should be changed to satisfy the text transformation query. Finally, we study a specific challenge useful in the context of image editing: how to synthesize a novel image given as a constraint a spatial layout of objects with textual descriptions, a task known as Semantic Image Synthesis. We adopt the same strategy, which consists of adapting diffusion models to solve the task without any examples. We propose the ZestGuide algorithm, which leverages the spatio-semantic information encoded in the attention layers of diffusion models
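The retrieval-based editing setup described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the image and the modification query are combined, so the code assumes one simple possibility: the edit is the direction between a source and a target text embedding, added to the image embedding, followed by a cosine-similarity search over a database. The function names, the `scale` parameter, and the 3-dimensional toy vectors are all hypothetical; in practice the embeddings would come from an image/text alignment model such as CLIP.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    na, nb = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(na, nb))

def retrieve_edit(image_emb, source_text_emb, target_text_emb, database, scale=1.0):
    """Return the index of the database embedding closest to the edited query.

    Assumption (not taken from the thesis): the edit is modeled as adding
    the text direction (target - source) to the image embedding.
    """
    delta = [t - s for s, t in zip(source_text_emb, target_text_emb)]
    query = normalize([i + scale * d for i, d in zip(image_emb, delta)])
    scores = [cosine(query, emb) for emb in database]
    return max(range(len(database)), key=scores.__getitem__)

# Toy embeddings with hypothetical axes: 0 = "dog", 1 = "cat", 2 = "grass background".
dog_on_grass = [1.0, 0.0, 1.0]
text_dog = [1.0, 0.0, 0.0]
text_cat = [0.0, 1.0, 0.0]
database = [
    [1.0, 0.0, 1.0],  # index 0: dog on grass (the input image itself)
    [0.0, 1.0, 1.0],  # index 1: cat on grass (the requested edit)
    [0.0, 1.0, 0.0],  # index 2: cat without the grass background
]

# "Change the dog into a cat": the best match keeps the background.
best = retrieve_edit(dog_on_grass, text_dog, text_cat, database)
print(best)
```

In this toy setup the search picks index 1, the cat on grass, because the shifted query keeps the background component while swapping the animal component. Real embeddings are not axis-aligned like this, so results would depend on the model and on the choice of `scale`.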
Book chapters on the topic "Multimodal embedding space"
Zhang, Chao, and Jiawei Han. "Data Mining and Knowledge Discovery." In Urban Informatics, 797–814. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-15-8983-6_42.
Zhao, Xiang, Weixin Zeng, and Jiuyang Tang. "Multimodal Entity Alignment." In Entity Alignment, 229–47. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4250-3_9.
Valles-Perez, Ivan, Grzegorz Beringer, Piotr Bilinski, Gary Cook, and Roberto Barra-Chicote. "SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2023. http://dx.doi.org/10.3233/faia230540.
Conference papers on the topic "Multimodal embedding space"
Bhattacharya, Indrani, Arkabandhu Chowdhury, and Vikas C. Raykar. "Multimodal Dialog for Browsing Large Visual Catalogs using Exploration-Exploitation Paradigm in a Joint Embedding Space." In ICMR '19: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3323873.3325036.
Rostami, Mohammad, and Aram Galstyan. "Cognitively Inspired Learning of Incremental Drifting Concepts." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/341.
Gopalakrishnan, Sabarish, Premkumar Udaiyar, Shagan Sah, and Raymond Ptucha. "Multi Stage Common Vector Space for Multimodal Embeddings." In 2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). IEEE, 2019. http://dx.doi.org/10.1109/aipr47015.2019.9174583.
Feng, LiWei, Hao Ai, and Yuan Li. "Multimode Process Monitoring Based on Density Space Clustering Locally Linear Embedding Technique." In 2023 2nd Conference on Fully Actuated System Theory and Applications (CFASTA). IEEE, 2023. http://dx.doi.org/10.1109/cfasta57821.2023.10243375.
Pasi, Piyush Singh, Karthikeya Battepati, Preethi Jyothi, Ganesh Ramakrishnan, Tanmay Mahapatra, and Manoj Singh. "Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/683.