Academic literature on the topic 'Multimodal Embeddings'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multimodal Embeddings.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Multimodal Embeddings"
Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (July 10, 2023): 392. http://dx.doi.org/10.3390/info14070392.
Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.
Shang, Bin, Yinliang Zhao, Jun Liu, and Di Wang. "LAFA: Multimodal Knowledge Graph Completion with Link Aware Fusion and Aggregation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8957–65. http://dx.doi.org/10.1609/aaai.v38i8.28744.
Sun, Zhongkai, Prathusha Sarma, William Sethares, and Yingyu Liang. "Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8992–99. http://dx.doi.org/10.1609/aaai.v34i05.6431.
Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (July 2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.
Tang, Zhenchao, Jiehui Huang, Guanxing Chen, and Calvin Yu-Chian Chen. "Comprehensive View Embedding Learning for Single-Cell Multimodal Integration." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (March 24, 2024): 15292–300. http://dx.doi.org/10.1609/aaai.v38i14.29453.
Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (May 18, 2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.
Sah, Shagan, Sabarish Gopalakrishnan, and Raymond Ptucha. "Aligned attention for common multimodal embeddings." Journal of Electronic Imaging 29, no. 02 (March 25, 2020): 1. http://dx.doi.org/10.1117/1.jei.29.2.023013.
Zhang, Rongchao, Yiwei Lou, Dexuan Xu, Yongzhi Cao, Hanpin Wang, and Yu Huang. "A Learnable Discrete-Prior Fusion Autoencoder with Contrastive Learning for Tabular Data Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16803–11. http://dx.doi.org/10.1609/aaai.v38i15.29621.
Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.
Full textDissertations / Theses on the topic "Multimodal Embeddings"
Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.
Nowadays, Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "deep learning", has led to significant improvements in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore several directions: we present relevant background covering image and text representation and existing multimodal approaches; we propose novel architectures that further improve the retrieval capability of VSE; and we extend VSE models to novel applications, leveraging embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning a differentiable approximation of a ranking-based metric.
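For orientation, the ranking-based objective mentioned at the end of this abstract is usually a hinge-based triplet loss over a joint image-text space. Below is a minimal sketch of that standard loss (PyTorch assumed; the function and variable names are ours for illustration, not the thesis's code), of which the thesis studies smooth, differentiable approximations:

```python
# Minimal sketch of the hinge-based triplet ranking loss used to train
# visual-semantic embeddings (hardest-negative variant). Assumes PyTorch.
import torch

def triplet_ranking_loss(img_emb, txt_emb, margin=0.2):
    """img_emb, txt_emb: (batch, dim) L2-normalized embeddings of matched pairs."""
    scores = img_emb @ txt_emb.t()            # (batch, batch) cosine similarities
    pos = scores.diag().view(-1, 1)           # similarity of each matched pair
    # hinge violations for mismatched captions (rows) and mismatched images (cols)
    cost_txt = (margin + scores - pos).clamp(min=0)
    cost_img = (margin + scores - pos.t()).clamp(min=0)
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_txt = cost_txt.masked_fill(mask, 0)  # ignore the positive pairs
    cost_img = cost_img.masked_fill(mask, 0)
    # keep only the hardest negative per row/column before averaging
    return cost_txt.max(1)[0].mean() + cost_img.max(0)[0].mean()
```

The max over negatives is what makes the metric non-smooth; replacing it with a differentiable surrogate is the kind of relaxation the abstract alludes to.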
Deschamps-Berger, Théo. "Social Emotion Recognition with multimodal deep learning architecture in emergency call centers." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG036.
This thesis explores automatic speech-emotion recognition systems in a medical emergency context. It addresses some of the challenges encountered when studying emotions in social interactions. It is rooted in modern theories of emotions, particularly those of Lisa Feldman Barrett on the construction of emotions. Indeed, the manifestation of emotions in human interactions is complex, often nuanced and mixed, and highly linked to context. This study is based on the CEMO corpus, which is composed of telephone conversations between callers and emergency medical dispatchers (EMD) from a French emergency call center. This corpus provides a rich dataset for exploring the capacity of deep learning systems, such as Transformers and pre-trained models, to recognize spontaneous emotions in spoken interactions. Applications could include providing emotional cues to improve call handling and decision-making by EMD, or summarizing calls. The work carried out in this thesis focused on different techniques related to speech emotion recognition, including transfer learning from pre-trained models, multimodal fusion strategies, dialogic context integration, and mixed emotion detection. An initial acoustic system based on temporal convolutions and recurrent networks was developed and validated on IEMOCAP, an emotional corpus widely used by the affective computing community, and then on the CEMO corpus. Extensive research on multimodal systems, pre-trained in acoustics and linguistics and adapted to emotion recognition, is presented. The integration of dialog context in emotion recognition was also explored, underlining the complex dynamics of emotions in social interactions. Finally, research was initiated towards developing multi-label, multimodal systems capable of handling the subtleties of mixed emotions, which often arise from the annotator's perception and the social context. Our research highlights some solutions and challenges in recognizing emotions in the wild. This thesis was funded by the CNRS AI Chair HUMAAINE: HUman-MAchine Affective Interaction & Ethics.
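The multimodal fusion strategies this abstract mentions commonly follow a late-fusion pattern: pooled embeddings from pre-trained acoustic and linguistic encoders are concatenated and classified jointly. The sketch below illustrates that generic pattern only (PyTorch assumed; dimensions, class labels, and the module name are our assumptions, not the thesis's exact architecture):

```python
# Generic late-fusion sketch for speech emotion recognition: concatenate
# pooled acoustic and text embeddings, then classify. Assumes PyTorch.
import torch
import torch.nn as nn

class LateFusionSER(nn.Module):
    def __init__(self, acoustic_dim=768, text_dim=768, n_emotions=4):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(acoustic_dim + text_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, n_emotions),  # e.g. anger, fear, positive, neutral
        )

    def forward(self, acoustic_emb, text_emb):
        # acoustic_emb: (batch, acoustic_dim) mean-pooled speech encoder output
        # text_emb:     (batch, text_dim) pooled transcript encoder output
        return self.classifier(torch.cat([acoustic_emb, text_emb], dim=-1))
```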
Vukotic, Vedran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
In this dissertation, the thesis that deep neural networks are suited for the analysis of visual, textual, and fused visual-textual content is discussed. This work evaluates the ability of deep neural networks to learn multimodal representations automatically, in either unsupervised or supervised manners, and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image; the architecture is evaluated on videos, using only one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other, and conversely, and offers an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
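The bidirectional multimodal encoder of contribution 3 can be pictured as two crossmodal translation networks whose hidden activations provide the fused representation. The following is a loose sketch of that idea only (PyTorch assumed; layer sizes, the reconstruction loss, and the absence of tied central weights are our simplifications, not the dissertation's exact design):

```python
# Loose sketch of a bidirectional crossmodal encoder: each modality is
# encoded, decoded into the *other* modality, and the concatenated hidden
# codes serve as the multimodal embedding. Assumes PyTorch.
import torch
import torch.nn as nn

class BidirectionalEncoder(nn.Module):
    def __init__(self, dim_a, dim_b, hidden=512):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.Tanh())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.Tanh())
        self.dec_a = nn.Linear(hidden, dim_a)  # reconstruct A from B's code
        self.dec_b = nn.Linear(hidden, dim_b)  # reconstruct B from A's code

    def forward(self, a, b):
        h_a, h_b = self.enc_a(a), self.enc_b(b)
        # translation losses: A -> B and B -> A
        loss = nn.functional.mse_loss(self.dec_b(h_a), b) \
             + nn.functional.mse_loss(self.dec_a(h_b), a)
        fused = torch.cat([h_a, h_b], dim=-1)  # multimodal embedding
        return fused, loss
```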
Rubio Romano, Antonio. "Fashion discovery: a computer vision approach." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.
The semantic interpretation of images from the fashion world is undoubtedly one of the most challenging domains for computer vision. Slight variations in color and shape can confer different meanings or interpretations on an image. It is a domain closely tied to subjective human understanding, but also to the interpretation and recognition of scenes and contexts. Being able to extract fashion-specific information from images and interpret it correctly can be useful in many situations and can help understand the underlying information in an image. Moreover, fashion is one of the most important businesses worldwide, with an estimated value of three trillion dollars and a constantly growing online market, which increases the interest in image-based algorithms for searching, classifying, or recommending garments. This doctoral thesis aims to solve specific problems related to the processing of data from online fashion stores, ranging from the most basic pixel-level information to a more abstract understanding that allows conclusions to be drawn about the garments present in an image, exploiting the multimodality of the available data to develop some of the solutions. The contributions include: - A new superpixel extraction method focused on improving the annotation process for fashion images. - The construction of a common space for representing fashion images and texts. - The application of that space to the task of identifying the main product in an image showing a set of garments. In summary, fashion is a complex domain at many levels in terms of computer vision and machine learning, and developing specific algorithms capable of capturing the essential information from images and texts is not a trivial task. In order to solve some of the challenges it poses, and considering that this is an industrial doctorate, we contribute to the topic with a variety of solutions that can improve the performance of many tasks that are extremely useful for the online fashion industry.
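As an illustration of the last contribution above, identifying the main product reduces to a nearest-neighbor query once garment crops and product text share a common embedding space. A speculative sketch under that assumption (NumPy; all names are ours, not the thesis's):

```python
# Speculative sketch of main-product identification: score each garment
# crop's embedding against the product's text embedding in a shared space
# and return the best match. Assumes NumPy and precomputed embeddings.
import numpy as np

def main_product(crop_embs, text_emb):
    """crop_embs: (n_crops, dim); text_emb: (dim,) in the same joint space."""
    crop_embs = crop_embs / np.linalg.norm(crop_embs, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return int(np.argmax(crop_embs @ text_emb))  # index of the main garment
```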
Couairon, Guillaume. "Text-Based Semantic Image Editing." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS248.
The aim of this thesis is to propose algorithms for the task of Text-based Image Editing (TIE), which consists in editing digital images according to an instruction formulated in natural language. For instance, given an image of a dog and the query "Change the dog into a cat", we want to produce a novel image where the dog has been replaced by a cat, keeping all other image aspects unchanged (animal color and pose, background). The north-star goal is to enable anyone to edit their images using only queries in natural language. One specificity of text-based image editing is that there is practically no training data with which to train a supervised algorithm. In this thesis, we propose different solutions for editing images, based on adapting large multimodal models trained on huge datasets. We first study a simplified editing setup, named retrieval-based image editing, which does not require directly modifying the input image. Instead, given the image and a modification query, we search a large database for an image that corresponds to the requested edit. We leverage multimodal image/text alignment models trained on web-scale datasets (like CLIP) to perform such transformations without any examples. We also propose the SIMAT framework for evaluating retrieval-based image editing. We then study how to directly modify the input image. We propose FlexIT, a method which iteratively changes the input image until it satisfies an abstract "editing objective" defined in a multimodal embedding space, and introduce a variety of regularization terms to enforce realistic transformations. Next, we focus on diffusion models, powerful generative models able to synthesize novel images conditioned on a wide variety of textual prompts. We demonstrate their versatility by proposing DiffEdit, an algorithm which adapts diffusion models for image editing without fine-tuning, with a zero-shot strategy for automatically finding where the initial image should be changed to satisfy the text transformation query. Finally, we study a specific challenge useful in the context of image editing: how to synthesize a novel image given, as a constraint, a spatial layout of objects with textual descriptions, a task known as Semantic Image Synthesis. We adopt the same strategy, consisting in adapting diffusion models to solve the task without any examples, and propose the ZestGuide algorithm, which leverages the spatio-semantic information encoded in the attention layers of diffusion models.
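The retrieval-based editing setup described above can be summarized as embedding arithmetic in a CLIP-like space followed by nearest-neighbor search over a database. A hedged sketch of that pattern (NumPy; the additive update below is one plausible formulation for illustration, not necessarily the thesis's exact method):

```python
# Hedged sketch of retrieval-based image editing: instead of modifying
# pixels, move the image embedding along the text edit direction and
# retrieve the closest database image. Assumes precomputed, L2-normalized
# CLIP-space vectors.
import numpy as np

def retrieval_edit(img_emb, src_txt_emb, tgt_txt_emb, db_embs, lam=1.0):
    """db_embs: (N, dim) database embeddings; returns index of retrieved image."""
    query = img_emb + lam * (tgt_txt_emb - src_txt_emb)  # e.g. "dog" -> "cat"
    query /= np.linalg.norm(query)
    scores = db_embs @ query          # cosine similarity against the database
    return int(np.argmax(scores))     # the "edited" image, found by retrieval
```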
ur Réhman, Shafiq. "Expressing emotions through vibration for perception and control." Doctoral thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-32990.
Book chapters on the topic "Multimodal Embeddings"
Zhao, Xiang, Weixin Zeng, and Jiuyang Tang. "Multimodal Entity Alignment." In Entity Alignment, 229–47. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4250-3_9.
Dolphin, Rian, Barry Smyth, and Ruihai Dong. "A Machine Learning Approach to Industry Classification in Financial Markets." In Communications in Computer and Information Science, 81–94. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_7.
Gornishka, Iva, Stevan Rudinac, and Marcel Worring. "Interactive Search and Exploration in Discussion Forums Using Multimodal Embeddings." In MultiMedia Modeling, 388–99. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-37734-2_32.
Dadwal, Rajjat, Ran Yu, and Elena Demidova. "A Multimodal and Multitask Approach for Adaptive Geospatial Region Embeddings." In Advances in Knowledge Discovery and Data Mining, 363–75. Singapore: Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-2262-4_29.
Pandey, Sandeep Kumar, Hanumant Singh Shekhawat, Shalendar Bhasin, Ravi Jasuja, and S. R. M. Prasanna. "Alzheimer's Dementia Recognition Using Multimodal Fusion of Speech and Text Embeddings." In Intelligent Human Computer Interaction, 718–28. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98404-5_64.
Zhou, Liting, and Cathal Gurrin. "Multimodal Embedding for Lifelog Retrieval." In MultiMedia Modeling, 416–27. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98358-1_33.
Truchan, Hubert, Evgenii Naumov, Rezaul Abedin, Gregory Palmer, and Zahra Ahmadi. "Multimodal Isotropic Neural Architecture with Patch Embedding." In Neural Information Processing, 173–87. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8079-6_14.
Hazman, Muzhaffar, Susan McKeever, and Josephine Griffith. "Meme Sentiment Analysis Enhanced with Multimodal Spatial Encoding and Face Embedding." In Communications in Computer and Information Science, 318–31. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_25.
Frilling, Andrea, and Ashley K. Clift. "Surgery in Combination with Peptide Receptor Radionuclide Therapy: A Novel Approach for the Treatment of Advanced Neuroendocrine Tumours." In Beyond Becquerel and Biology to Precision Radiomolecular Oncology: Festschrift in Honor of Richard P. Baum, 31–40. Cham: Springer International Publishing, 2024. http://dx.doi.org/10.1007/978-3-031-33533-4_3.
Auer, Peter, Barbara Laner, Martin Pfeiffer, and Kerstin Botsch. "Noticing and assessing nature." In Studies in Language and Social Interaction, 245–75. Amsterdam: John Benjamins Publishing Company, 2024. http://dx.doi.org/10.1075/slsi.36.09aue.
Full textConference papers on the topic "Multimodal Embeddings"
Liu, Ruizhou, Zongsheng Cao, Zhe Wu, Qianqian Xu, and Qingming Huang. "Multimodal Knowledge Graph Embeddings via Lorentz-based Contrastive Learning." In 2024 IEEE International Conference on Multimedia and Expo (ICME), 1–6. IEEE, 2024. http://dx.doi.org/10.1109/icme57554.2024.10687608.
Takemaru, Lina, Shu Yang, Ruiming Wu, Bing He, Christos Davatzikos, Jingwen Yan, and Li Shen. "Mapping Alzheimer's Disease Pseudo-Progression With Multimodal Biomarker Trajectory Embeddings." In 2024 IEEE International Symposium on Biomedical Imaging (ISBI), 1–5. IEEE, 2024. http://dx.doi.org/10.1109/isbi56570.2024.10635249.
Alhabashi, Yasser, Abdullah Alharbi, Samar Ahmad, Serry Sibaee, Omer Nacar, Lahouari Ghouti, and Anis Koubaa. "ASOS at ArAIEval Shared Task: Integrating Text and Image Embeddings for Multimodal Propaganda Detection in Arabic Memes." In Proceedings of The Second Arabic Natural Language Processing Conference, 473–77. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.arabicnlp-1.46.
Chaabouni, Rahma, Ewan Dunbar, Neil Zeghidour, and Emmanuel Dupoux. "Learning Weakly Supervised Multimodal Phoneme Embeddings." In Interspeech 2017. ISCA: ISCA, 2017. http://dx.doi.org/10.21437/interspeech.2017-1689.
Mustafina, Sofia, Andrey Akimov, and Svetlana Mustafina. "Multimodal Embeddings In Emotion Recognition Research." In 2023 5th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA). IEEE, 2023. http://dx.doi.org/10.1109/summa60232.2023.10349422.
Calabrese, Agostina, Michele Bevilacqua, and Roberto Navigli. "EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/67.
Lu, Yuxing, Weichen Zhao, Nan Sun, and Jinzhuo Wang. "Enhancing Multimodal Knowledge Graph Representation Learning through Triple Contrastive Learning." In Thirty-Third International Joint Conference on Artificial Intelligence {IJCAI-24}. California: International Joint Conferences on Artificial Intelligence Organization, 2024. http://dx.doi.org/10.24963/ijcai.2024/659.
Zhang, Miaoran, Marius Mosbach, David Adelani, Michael Hedderich, and Dietrich Klakow. "MCSE: Multimodal Contrastive Learning of Sentence Embeddings." In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.naacl-main.436.
Neculai, Andrei, Yanbei Chen, and Zeynep Akata. "Probabilistic Compositional Embeddings for Multimodal Image Retrieval." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2022. http://dx.doi.org/10.1109/cvprw56347.2022.00501.
Mahajan, Shweta, Teresa Botschen, Iryna Gurevych, and Stefan Roth. "Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings." In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, 2019. http://dx.doi.org/10.1109/iccvw.2019.00557.