Selected scholarly literature on the topic "Multimodal Embeddings"
Cite a source in APA, MLA, Chicago, Harvard, and many other citation styles
Browse the list of current articles, books, theses, conference proceedings, and other scholarly sources on the topic "Multimodal Embeddings".
Next to every source in the reference list there is an "Add to bibliography" button. Click it and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a .pdf file and read its abstract online, if one is included in the metadata.
Journal articles on the topic "Multimodal Embeddings"
Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings". Information 14, no. 7 (July 10, 2023): 392. http://dx.doi.org/10.3390/info14070392.
Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.
Shang, Bin, Yinliang Zhao, Jun Liu, and Di Wang. "LAFA: Multimodal Knowledge Graph Completion with Link Aware Fusion and Aggregation". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8957–65. http://dx.doi.org/10.1609/aaai.v38i8.28744.
Sun, Zhongkai, Prathusha Sarma, William Sethares, and Yingyu Liang. "Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8992–99. http://dx.doi.org/10.1609/aaai.v34i05.6431.
Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge". Natural Language Engineering 25, no. 4 (July 2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.
Tang, Zhenchao, Jiehui Huang, Guanxing Chen, and Calvin Yu-Chian Chen. "Comprehensive View Embedding Learning for Single-Cell Multimodal Integration". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (March 24, 2024): 15292–300. http://dx.doi.org/10.1609/aaai.v38i14.29453.
Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (May 18, 2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.
Sah, Shagan, Sabarish Gopalakishnan, and Raymond Ptucha. "Aligned attention for common multimodal embeddings". Journal of Electronic Imaging 29, no. 02 (March 25, 2020): 1. http://dx.doi.org/10.1117/1.jei.29.2.023013.
Zhang, Rongchao, Yiwei Lou, Dexuan Xu, Yongzhi Cao, Hanpin Wang, and Yu Huang. "A Learnable Discrete-Prior Fusion Autoencoder with Contrastive Learning for Tabular Data Synthesis". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16803–11. http://dx.doi.org/10.1609/aaai.v38i15.29621.
Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.
Testo completoTesi sul tema "Multimodal Embeddings"
Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.
Nowadays Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has led to a significant improvement in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore different directions: we present relevant background covering image and textual representation and existing multimodal approaches; we propose novel architectures that further improve the retrieval capability of VSE; and we extend VSE models to novel applications and leverage embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning differentiable approximations of ranking-based metrics.
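For readers new to the area, the abstract above refers to the ranking-based objectives commonly used to train visual-semantic embeddings. The snippet below is a minimal, generic sketch of a bidirectional max-margin ranking loss in PyTorch; it illustrates the general technique only and is not code from the cited thesis, and the function name ranking_loss and the margin value are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def ranking_loss(img_emb, txt_emb, margin=0.2):
        # Normalize both modalities and compare them with cosine similarity.
        img_emb = F.normalize(img_emb, dim=1)
        txt_emb = F.normalize(txt_emb, dim=1)
        scores = img_emb @ txt_emb.t()          # batch x batch similarity matrix
        pos = scores.diag().view(-1, 1)         # similarities of matching pairs
        # Hinge on every non-matching pair, in both retrieval directions.
        cost_txt = (margin + scores - pos).clamp(min=0)      # image -> text
        cost_img = (margin + scores - pos.t()).clamp(min=0)  # text -> image
        mask = torch.eye(scores.size(0), dtype=torch.bool)
        return cost_txt.masked_fill(mask, 0).mean() + cost_img.masked_fill(mask, 0).mean()

    # Example with random 512-dimensional embeddings for a batch of 8 pairs:
    loss = ranking_loss(torch.randn(8, 512), torch.randn(8, 512))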
Deschamps-Berger, Théo. "Social Emotion Recognition with multimodal deep learning architecture in emergency call centers". Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG036.
This thesis explores automatic speech-emotion recognition systems in a medical emergency context. It addresses some of the challenges encountered when studying emotions in social interactions. It is rooted in modern theories of emotions, particularly those of Lisa Feldman Barrett on the construction of emotions. Indeed, the manifestation of emotions in human interactions is complex, often nuanced and mixed, and highly linked to the context. This study is based on the CEMO corpus, which is composed of telephone conversations between callers and emergency medical dispatchers (EMD) from a French emergency call center. This corpus provides a rich dataset to explore the capacity of deep learning systems, such as Transformers and pre-trained models, to recognize spontaneous emotions in spoken interactions. The applications could be to provide emotional cues that could improve call handling and decision-making by EMD, or to summarize calls. The work carried out in my thesis focused on different techniques related to speech emotion recognition, including transfer learning from pre-trained models, multimodal fusion strategies, dialogic context integration, and mixed emotion detection. An initial acoustic system based on temporal convolutions and recurrent networks was developed and validated on an emotional corpus widely used by the affective computing community, called IEMOCAP, and then on the CEMO corpus. Extensive research on multimodal systems, pre-trained in acoustics and linguistics and adapted to emotion recognition, is presented. In addition, the integration of dialog context in emotion recognition was explored, underlining the complex dynamics of emotions in social interactions. Finally, research has been initiated towards developing multi-label, multimodal systems capable of handling the subtleties of mixed emotions, often due to the annotator's perception and social context. Our research highlights some solutions and challenges in recognizing emotions in the wild. The CNRS AI HUMAAINE Chair (HUman-MAchine Affective Interaction & Ethics) funded this thesis.
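As background for the multimodal fusion strategies mentioned in the abstract, the following is a minimal late-fusion sketch in PyTorch: utterance-level acoustic and text embeddings (assumed to be produced upstream by pretrained encoders) are concatenated and classified. The class name LateFusionClassifier, the embedding sizes, and the four emotion classes are illustrative assumptions; this is not the system described in the thesis.

    import torch
    import torch.nn as nn

    class LateFusionClassifier(nn.Module):
        # Concatenate precomputed audio and text embeddings, then classify.
        def __init__(self, dim_audio=768, dim_text=768, n_classes=4):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(dim_audio + dim_text, 256),
                nn.ReLU(),
                nn.Dropout(0.3),
                nn.Linear(256, n_classes),
            )

        def forward(self, audio_emb, text_emb):
            fused = torch.cat([audio_emb, text_emb], dim=-1)  # late fusion
            return self.head(fused)

    # Example with a batch of 2 utterances and random embeddings:
    logits = LateFusionClassifier()(torch.randn(2, 768), torch.randn(2, 768))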
Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data". Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
In this dissertation, the thesis that deep neural networks are suited for the analysis of visual, textual, and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn automatic multimodal representations in either unsupervised or supervised manners and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, by utilizing solely one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis consists of a neural architecture that translates from one modality to the other and conversely, and offers an improved multimodal representation space where the initially disjoint representations can be translated and fused. This enables improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks within the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks permit visualizing the learned model directly in the image domain.
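To make the idea of bidirectional multimodal encoders more concrete, here is a toy PyTorch sketch in which each modality is encoded into a shared space and decoded into the other modality, with translation losses in both directions tying the two embeddings together. The class name CrossmodalTranslator and the dimensions are illustrative assumptions, not the architecture evaluated in the dissertation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossmodalTranslator(nn.Module):
        # Encode each modality into a shared space, decode into the *other* one.
        def __init__(self, dim_a=1024, dim_b=300, dim_shared=256):
            super().__init__()
            self.enc_a = nn.Sequential(nn.Linear(dim_a, dim_shared), nn.Tanh())
            self.enc_b = nn.Sequential(nn.Linear(dim_b, dim_shared), nn.Tanh())
            self.dec_a = nn.Linear(dim_shared, dim_a)  # reconstructs modality A
            self.dec_b = nn.Linear(dim_shared, dim_b)  # reconstructs modality B

        def forward(self, a, b):
            za, zb = self.enc_a(a), self.enc_b(b)
            return self.dec_b(za), self.dec_a(zb), za, zb

    model = CrossmodalTranslator()
    a, b = torch.randn(4, 1024), torch.randn(4, 300)
    b_from_a, a_from_b, za, zb = model(a, b)
    # Crossmodal reconstruction losses align the two embedding spaces.
    loss = F.mse_loss(b_from_a, b) + F.mse_loss(a_from_b, a)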
Rubio, Romano Antonio. "Fashion discovery : a computer vision approach". Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.
The semantic interpretation of fashion images is undoubtedly one of the most challenging domains for computer vision. Slight variations in color and shape can confer different meanings or interpretations on an image. It is a domain closely tied to subjective human understanding, but also to the interpretation and recognition of scenarios and contexts. Being able to extract fashion-specific information from images and interpret it correctly can be useful in many situations and can help to understand the underlying information in an image. Moreover, fashion is one of the most important businesses worldwide, with an estimated value of three trillion dollars and a constantly growing online market, which increases the interest in image-based algorithms for searching, classifying, or recommending garments. This doctoral thesis aims to solve specific problems related to the processing of data from online fashion stores, ranging from the most basic pixel-level information to a more abstract understanding that allows conclusions to be drawn about the garments present in an image, exploiting the multimodality of the available data to develop some of the solutions. The contributions include: a new superpixel extraction method focused on improving the annotation process for fashion images; the construction of a common space for representing fashion images and texts; and the application of that space to the task of identifying the main product within an image showing an outfit of several garments. In summary, fashion is a domain that is complex at many levels in terms of computer vision and machine learning, and developing specific algorithms capable of capturing the essential information from images and texts is not a trivial task. In order to solve some of the challenges it poses, and considering that this is an industrial doctorate, we contribute to the topic with a variety of solutions that can improve the performance of many tasks that are extremely useful for the online fashion industry.
Couairon, Guillaume. "Text-Based Semantic Image Editing". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS248.
The aim of this thesis is to propose algorithms for the task of Text-based Image Editing (TIE), which consists in editing digital images according to an instruction formulated in natural language. For instance, given an image of a dog and the query "Change the dog into a cat", we want to produce a novel image where the dog has been replaced by a cat, keeping all other image aspects unchanged (animal color and pose, background). The north-star goal is to enable anyone to edit their images using only queries in natural language. One specificity of text-based image editing is that there is practically no training data to train a supervised algorithm. In this thesis, we propose different solutions for editing images, based on the adaptation of large multimodal models trained on huge datasets. We first study a simplified editing setup, named retrieval-based image editing, which does not require directly modifying the input image. Instead, given the image and modification query, we search a large database for an image that corresponds to the requested edit. We leverage multimodal image/text alignment models trained on web-scale datasets (like CLIP) to perform such transformations without any examples. We also propose the SIMAT framework for evaluating retrieval-based image editing. We then study how to directly modify the input image. We propose FlexIT, a method which iteratively changes the input image until it satisfies an abstract "editing objective" defined in a multimodal embedding space. We introduce a variety of regularization terms to enforce realistic transformations. Next, we focus on diffusion models, which are powerful generative models able to synthesize novel images conditioned on a wide variety of textual prompts. We demonstrate their versatility by proposing DiffEdit, an algorithm which adapts diffusion models for image editing without finetuning. We propose a zero-shot strategy for automatically finding where the initial image should be changed to satisfy the text transformation query. Finally, we study a specific challenge useful in the context of image editing: how to synthesize a novel image by giving as a constraint a spatial layout of objects with textual descriptions, a task known as Semantic Image Synthesis. We adopt the same strategy, consisting in adapting diffusion models to solve the task without any example. We propose the ZestGuide algorithm, which leverages the spatio-semantic information encoded in the attention layers of diffusion models.
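The following sketch illustrates, in simplified form, the kind of iterative optimization toward a multimodal "editing objective" that the abstract mentions: the image is updated by gradient descent to increase similarity with a target text embedding while a pixel-space regularizer keeps it close to the input. The arguments image_encoder and text_embedding are assumed to come from a pretrained joint image/text model such as CLIP; this is an illustrative simplification, not the FlexIT or DiffEdit algorithms themselves.

    import torch
    import torch.nn.functional as F

    def optimize_toward_text(image, image_encoder, text_embedding,
                             steps=100, lr=0.05, lam=0.1):
        # Edit `image` so its embedding moves toward `text_embedding`,
        # while a quadratic penalty keeps it close to the original pixels.
        edited = image.clone().requires_grad_(True)
        opt = torch.optim.Adam([edited], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            z = image_encoder(edited)
            sim = F.cosine_similarity(z, text_embedding, dim=-1).mean()
            reg = (edited - image).pow(2).mean()
            (-sim + lam * reg).backward()
            opt.step()
        return edited.detach()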
ur, Réhman Shafiq. "Expressing emotions through vibration for perception and control". Doctoral thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-32990.
Book chapters on the topic "Multimodal Embeddings"
Zhao, Xiang, Weixin Zeng, and Jiuyang Tang. "Multimodal Entity Alignment". In Entity Alignment, 229–47. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4250-3_9.
Dolphin, Rian, Barry Smyth, and Ruihai Dong. "A Machine Learning Approach to Industry Classification in Financial Markets". In Communications in Computer and Information Science, 81–94. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_7.
Gornishka, Iva, Stevan Rudinac, and Marcel Worring. "Interactive Search and Exploration in Discussion Forums Using Multimodal Embeddings". In MultiMedia Modeling, 388–99. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-37734-2_32.
Dadwal, Rajjat, Ran Yu, and Elena Demidova. "A Multimodal and Multitask Approach for Adaptive Geospatial Region Embeddings". In Advances in Knowledge Discovery and Data Mining, 363–75. Singapore: Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-2262-4_29.
Pandey, Sandeep Kumar, Hanumant Singh Shekhawat, Shalendar Bhasin, Ravi Jasuja, and S. R. M. Prasanna. "Alzheimer’s Dementia Recognition Using Multimodal Fusion of Speech and Text Embeddings". In Intelligent Human Computer Interaction, 718–28. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98404-5_64.
Zhou, Liting, and Cathal Gurrin. "Multimodal Embedding for Lifelog Retrieval". In MultiMedia Modeling, 416–27. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98358-1_33.
Truchan, Hubert, Evgenii Naumov, Rezaul Abedin, Gregory Palmer, and Zahra Ahmadi. "Multimodal Isotropic Neural Architecture with Patch Embedding". In Neural Information Processing, 173–87. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8079-6_14.
Hazman, Muzhaffar, Susan McKeever, and Josephine Griffith. "Meme Sentiment Analysis Enhanced with Multimodal Spatial Encoding and Face Embedding". In Communications in Computer and Information Science, 318–31. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_25.
Frilling, Andrea, and Ashley K. Clift. "Surgery in Combination with Peptide Receptor Radionuclide Therapy: A Novel Approach for the Treatment of Advanced Neuroendocrine Tumours". In Beyond Becquerel and Biology to Precision Radiomolecular Oncology: Festschrift in Honor of Richard P. Baum, 31–40. Cham: Springer International Publishing, 2024. http://dx.doi.org/10.1007/978-3-031-33533-4_3.
Auer, Peter, Barbara Laner, Martin Pfeiffer, and Kerstin Botsch. "Noticing and assessing nature". In Studies in Language and Social Interaction, 245–75. Amsterdam: John Benjamins Publishing Company, 2024. http://dx.doi.org/10.1075/slsi.36.09aue.
Testo completoAtti di convegni sul tema "Multimodal Embeddings"
Liu, Ruizhou, Zongsheng Cao, Zhe Wu, Qianqian Xu, and Qingming Huang. "Multimodal Knowledge Graph Embeddings via Lorentz-based Contrastive Learning". In 2024 IEEE International Conference on Multimedia and Expo (ICME), 1–6. IEEE, 2024. http://dx.doi.org/10.1109/icme57554.2024.10687608.
Takemaru, Lina, Shu Yang, Ruiming Wu, Bing He, Christos Davtzikos, Jingwen Yan, and Li Shen. "Mapping Alzheimer’s Disease Pseudo-Progression With Multimodal Biomarker Trajectory Embeddings". In 2024 IEEE International Symposium on Biomedical Imaging (ISBI), 1–5. IEEE, 2024. http://dx.doi.org/10.1109/isbi56570.2024.10635249.
Alhabashi, Yasser, Abdullah Alharbi, Samar Ahmad, Serry Sibaee, Omer Nacar, Lahouari Ghouti, and Anis Koubaa. "ASOS at ArAIEval Shared Task: Integrating Text and Image Embeddings for Multimodal Propaganda Detection in Arabic Memes". In Proceedings of The Second Arabic Natural Language Processing Conference, 473–77. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.arabicnlp-1.46.
Chaabouni, Rahma, Ewan Dunbar, Neil Zeghidour, and Emmanuel Dupoux. "Learning Weakly Supervised Multimodal Phoneme Embeddings". In Interspeech 2017. ISCA: ISCA, 2017. http://dx.doi.org/10.21437/interspeech.2017-1689.
Mustafina, Sofia, Andrey Akimov, and Svetlana Mustafina. "Multimodal Embeddings In Emotion Recognition Research". In 2023 5th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA). IEEE, 2023. http://dx.doi.org/10.1109/summa60232.2023.10349422.
Calabrese, Agostina, Michele Bevilacqua, and Roberto Navigli. "EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings". In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/67.
Lu, Yuxing, Weichen Zhao, Nan Sun, and Jinzhuo Wang. "Enhancing Multimodal Knowledge Graph Representation Learning through Triple Contrastive Learning". In Thirty-Third International Joint Conference on Artificial Intelligence {IJCAI-24}. California: International Joint Conferences on Artificial Intelligence Organization, 2024. http://dx.doi.org/10.24963/ijcai.2024/659.
Zhang, Miaoran, Marius Mosbach, David Adelani, Michael Hedderich, and Dietrich Klakow. "MCSE: Multimodal Contrastive Learning of Sentence Embeddings". In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.naacl-main.436.
Neculai, Andrei, Yanbei Chen, and Zeynep Akata. "Probabilistic Compositional Embeddings for Multimodal Image Retrieval". In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2022. http://dx.doi.org/10.1109/cvprw56347.2022.00501.
Mahajan, Shweta, Teresa Botschen, Iryna Gurevych, and Stefan Roth. "Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings". In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, 2019. http://dx.doi.org/10.1109/iccvw.2019.00557.