Selected scholarly literature on the topic "Multimodal Embeddings"

Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the list of current articles, books, theses, conference proceedings, and other scholarly sources relevant to the topic "Multimodal Embeddings".

Next to each source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf and read its abstract online, if it is available in the metadata.

Journal articles on the topic "Multimodal Embeddings"

1

Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (2023): 392. http://dx.doi.org/10.3390/info14070392.

Abstract:
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications, such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for […]
2

Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.

Abstract:
The multimodal recommendation has gradually become the infrastructure of online media platforms, enabling them to provide personalized service to users through a joint modeling of user historical behaviors (e.g., purchases, clicks) and item various modalities (e.g., visual and textual). The majority of existing studies typically focus on utilizing modal features or modal-related graph structure to learn user local interests. Nevertheless, these approaches encounter two limitations: (1) Shared updates of user ID embeddings result in the consequential coupling between collaboration and multimodal […]
3

Shang, Bin, Yinliang Zhao, Jun Liu, and Di Wang. "LAFA: Multimodal Knowledge Graph Completion with Link Aware Fusion and Aggregation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8957–65. http://dx.doi.org/10.1609/aaai.v38i8.28744.

Abstract:
Recently, an enormous amount of research has emerged on multimodal knowledge graph completion (MKGC), which seeks to extract knowledge from multimodal data and predict the most plausible missing facts to complete a given multimodal knowledge graph (MKG). However, existing MKGC approaches largely ignore that visual information may introduce noise and lead to uncertainty when adding them to the traditional KG embeddings due to the contribution of each associated image to entity is different in diverse link scenarios. Moreover, treating each triple independently when learning entity embeddings […]
4

Sun, Zhongkai, Prathusha Sarma, William Sethares, and Yingyu Liang. "Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 5 (2020): 8992–99. http://dx.doi.org/10.1609/aaai.v34i05.6431.

Abstract:
Multimodal language analysis often considers relationships between features based on text and those based on acoustical and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks in part because the text features are derived from advanced language models or word embeddings trained on massive data sources while audio and video features are human-engineered and comparatively underdeveloped. Given that the text, audio, and video are describing the same utterance in different ways, we hypothesize that the multimodal sentiment analysis […]
5

Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves […]
6

Mateev, Mihail. "Comparative Analysis on Implementing Embeddings for Image Analysis." Journal of Information Systems Engineering and Management 10, no. 17s (2025): 89–102. https://doi.org/10.52783/jisem.v10i17s.2710.

Abstract:
This research explores how artificial intelligence enhances construction maintenance and diagnostics, achieving 95% accuracy on a dataset of 10,000 cases. The findings highlight AI's potential to revolutionize predictive maintenance in the industry. The growing adoption of image embeddings has transformed visual data processing across AI applications. This study evaluates embedding implementations in major platforms, including Azure AI, OpenAI's GPT-4 Vision, and frameworks like Hugging Face, Replicate, and Eden AI. It assesses their scalability, accuracy, cost-effectiveness, and integration […]
7

Tang, Zhenchao, Jiehui Huang, Guanxing Chen, and Calvin Yu-Chian Chen. "Comprehensive View Embedding Learning for Single-Cell Multimodal Integration." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15292–300. http://dx.doi.org/10.1609/aaai.v38i14.29453.

Abstract:
Motivation: Advances in single-cell measurement techniques provide rich multimodal data, which helps us to explore the life state of cells more deeply. However, multimodal integration, or, learning joint embeddings from multimodal data remains a current challenge. The difficulty in integrating unpaired single-cell multimodal data is that different modalities have different feature spaces, which easily leads to information loss in joint embedding. And few existing methods have fully exploited and fused the information in single-cell multimodal data. Result: In this study, we propose CoVEL, a […]
8

Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.

Abstract:
Previous work has shown the effectiveness of using event representations for tasks such as script event prediction and stock market prediction. It is however still challenging to learn the subtle semantic differences between events based solely on textual descriptions of events often represented as (subject, predicate, object) triples. As an alternative, images offer a more intuitive way of understanding event semantics. We observe that event described in text and in images show different abstraction levels and therefore should be projected onto heterogeneous embedding spaces, as opposed to […]
9

Sah, Shagan, Sabarish Gopalakishnan, and Raymond Ptucha. "Aligned attention for common multimodal embeddings." Journal of Electronic Imaging 29, no. 2 (2020): 1. http://dx.doi.org/10.1117/1.jei.29.2.023013.

10

Alkaabi, Hussein, Ali Kadhim Jasim, and Ali Darroudi. "From Static to Contextual: A Survey of Embedding Advances in NLP." PERFECT: Journal of Smart Algorithms 2, no. 2 (2025): 57–66. https://doi.org/10.62671/perfect.v2i2.77.

Abstract:
Embedding techniques have been a cornerstone of Natural Language Processing (NLP), enabling machines to represent textual data in a form that captures semantic and syntactic relationships. Over the years, the field has witnessed a significant evolution—from static word embeddings, such as Word2Vec and GloVe, which represent words as fixed vectors, to dynamic, contextualized embeddings like BERT and GPT, which generate word representations based on their surrounding context. This survey provides a comprehensive overview of embedding techniques, tracing their development from early methods to […]

Theses / dissertations on the topic "Multimodal Embeddings"

1

Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.

Abstract:
Artificial intelligence (AI) is now ubiquitous in our society. The recent development of learning methods based on deep neural networks, also known as "deep learning," has brought a marked improvement in visual and textual representation models. This thesis addresses the problem of learning multimodal embeddings to jointly represent visual and semantic data. It is a central issue in the current context of AI and deep learning, one that notably holds very strong potential for the interpretability of […]
2

Deschamps-Berger, Théo. "Social Emotion Recognition with multimodal deep learning architecture in emergency call centers." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG036.

Abstract:
This thesis focuses on automatic speech emotion recognition systems in a medical emergency context. It addresses some of the challenges encountered in studying emotions in social interactions and is grounded in modern theories of emotion, in particular Lisa Feldman Barrett's work on the construction of emotion. Indeed, the manifestation of spontaneous emotions in human interactions is complex, often characterized by nuances and blends, and closely tied to context. This study is based on the CEMO corpus, composed […]
3

Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.

Abstract:
This thesis concerns the development of deep neural architectures for analyzing textual or visual content, or a combination of both. In general, the work exploits the ability of neural networks to learn abstract representations. The main contributions of the thesis are: 1) recurrent networks for spoken language understanding, where different network architectures are compared on this task with respect to their ability to model the observations as well as the dependencies among the labels to be predicted; 2) image prediction and […]
4

Rubio, Romano Antonio. "Fashion discovery : a computer vision approach." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.

Abstract:
Performing semantic interpretation of fashion images is undeniably one of the most challenging domains for computer vision. Subtle variations in color and shape might confer different meanings or interpretations to an image. Not only is it a domain tightly coupled with human understanding, but also with scene interpretation and context. Being able to extract fashion-specific information from images and interpret that information in a proper manner can be useful in many situations and help understanding the underlying information in an image. Fashion is also one of the most important […]
5

Couairon, Guillaume. "Text-Based Semantic Image Editing." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS248.

Abstract:
The goal of this thesis is to propose algorithms for the task of text-based image editing (TIE), which consists in editing digital images according to an instruction formulated in natural language. For example, given an image of a dog and the request "Change the dog into a cat," we want to produce a new image in which the dog has been replaced by a cat while keeping all other aspects of the image unchanged (the animal's color and pose, the background). The north-star goal is to enable anyone to edit their images using […]
6

ur Réhman, Shafiq. "Expressing emotions through vibration for perception and control." Doctoral thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-32990.

Abstract:
This thesis addresses a challenging problem: “how to let the visually impaired ‘see’ others emotions”. We, human beings, are heavily dependent on facial expressions to express ourselves. A smile shows that the person you are talking to is pleased, amused, relieved etc. People use emotional information from facial expressions to switch between conversation topics and to determine attitudes of individuals. Missing emotional information from facial expressions and head gestures makes the visually impaired extremely difficult to interact with others in social events. To enhance the visually impaired […]

Book chapters on the topic "Multimodal Embeddings"

1

Zhao, Xiang, Weixin Zeng, and Jiuyang Tang. "Multimodal Entity Alignment." In Entity Alignment. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4250-3_9.

Abstract:
In various tasks related to artificial intelligence, data is often present in multiple forms or modalities. Recently, it has become a popular approach to combine these different forms of information into a knowledge graph, creating a multi-modal knowledge graph (MMKG). However, multi-modal knowledge graphs (MMKGs) often face issues of insufficient data coverage and incompleteness. In order to address this issue, a possible strategy is to incorporate supplemental information from other multi-modal knowledge graphs (MMKGs). To achieve this goal, current methods for aligning entities could […]
2

Gao, Yuan, Sangwook Kim, David E. Austin, and Chris McIntosh. "MEDBind: Unifying Language and Multimodal Medical Data Embeddings." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-72390-2_21.

3

Dolphin, Rian, Barry Smyth, and Ruihai Dong. "A Machine Learning Approach to Industry Classification in Financial Markets." In Communications in Computer and Information Science. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_7.

Abstract:
Industry classification schemes provide a taxonomy for segmenting companies based on their business activities. They are relied upon in industry and academia as an integral component of many types of financial and economic analysis. However, even modern classification schemes have failed to embrace the era of big data and remain a largely subjective undertaking prone to inconsistency and misclassification. To address this, we propose a multimodal neural model for training company embeddings, which harnesses the dynamics of both historical pricing data and financial news to learn object […]
4

Gornishka, Iva, Stevan Rudinac, and Marcel Worring. "Interactive Search and Exploration in Discussion Forums Using Multimodal Embeddings." In MultiMedia Modeling. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-37734-2_32.

5

Dadwal, Rajjat, Ran Yu, and Elena Demidova. "A Multimodal and Multitask Approach for Adaptive Geospatial Region Embeddings." In Advances in Knowledge Discovery and Data Mining. Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-2262-4_29.

6

Pandey, Sandeep Kumar, Hanumant Singh Shekhawat, Shalendar Bhasin, Ravi Jasuja, and S. R. M. Prasanna. "Alzheimer’s Dementia Recognition Using Multimodal Fusion of Speech and Text Embeddings." In Intelligent Human Computer Interaction. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98404-5_64.

7

Choe, Subeen, Jihyeon Oh, and Jihoon Yang. "Multimodal Contrastive Learning for Dialogue Embeddings with Global and Local Views." In Lecture Notes in Computer Science. Springer Nature Singapore, 2025. https://doi.org/10.1007/978-981-96-8180-8_13.

8

Gerber, Jonathan, Bruno Kreiner, Jasmin Saxer, and Andreas Weiler. "Towards Website X-Ray for Europe’s Municipalities: Unveiling Digital Transformation with Multimodal Embeddings." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. https://doi.org/10.1007/978-3-031-78090-5_11.

9

Praveen Kumar, T., and Lavanya Pamulaparty. "Enhancing Sentiment Analysis with Deep Learning Models and BERT Word Embeddings for Multimodal Reviews." In Cognitive Science and Technology. Springer Nature Singapore, 2025. https://doi.org/10.1007/978-981-97-9266-5_6.

10

Zhou, Liting, and Cathal Gurrin. "Multimodal Embedding for Lifelog Retrieval." In MultiMedia Modeling. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98358-1_33.


Conference papers on the topic "Multimodal Embeddings"

1

Liu, Ruizhou, Zongsheng Cao, Zhe Wu, Qianqian Xu, and Qingming Huang. "Multimodal Knowledge Graph Embeddings via Lorentz-based Contrastive Learning." In 2024 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2024. http://dx.doi.org/10.1109/icme57554.2024.10687608.

2

Heo, Serin, Jehyun Kyung, and Joon-Hyuk Chang. "Multimodal Emotion Recognition with Target Speaker-Based Facial Embeddings." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10888205.

3

Dai, Wenliang, Zihan Liu, Tiezheng Yu, and Pascale Fung. "Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition." In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.aacl-main.30.

4

Takemaru, Lina, Shu Yang, Ruiming Wu, et al. "Mapping Alzheimer’s Disease Pseudo-Progression With Multimodal Biomarker Trajectory Embeddings." In 2024 IEEE International Symposium on Biomedical Imaging (ISBI). IEEE, 2024. http://dx.doi.org/10.1109/isbi56570.2024.10635249.

5

Oliveira, Artur, Mateus Espadoto, Roberto Hirata Jr., and Roberto Cesar Jr. "Improving Image Classification Tasks Using Fused Embeddings and Multimodal Models." In 20th International Conference on Computer Vision Theory and Applications. SCITEPRESS - Science and Technology Publications, 2025. https://doi.org/10.5220/0013365600003912.

6

Zhong, Jiayang, Fuyao Chen, Lihui Chen, Dennis Shung, and John A. Onofrey. "Conditional Convolution of Clinical Data Embeddings for Multimodal Prostate Cancer Classification." In 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI). IEEE, 2025. https://doi.org/10.1109/isbi60581.2025.10981307.

7

Arshad, Aresha, Momina Moetesum, Adnan Ul Hasan, and Faisal Shafait. "Enhancing Multimodal Information Extraction from Visually Rich Documents with 2D Positional Embeddings." In 2024 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2024. https://doi.org/10.1109/dicta63115.2024.00087.

8

Garaiman, Florian Enrico, and Anamaria Radoi. "Multimodal Emotion Recognition System based on X-Vector Embeddings and Convolutional Neural Networks." In 2024 15th International Conference on Communications (COMM). IEEE, 2024. http://dx.doi.org/10.1109/comm62355.2024.10741406.

9

Adiputra, Andro Aprila, Ahmada Yusril Kadiptya, Thi-Thu-Huong Le, JunYoung Son, and Howon Kim. "Enhancing Contextual Understanding with Multimodal Siamese Networks Using Contrastive Loss and Text Embeddings." In 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). IEEE, 2025. https://doi.org/10.1109/icaiic64266.2025.10920874.

10

Lewis, Nora, Charles C. Cavalcante, Zois Boukouvalas, and Roberto Corizzo. "On the Effectiveness of Text and Image Embeddings in Multimodal Hate Speech Detection." In 2024 IEEE International Conference on Big Data (BigData). IEEE, 2024. https://doi.org/10.1109/bigdata62323.2024.10826088.
