Academic literature on the topic 'Multimodal embedding space'

Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multimodal embedding space.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Multimodal embedding space"

1

Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (2023): 392. http://dx.doi.org/10.3390/info14070392.

Full text
Abstract:
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications, such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for
APA, Harvard, Vancouver, ISO, and other styles
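The anisotropy this abstract describes can be probed with a simple proxy: the mean cosine similarity of randomly drawn embedding pairs. A sketch on synthetic data (this is an illustrative measure, not the paper's own isotropy score):

```python
import numpy as np

def mean_pairwise_cosine(embeddings, n_pairs=10_000, seed=0):
    """Anisotropy proxy: mean cosine similarity of random embedding pairs.

    A value near 0 suggests an isotropic space; a value near 1 means the
    vectors crowd into the narrow cone the abstract describes.
    """
    rng = np.random.default_rng(seed)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    i = rng.integers(0, len(unit), n_pairs)
    j = rng.integers(0, len(unit), n_pairs)
    return float(np.mean(np.sum(unit[i] * unit[j], axis=1)))

gen = np.random.default_rng(1)
iso = mean_pairwise_cosine(gen.normal(size=(1000, 64)))        # roughly 0
aniso = mean_pairwise_cosine(gen.normal(size=(1000, 64)) + 5)  # close to 1
```

Centered Gaussian vectors use all directions and score near zero, while adding a constant offset concentrates them in a cone and drives the score toward one.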
2

Mai, Sijie, Haifeng Hu, and Songlong Xing. "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (2020): 164–72. http://dx.doi.org/10.1609/aaai.v34i01.5347.

Full text
Abstract:
Learning joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap which heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of various modalities vary in nature, to reduce the modality gap, we translate the distributions of source modalities into that of target modality via their respective encoders using adversarial training. Furthermore
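The "modality gap" this abstract targets can be made concrete as the distance between the centroids of each modality's normalized embeddings. A sketch on synthetic vectors (not the paper's formulation):

```python
import numpy as np

def modality_gap(emb_a, emb_b):
    """Distance between the centroids of two modalities' normalized embeddings.

    A large gap means the modalities occupy separate regions of the joint
    space; alignment methods like the one above aim to shrink it.
    """
    def centroid(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        return x.mean(axis=0)
    return float(np.linalg.norm(centroid(emb_a) - centroid(emb_b)))

rng = np.random.default_rng(0)
shared = rng.normal(size=(500, 32))
gap_aligned = modality_gap(shared, shared)            # → 0.0
gap_separated = modality_gap(shared, shared + 3.0)    # clearly positive
```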
3

Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.

Full text
Abstract:
Previous work has shown the effectiveness of using event representations for tasks such as script event prediction and stock market prediction. It is however still challenging to learn the subtle semantic differences between events based solely on textual descriptions of events, often represented as (subject, predicate, object) triples. As an alternative, images offer a more intuitive way of understanding event semantics. We observe that events described in text and in images show different abstraction levels and therefore should be projected onto heterogeneous embedding spaces, as opposed to wh
4

Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.

Full text
Abstract:
Multimodal recommendation has gradually become the infrastructure of online media platforms, enabling them to provide personalized service to users through joint modeling of users' historical behaviors (e.g., purchases, clicks) and items' various modalities (e.g., visual and textual). The majority of existing studies typically focus on utilizing modal features or modality-related graph structure to learn users' local interests. Nevertheless, these approaches encounter two limitations: (1) Shared updates of user ID embeddings result in the consequential coupling between collaboration and multimoda
5

Moon, Jucheol, Nhat Anh Le, Nelson Hebert Minaya, and Sang-Il Choi. "Multimodal Few-Shot Learning for Gait Recognition." Applied Sciences 10, no. 21 (2020): 7619. http://dx.doi.org/10.3390/app10217619.

Full text
Abstract:
A person’s gait is a behavioral trait that is uniquely associated with each individual and can be used to recognize the person. As information about the human gait can be captured by wearable devices, several studies have proposed methods to process gait information for identification purposes. Despite recent advances in gait recognition, the open set gait recognition problem presents challenges to current approaches. To address the open set gait recognition problem, a system should be able to deal with unseen subjects who have not been included in the training dataset. In this paper, we
6

Zhang, Rongchao, Yiwei Lou, Dexuan Xu, Yongzhi Cao, Hanpin Wang, and Yu Huang. "A Learnable Discrete-Prior Fusion Autoencoder with Contrastive Learning for Tabular Data Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (2024): 16803–11. http://dx.doi.org/10.1609/aaai.v38i15.29621.

Full text
Abstract:
The actual collection of tabular data for sharing involves confidentiality and privacy constraints, leaving the potential risks of machine learning for interventional data analysis unsafely averted. Synthetic data has emerged recently as a privacy-protecting solution to address this challenge. However, existing approaches regard discrete and continuous modal features as separate entities, thus falling short in properly capturing their inherent correlations. In this paper, we propose a novel contrastive learning guided Gaussian Transformer autoencoder, termed GTCoder, to synthesize photo-realis
7

Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Full text
Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence-level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves re
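The cross-modal retrieval training described above (map both modalities into a common space so that matching image-caption pairs score highest) is often implemented with a symmetric contrastive objective. A minimal NumPy sketch using an InfoNCE-style loss, which stands in for, and is not identical to, the authors' own training objective:

```python
import numpy as np

def contrastive_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (image, caption) pairs.

    Row i of `img` and row i of `txt` are a matching pair; all other rows in
    the batch act as negatives. Minimizing this pulls matching pairs together
    in the shared space and pushes mismatched pairs apart.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (B, B) similarity matrix
    targets = np.arange(len(img))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[targets, targets].mean()

    # Average image-to-text and text-to-image retrieval directions.
    return float((cross_entropy(logits) + cross_entropy(logits.T)) / 2)

rng = np.random.default_rng(0)
matched = rng.normal(size=(8, 16))
loss_aligned = contrastive_loss(matched, matched)  # pairs identical: low loss
loss_random = contrastive_loss(matched, rng.normal(size=(8, 16)))
```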
8

Gaikwad, Yogesh J. "Stress Detection using Multimodal Representation Learning, Fusion Techniques, and Applications." Journal of Information Systems Engineering and Management 10, no. 16s (2025): 245–70. https://doi.org/10.52783/jisem.v10i16s.2593.

Full text
Abstract:
The fields of speech recognition, image identification, and natural language processing have undergone a paradigm shift with the advent of machine learning and deep learning approaches. Although these tasks rely primarily on a single modality for input signals, the artificial intelligence field has various applications that necessitate the use of several modalities. In recent years, academics have placed a growing emphasis on the intricate topic of modelling and learning across various modalities. This has attracted the interest of the scientific community. This technical article provides a co
9

Zhang, Kaifan, Lihuo He, Xin Jiang, Wen Lu, Di Wang, and Xinbo Gao. "CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 13 (2025): 14486–93. https://doi.org/10.1609/aaai.v39i13.33587.

Full text
Abstract:
Electroencephalogram (EEG) signals have attracted significant attention from researchers due to their non-invasive nature and high temporal sensitivity in decoding visual stimuli. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals. This results in the loss of critical multimodal information in EEG. To address the limitation, this paper proposes a unified framework that fully leverages multimodal data to represent EEG signals, named CognitionCapturer. Specifica
10

Subbotin, Sergey A., and Fedir A. Shmalko. "Partitioning the data space before applying hashing using clustering algorithms." Herald of Advanced Information Technology 8, no. 1 (2025): 28–42. https://doi.org/10.15276/hait.8.2025.2.

Full text
Abstract:
This research presents a locality-sensitive hashing framework that enhances approximate nearest neighbor search efficiency by integrating adaptive encoding trees and BERT-based clusterization. The proposed method optimizes data space partitioning before applying hashing, improving retrieval accuracy while reducing computational complexity. First, multimodal data, such as images and textual descriptions, are transformed into a unified semantic space using pre-trained bidirectional encoder representations from transformers embeddings. This ensures cross-modal consistency and facilitates high-dim
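The locality-sensitive hashing this abstract builds on can be illustrated with the classic random-hyperplane scheme for cosine similarity. This is the textbook baseline, not the paper's adaptive encoding trees:

```python
import numpy as np

class RandomHyperplaneLSH:
    """Cosine-similarity LSH: each bit records which side of a random
    hyperplane a vector falls on, so nearby vectors tend to share buckets."""

    def __init__(self, dim, n_bits=16, seed=0):
        self.planes = np.random.default_rng(seed).normal(size=(n_bits, dim))

    def hash(self, v):
        bits = (self.planes @ v > 0).astype(int)
        return int("".join(map(str, bits)), 2)  # bucket id in [0, 2**n_bits)

lsh = RandomHyperplaneLSH(dim=64)
v = np.random.default_rng(1).normal(size=64)
near = v + 0.01 * np.random.default_rng(2).normal(size=64)
# A tiny perturbation rarely flips any hyperplane bit, so `near` usually
# lands in the same bucket, while -v lands in the complementary bucket.
same_bucket = lsh.hash(v) == lsh.hash(near)
```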
More sources

Dissertations / Theses on the topic "Multimodal embedding space"

1

Couairon, Guillaume. "Text-Based Semantic Image Editing." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS248.

Full text
Abstract:
The objective of this thesis is to propose algorithms for the task of text-based image editing (TIE), which consists of editing digital images according to an instruction formulated in natural language. For example, given an image of a dog and the query "Change the dog into a cat," we want to produce a new image where the dog has been replaced by a cat, keeping all other aspects of the image unchanged (the animal's color and pose, the background). The north-star goal is to enable anyone to edit their images using

Book chapters on the topic "Multimodal embedding space"

1

Hamara, Andrew, and Pablo Rivas. "From Latent to Engine Manifolds: Analyzing ImageBind’s Multimodal Embedding Space." In Communications in Computer and Information Science. Springer Nature Switzerland, 2025. https://doi.org/10.1007/978-3-031-86623-4_21.

Full text
2

Zhang, Chao, and Jiawei Han. "Data Mining and Knowledge Discovery." In Urban Informatics. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-15-8983-6_42.

Full text
Abstract:
Our physical world is being projected into online cyberspace at an unprecedented rate. People nowadays visit different places and leave behind them million-scale digital traces such as tweets, check-ins, Yelp reviews, and Uber trajectories. Such digital data are a result of social sensing: namely people act as human sensors that probe different places in the physical world and share their activities online. The availability of massive social-sensing data provides a unique opportunity for understanding urban space in a data-driven manner and improving many urban computing applications,
3

Zhao, Xiang, Weixin Zeng, and Jiuyang Tang. "Multimodal Entity Alignment." In Entity Alignment. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4250-3_9.

Full text
Abstract:
In various tasks related to artificial intelligence, data is often present in multiple forms or modalities. Recently, it has become a popular approach to combine these different forms of information into a knowledge graph, creating a multi-modal knowledge graph (MMKG). However, multi-modal knowledge graphs (MMKGs) often face issues of insufficient data coverage and incompleteness. In order to address this issue, a possible strategy is to incorporate supplemental information from other multi-modal knowledge graphs (MMKGs). To achieve this goal, current methods for aligning entities coul
4

Valles-Perez, Ivan, Grzegorz Beringer, Piotr Bilinski, Gary Cook, and Roberto Barra-Chicote. "SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2023. http://dx.doi.org/10.3233/faia230540.

Full text
Abstract:
Numerous examples in the literature proved that deep learning models have the ability to work well with multimodal data. Recently, CLIP has enabled deep learning systems to learn shared latent spaces between images and text descriptions, with outstanding zero- or few-shot results in downstream tasks. In this paper we explore the same idea proposed by CLIP but applied to the speech domain, where the phonetic and acoustic spaces usually coexist. We train a CLIP-based model with the aim to learn shared representations of phonetic and acoustic spaces. The results show that the proposed model is se
5

Goyal, Nishant, Aarul Kumar, Aarushi Chaddha, and D. Lakshmi. "Recent Trends on Artificial Intelligence in Automated Hate Speech Detection." In Advances in Social Networking and Online Communities. IGI Global, 2025. https://doi.org/10.4018/979-8-3693-9904-0.ch014.

Full text
Abstract:
This study investigates the performance of AI in detecting HS in diverse cultural and contextual settings. Existing AI models, trained primarily on English datasets, struggle with regional dialects, idiomatic phrases, and cultural nuances. A systematic review of NLP techniques, including traditional methods (n-grams, Bag of Words) and advanced architectures (BERT, GPT, RoBERTa, CNNs, LSTMs), evaluates their effectiveness. Multilingual models like mBERT and XLM-R are assessed for low-resource scenarios while emerging trends like multimodal learning (CLIP) and adversarial training (GANs) are exp

Conference papers on the topic "Multimodal embedding space"

1

He, Yufei, Yuan Sui, Xiaoxin He, Yue Liu, Yifei Sun, and Bryan Hooi. "UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs." In WWW '25: The ACM Web Conference 2025. ACM, 2025. https://doi.org/10.1145/3696410.3714818.

Full text
2

Bhattacharya, Indrani, Arkabandhu Chowdhury, and Vikas C. Raykar. "Multimodal Dialog for Browsing Large Visual Catalogs using Exploration-Exploitation Paradigm in a Joint Embedding Space." In ICMR '19: International Conference on Multimedia Retrieval. ACM, 2019. http://dx.doi.org/10.1145/3323873.3325036.

Full text
3

Rostami, Mohammad, and Aram Galstyan. "Cognitively Inspired Learning of Incremental Drifting Concepts." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/341.

Full text
Abstract:
Humans continually expand their learned knowledge to new domains and learn new concepts without any interference with past learned experiences. In contrast, machine learning models perform poorly in a continual learning setting, where input data distribution changes over time. Inspired by the nervous system learning mechanisms, we develop a computational model that enables a deep neural network to learn new concepts and expand its learned knowledge to new domains incrementally in a continual learning setting. We rely on the Parallel Distributed Processing theory to encode abstract concepts in
4

Gong, Tiantian, Junsheng Wang, and Liyan Zhang. "Enhancing Cross-modal Completion and Alignment for Unsupervised Incomplete Text-to-Image Person Retrieval." In Thirty-Third International Joint Conference on Artificial Intelligence {IJCAI-24}. International Joint Conferences on Artificial Intelligence Organization, 2024. http://dx.doi.org/10.24963/ijcai.2024/88.

Full text
Abstract:
Traditional text-image person retrieval methods heavily rely on fully matched and identity-annotated multimodal data, representing an ideal yet limited scenario. The issues of handling incomplete multimodal data and the complexities of labeling multimodal data are common challenges encountered in real-world applications. In response to these challenges, we consider a more robust and pragmatic setting termed unsupervised incomplete text-image person retrieval, where person images and text descriptions are not fully matched and lack the supervision of identity labels. To tackle these
5

Gopalakrishnan, Sabarish, Premkumar Udaiyar, Shagan Sah, and Raymond Ptucha. "Multi Stage Common Vector Space for Multimodal Embeddings." In 2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR). IEEE, 2019. http://dx.doi.org/10.1109/aipr47015.2019.9174583.

Full text
6

Kowalczyk, Mateusz, Karolina Seweryn, Joanna Kolodziej, and Mateusz Krzyszton. "Adversarial Robustness Of Multimodal Machine Learning Models." In 39th ECMS International Conference on Modelling and Simulation. ECMS, 2025. https://doi.org/10.7148/2025-0248.

Full text
Abstract:
The widespread adoption of machine learning, particularly generative models, has revolutionized productivity and capabilities. However, this progress comes with significant security risks, as adversarial attacks can subtly manipulate model predictions through imperceptible perturbations. Multimodal machine learning models, integrating data from modalities such as images and text, further expand the attack surface by enabling cross-modal exploitation. This paper examines the adversarial robustness of multimodal AI systems, focusing on text and image modalities. We analyze security challenges ar
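The "imperceptible perturbations" this abstract refers to are typically crafted from loss gradients; the Fast Gradient Sign Method (FGSM) is the canonical single-step example. Sketched here with a made-up gradient, since no model is attached:

```python
import numpy as np

def fgsm_perturb(x, grad, epsilon=0.01):
    """Fast Gradient Sign Method: nudge each input feature by epsilon in the
    direction that increases the loss, producing a small perturbation that
    can nonetheless flip a model's prediction.

    `grad` is the gradient of the model's loss with respect to the input.
    """
    return x + epsilon * np.sign(grad)

x = np.zeros(4)
x_adv = fgsm_perturb(x, grad=np.array([0.3, -0.2, 0.0, 1.5]), epsilon=0.01)
# → [0.01, -0.01, 0.0, 0.01]
```

In a multimodal setting the same step can be applied to the image input, the text embedding, or both, which is what widens the attack surface the paper analyzes.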
7

Elberg, Rafael, Denis Parra, and Mircea Petrache. "Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning." In LatinX in AI at Computer Vision and Pattern Recognition Conference 2024. Journal of LatinX in AI Research, 2024. http://dx.doi.org/10.52591/lxai202406174.

Full text
Abstract:
Image and multimodal machine learning tasks are very challenging to solve in the case of poorly distributed data. In particular, data availability and privacy restrictions exacerbate these hurdles in the medical domain. The state of the art in image generation quality is held by Latent Diffusion models, making them prime candidates for tackling this problem. However, a few key issues still need to be solved, such as the difficulty in generating data from under-represented classes and a slow inference process. To mitigate these issues, we propose a new method for image augmentation in long-tail
8

Feng, LiWei, Hao Ai, and Yuan Li. "Multimode Process Monitoring Based on Density Space Clustering Locally Linear Embedding Technique." In 2023 2nd Conference on Fully Actuated System Theory and Applications (CFASTA). IEEE, 2023. http://dx.doi.org/10.1109/cfasta57821.2023.10243375.

Full text
9

Grouwels, Joris, Nicolas Jonason, and Bob L. T. Sturm. "Exploring the Expressive Space of an Articulatory Vocal Modal using Quality-Diversity Optimization with Multimodal Embeddings." In GECCO '25: Genetic and Evolutionary Computation Conference. ACM, 2025. https://doi.org/10.1145/3712256.3726313.

Full text
10

Pasi, Piyush Singh, Karthikeya Battepati, Preethi Jyothi, Ganesh Ramakrishnan, Tanmay Mahapatra, and Manoj Singh. "Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/683.

Full text
Abstract:
The problem of audio-to-text alignment has seen significant amount of research using complete supervision during training. However, this is typically not in the context of long audio recordings wherein the text being queried does not appear verbatim within the audio file. This work is a collaboration with a non-governmental organization called CARE India that collects long audio health surveys from young mothers residing in rural parts of Bihar, India. Given a question drawn from a questionnaire that is used to guide these surveys, we aim to locate where the question is asked within a long aud