Journal articles on the topic "Multimodal embedding space"

Consult the top 50 journal articles for your research on the topic "Multimodal embedding space".

1

Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (2023): 392. http://dx.doi.org/10.3390/info14070392.

Full text
Abstract
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications, such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for…
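
The cone effect described here is easy to check empirically. Below is a minimal sketch (not the authors' code) that scores a set of embeddings by their average pairwise cosine similarity, a common anisotropy proxy, and applies mean-centering, one simple isotropy-improving transform; the array shapes and the simulated offset are illustrative assumptions.

```python
import numpy as np

def average_cosine_similarity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity; near 0 = spread out, near 1 = narrow cone."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(unit)
    # Exclude the diagonal (self-similarity is always 1).
    return (sims.sum() - n) / (n * (n - 1))

def center_embeddings(embeddings: np.ndarray) -> np.ndarray:
    """Mean-centering, a simple transform often used to improve isotropy."""
    return embeddings - embeddings.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
# Simulate an anisotropic cloud: a shared offset dominates every vector.
emb = rng.normal(size=(512, 64)) + 5.0
print(f"before centering: {average_cosine_similarity(emb):.3f}")                     # close to 1
print(f"after centering:  {average_cosine_similarity(center_embeddings(emb)):.3f}")  # close to 0
```
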
2

Mai, Sijie, Haifeng Hu, and Songlong Xing. "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (2020): 164–72. http://dx.doi.org/10.1609/aaai.v34i01.5347.

Full text
Abstract
Learning a joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap which heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of various modalities vary in nature, to reduce the modality gap, we translate the distributions of source modalities into that of the target modality via their respective encoders using adversarial training. Furthermore…
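
As a rough illustration of adversarial distribution alignment of the kind this abstract describes, the sketch below trains two toy encoders against a modality discriminator in PyTorch. It is a generic GAN-style alignment loop, not the paper's encoder-decoder-classifier framework; the layer sizes and the text/audio modality pairing are made-up assumptions.

```python
import torch
import torch.nn as nn

text_enc = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))
audio_enc = nn.Sequential(nn.Linear(74, 128), nn.ReLU(), nn.Linear(128, 64))
disc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

bce = nn.BCEWithLogitsLoss()
opt_enc = torch.optim.Adam([*text_enc.parameters(), *audio_enc.parameters()], lr=1e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)

text_batch, audio_batch = torch.randn(32, 300), torch.randn(32, 74)
for step in range(100):
    z_text, z_audio = text_enc(text_batch), audio_enc(audio_batch)

    # 1) Discriminator step: learn to label text embeddings 1, audio embeddings 0.
    d_loss = bce(disc(z_text.detach()), torch.ones(32, 1)) + \
             bce(disc(z_audio.detach()), torch.zeros(32, 1))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 2) Encoder step: make both modalities indistinguishable to the discriminator.
    g_loss = bce(disc(z_text), torch.zeros(32, 1)) + \
             bce(disc(z_audio), torch.ones(32, 1))
    opt_enc.zero_grad(); g_loss.backward(); opt_enc.step()
```
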
3

Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.

Full text
Abstract
Previous work has shown the effectiveness of using event representations for tasks such as script event prediction and stock market prediction. It is however still challenging to learn the subtle semantic differences between events based solely on textual descriptions of events often represented as (subject, predicate, object) triples. As an alternative, images offer a more intuitive way of understanding event semantics. We observe that events described in text and in images show different abstraction levels and therefore should be projected onto heterogeneous embedding spaces, as opposed to…
4

Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.

Full text
Abstract
Multimodal recommendation has gradually become the infrastructure of online media platforms, enabling them to provide personalized service to users through joint modeling of users' historical behaviors (e.g., purchases, clicks) and items' various modalities (e.g., visual and textual). The majority of existing studies typically focus on utilizing modal features or modal-related graph structure to learn user local interests. Nevertheless, these approaches encounter two limitations: (1) Shared updates of user ID embeddings result in the consequential coupling between collaboration and multimodal…
5

Moon, Jucheol, Nhat Anh Le, Nelson Hebert Minaya, and Sang-Il Choi. "Multimodal Few-Shot Learning for Gait Recognition." Applied Sciences 10, no. 21 (2020): 7619. http://dx.doi.org/10.3390/app10217619.

Full text
Abstract
A person's gait is a behavioral trait that is uniquely associated with each individual and can be used to recognize the person. As information about the human gait can be captured by wearable devices, a few studies have led to the proposal of methods to process gait information for identification purposes. Despite recent advances in gait recognition, an open set gait recognition problem presents challenges to current approaches. To address the open set gait recognition problem, a system should be able to deal with unseen subjects who have not been included in the training dataset. In this paper, we…
6

Zhang, Rongchao, Yiwei Lou, Dexuan Xu, Yongzhi Cao, Hanpin Wang, and Yu Huang. "A Learnable Discrete-Prior Fusion Autoencoder with Contrastive Learning for Tabular Data Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (2024): 16803–11. http://dx.doi.org/10.1609/aaai.v38i15.29621.

Full text
Abstract
The actual collection of tabular data for sharing involves confidentiality and privacy constraints, leaving the potential risks of machine learning for interventional data analysis unsafely averted. Synthetic data has emerged recently as a privacy-protecting solution to address this challenge. However, existing approaches regard discrete and continuous modal features as separate entities, thus falling short in properly capturing their inherent correlations. In this paper, we propose a novel contrastive learning guided Gaussian Transformer autoencoder, termed GTCoder, to synthesize photo-realistic…
7

Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Full text
Abstract
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves…
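
Mapping two modalities into a common space so that matched pairs retrieve each other is typically trained with a ranking or contrastive objective. The sketch below shows a symmetric InfoNCE-style loss over a batch of paired image and caption embeddings; it is one standard recipe for such joint spaces, not necessarily the exact objective used in this paper.

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.T / temperature        # (B, B) similarity matrix
    targets = torch.arange(len(img))          # matched pair: image i <-> caption i
    # Retrieve caption-from-image and image-from-caption with equal weight.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

loss = symmetric_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```
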
8

Gaikwad, Yogesh J. "Stress Detection using Multimodal Representation Learning, Fusion Techniques, and Applications." Journal of Information Systems Engineering and Management 10, no. 16s (2025): 245–70. https://doi.org/10.52783/jisem.v10i16s.2593.

Full text
Abstract
The fields of speech recognition, image identification, and natural language processing have undergone a paradigm shift with the advent of machine learning and deep learning approaches. Although these tasks rely primarily on a single modality for input signals, the artificial intelligence field has various applications that necessitate the use of several modalities. In recent years, academics have placed a growing emphasis on the intricate topic of modelling and learning across various modalities. This has attracted the interest of the scientific community. This technical article provides a…
9

Zhang, Kaifan, Lihuo He, Xin Jiang, Wen Lu, Di Wang, and Xinbo Gao. "CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 13 (2025): 14486–93. https://doi.org/10.1609/aaai.v39i13.33587.

Full text
Abstract
Electroencephalogram (EEG) signals have attracted significant attention from researchers due to their non-invasive nature and high temporal sensitivity in decoding visual stimuli. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals. This results in the loss of critical multimodal information in EEG. To address this limitation, this paper proposes a unified framework that fully leverages multimodal data to represent EEG signals, named CognitionCapturer. Specifically…
10

Subbotin, Sergey A., and Fedir A. Shmalko. "Partitioning the data space before applying hashing using clustering algorithms." Herald of Advanced Information Technology 8, no. 1 (2025): 28–42. https://doi.org/10.15276/hait.8.2025.2.

Full text
Abstract
This research presents a locality-sensitive hashing framework that enhances approximate nearest neighbor search efficiency by integrating adaptive encoding trees and BERT-based clusterization. The proposed method optimizes data space partitioning before applying hashing, improving retrieval accuracy while reducing computational complexity. First, multimodal data, such as images and textual descriptions, are transformed into a unified semantic space using pre-trained Bidirectional Encoder Representations from Transformers (BERT) embeddings. This ensures cross-modal consistency and facilitates high-dimensional…
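
For intuition about the hashing stage described above, here is a minimal random-hyperplane LSH bucket index: vectors whose signed projections agree land in the same bucket, so candidate neighbors are found without scanning the whole collection. This sketch omits the paper's adaptive encoding trees and BERT-based clusterization; the dimensions and the perturbed query are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_planes = 128, 16
planes = rng.normal(size=(n_planes, dim))   # random hyperplane normals

def lsh_key(vec: np.ndarray) -> int:
    """Pack the sign pattern of the projections into an integer bucket id."""
    bits = (planes @ vec) > 0
    return int(np.packbits(bits).view(np.uint16)[0])

corpus = rng.normal(size=(10_000, dim))
buckets: dict[int, list[int]] = {}
for i, v in enumerate(corpus):
    buckets.setdefault(lsh_key(v), []).append(i)

query = corpus[0] + 0.05 * rng.normal(size=dim)  # a slightly perturbed known item
candidates = buckets.get(lsh_key(query), [])
print(f"bucket size: {len(candidates)}, contains true neighbor: {0 in candidates}")
```
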
11

Fan, Yunpeng, Wenyou Du, Yingwei Zhang, and Xiaogang Wang. "Fault Detection for Multimodal Process Using Quality-Relevant Kernel Neighborhood Preserving Embedding." Mathematical Problems in Engineering 2015 (2015): 1–15. http://dx.doi.org/10.1155/2015/210125.

Full text
Abstract
A new method named quality-relevant kernel neighborhood preserving embedding (QKNPE) has been proposed. Quality variables have been considered for the first time in the kernel neighborhood preserving embedding (KNPE) method for monitoring multimodal processes. In summary, the whole algorithm is a two-step process: first, to improve manifold structure and to deal with the multimodal nonlinearity problem, the neighborhood preserving embedding technique is introduced; and second, to monitor the complete production process, the product quality variables are added to the objective function. Compared with…
12

Zhang, Jianqiang, Renyao Chen, Shengwen Li, Tailong Li, and Hong Yao. "MGKGR: Multimodal Semantic Fusion for Geographic Knowledge Graph Representation." Algorithms 17, no. 12 (2024): 593. https://doi.org/10.3390/a17120593.

Full text
Abstract
Geographic knowledge graph representation learning embeds entities and relationships in geographic knowledge graphs into a low-dimensional continuous vector space, which serves as a basic method that bridges geographic knowledge graphs and geographic applications. Previous geographic knowledge graph representation methods primarily learn the vectors of entities and their relationships from their spatial attributes and relationships, which ignores various semantics of entities, resulting in poor embeddings on geographic knowledge graphs. This study proposes a two-stage multimodal geographic knowledge…
13

Zhang, Sensen, Xun Liang, Simin Niu, et al. "Integrating Large Language Models and Möbius Group Transformations for Temporal Knowledge Graph Embedding on the Riemann Sphere." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 12 (2025): 13277–85. https://doi.org/10.1609/aaai.v39i12.33449.

Full text
Abstract
The significance of Temporal Knowledge Graphs (TKGs) in Artificial Intelligence (AI) lies in their capacity to incorporate time-dimensional information, support complex reasoning and prediction, optimize decision-making processes, enhance the accuracy of recommendation systems, promote multimodal data integration, and strengthen knowledge management and updates. This provides a robust foundation for various AI applications. To effectively learn and apply both static and dynamic temporal patterns for reasoning, a range of embedding methods and large language models (LLMs) have been proposed in…
14

Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao, and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings." Journal of Advanced Computational Intelligence and Intelligent Informatics 26, no. 6 (2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.

Full text
Abstract
In this work, we study the application of multimodal analogical reasoning to image retrieval. Multimodal analogy questions are given in a form of tuples of words and images, e.g., "cat":"dog"::[an image of a cat sitting on a bench]:?, to search for an image of a dog sitting on a bench. Retrieving desired images given these tuples can be seen as a task of finding images whose relation to the query image is close to that of the query words. One way to achieve the task is building a common vector space that exhibits analogical regularities. To learn such an embedding, we propose a quadruple neural…
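
Analogical regularities in a shared space are usually exploited with vector arithmetic followed by nearest-neighbor lookup. Below is a toy sketch of that retrieval step with random placeholder vectors; it assumes word and image embeddings already share one space, which is exactly what the proposed model is trained to provide.

```python
import numpy as np

rng = np.random.default_rng(1)
word_vec = {"cat": rng.normal(size=64), "dog": rng.normal(size=64)}
image_bank = rng.normal(size=(1000, 64))   # pre-embedded image collection
query_image = image_bank[42]               # stands in for "a cat sitting on a bench"

# answer ~ image("cat on bench") - text("cat") + text("dog")
target = query_image - word_vec["cat"] + word_vec["dog"]

def nearest(bank: np.ndarray, q: np.ndarray, k: int = 5) -> np.ndarray:
    sims = (bank @ q) / (np.linalg.norm(bank, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)[:k]           # indices of top-k cosine matches

print(nearest(image_bank, target))
```
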
15

Kim, Jongseok, Youngjae Yu, Hoeseong Kim, and Gunhee Kim. "Dual Compositional Learning in Interactive Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (2021): 1771–79. http://dx.doi.org/10.1609/aaai.v35i2.16271.

Full text
Abstract
We present an approach named Dual Composition Network (DCNet) for interactive image retrieval that searches for the best target image for a natural language query and a reference image. To accomplish this task, existing methods have focused on learning a composite representation of the reference image and the text query to be as close to the embedding of the target image as possible. We refer to this approach as the Composition Network. In this work, we propose to close the loop with a Correction Network that models the difference between the reference and target image in the embedding space and matches…
16

Chen, Yatong, Chenzhi Hu, Tomoyoshi Kimura, et al. "SemiCMT: Contrastive Cross-Modal Knowledge Transfer for IoT Sensing with Semi-Paired Multi-Modal Signals." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, no. 4 (2024): 1–30. http://dx.doi.org/10.1145/3699779.

Full text
Abstract
This paper proposes a novel contrastive cross-modal knowledge transfer framework, SemiCMT, for multi-modal IoT sensing applications. It effectively transfers the feature extraction capability (also called knowledge) learned from a source modality (e.g., acoustic signals) with abundant unlabeled training data, to a target modality (e.g., seismic signals) that lacks enough training data, in a self-supervised manner with the help of only a small set of synchronized multi-modal pairs. The transferred model can be quickly finetuned to downstream target-modal tasks with only limited labels. The key…
17

He, Yuxuan, Kunda Wang, Qicheng Song, Huixin Li, and Bozhi Zhang. "Specific Emitter Identification Algorithm Based on Time–Frequency Sequence Multimodal Feature Fusion Network." Electronics 13, no. 18 (2024): 3703. http://dx.doi.org/10.3390/electronics13183703.

Full text
Abstract
Specific emitter identification is a challenge in the field of radar signal processing. It aims to extract individual fingerprint features of the signal. However, early works are all designed using either the signal or the time–frequency image and heavily rely on the calculation of hand-crafted features or complex interactions in a high-dimensional feature space. This paper introduces the time–frequency multimodal feature fusion network, a novel architecture based on multimodal feature interaction. Specifically, we designed a time–frequency signal feature encoding module, a WVD image feature encoding module…
18

Abiyev, Rahib H., Mohamad Ziad Altabel, Manal Darwish, and Abdulkader Helwan. "A Multimodal Transformer Model for Recognition of Images from Complex Laparoscopic Surgical Videos." Diagnostics 14, no. 7 (2024): 681. http://dx.doi.org/10.3390/diagnostics14070681.

Full text
Abstract
The determination of the potential role and advantages of artificial intelligence-based models in the field of surgery remains uncertain. This research marks an initial stride towards creating a multimodal model, inspired by the Video-Audio-Text Transformer, that aims to reduce negative occurrences and enhance patient safety. The model employs text and image embedding state-of-the-art models (ViT and BERT) to assess their efficacy in extracting the hidden and distinct features from the surgery video frames. These features are then used as inputs for convolution-free Transformer architectures…
19

Tripathi, Aakash Gireesh, Asim Waqas, Yasin Yilmaz, Matthew B. Schabath, and Ghulam Rasool. "Abstract 3641: Predicting treatment outcomes using cross-modality correlations in multimodal oncology data." Cancer Research 85, no. 8_Supplement_1 (2025): 3641. https://doi.org/10.1158/1538-7445.am2025-3641.

Full text
Abstract
Accurate prediction of treatment outcomes in oncology requires modeling the intricate relationships across diverse data modalities. This study investigates cross-modality correlations by leveraging imaging and clinical data curated through the Multimodal Integration of Oncology Data System (MINDS) and HoneyBee frameworks to uncover actionable patterns for personalized treatment strategies. Using data from over 10,000 cancer patients, we developed a machine learning pipeline that employs advanced embedding techniques to capture associations between radiological imaging phenotypes and…
20

Skantze, Gabriel, and Bram Willemsen. "CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings." Journal of Artificial Intelligence Research 74 (July 9, 2022): 1201–23. http://dx.doi.org/10.1613/jair.1.13689.

Full text
Abstract
This paper presents CoLLIE: a simple, yet effective model for continual learning of how language is grounded in vision. Given a pre-trained multimodal embedding model, where language and images are projected in the same semantic space (in this case CLIP by OpenAI), CoLLIE learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. This is done by predicting the difference vector that needs to be applied, as well as a scaling factor for this vector, so that the adjustment is only applied when needed. Unlike traditional few-shot learning…
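
The adjustment interface described, a predicted difference vector plus a scaling factor that gates when it is applied, can be sketched in a few lines of PyTorch. The layer sizes and the sigmoid gate below are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class EmbeddingAdjuster(nn.Module):
    """Adjust a text embedding by a gated, predicted difference vector."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.delta = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.scale = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())  # 0 = leave unchanged

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        # Apply the predicted correction only as strongly as the gate allows.
        return text_emb + self.scale(text_emb) * self.delta(text_emb)

adjusted = EmbeddingAdjuster()(torch.randn(4, 512))  # e.g., CLIP text embeddings
```
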
21

Zhang, Linhao, Li Jin, Xian Sun, et al. "TOT: Topology-Aware Optimal Transport for Multimodal Hate Detection." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 4 (2023): 4884–92. http://dx.doi.org/10.1609/aaai.v37i4.25614.

Full text
Abstract
Multimodal hate detection, which aims to identify harmful content online such as memes, is crucial for building a wholesome internet environment. Previous work has made enlightening exploration in detecting explicit hate remarks. However, most of their approaches neglect the analysis of implicit harm, which is particularly challenging as explicit text markers and demographic visual cues are often twisted or missing. The leveraged cross-modal attention mechanisms also suffer from the distributional modality gap and lack logical interpretability. To address these semantic gap issues, we propose…
22

Zhang, Yachao, Runze Hu, Ronghui Li, Yanyun Qu, Yuan Xie, and Xiu Li. "Cross-Modal Match for Language Conditioned 3D Object Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (2024): 7359–67. http://dx.doi.org/10.1609/aaai.v38i7.28566.

Full text
Abstract
Language conditioned 3D object grounding aims to find the object within the 3D scene mentioned by natural language descriptions, which mainly depends on the matching between visual and natural language. Considerable improvement in grounding performance is achieved by improving the multimodal fusion mechanism or bridging the gap between detection and matching. However, several mismatches are ignored, i.e., mismatch in local visual representation and global sentence representation, and mismatch in visual space and corresponding label word space. In this paper, we propose cross-modal match for 3D…
23

Liang, Meiyu, Junping Du, Zhengyang Liang, Yongwang Xing, Wei Huang, and Zhe Xue. "Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 12 (2024): 13744–53. http://dx.doi.org/10.1609/aaai.v38i12.29280.

Full text
Abstract
Deep cross-modal hashing technology provides an effective and efficient cross-modal unified representation learning solution for cross-modal search. However, the existing methods neglect the implicit fine-grained multimodal knowledge relations between these modalities, such as when the image contains information that is not directly described in the text. To tackle this problem, we propose a novel self-supervised multi-grained multi-modal knowledge graph contrastive hashing method for cross-modal search (CMGCH). Firstly, in order to capture implicit fine-grained cross-modal semantic associations…
24

Chen, Meng, Kai Zhang, Zhenying He, Yinan Jing, and X. Sean Wang. "RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search." Proceedings of the VLDB Endowment 17, no. 11 (2024): 2735–49. http://dx.doi.org/10.14778/3681954.3681959.

Full text
Abstract
Approximate Nearest Neighbor Search (ANNS) is a fundamental and critical component in many applications, including recommendation systems and large language model-based applications. With the advancement of multimodal neural models, which transform data from different modalities into a shared high-dimensional space as feature vectors, cross-modal ANNS aims to use the data vector from one modality (e.g., texts) as the query to retrieve the most similar items from another (e.g., images or videos). However, there is an inherent distribution gap between embeddings from different modalities, and…
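
Once both modalities live in one space, cross-modal retrieval reduces to nearest-neighbor search: embed the text query, rank image vectors by cosine similarity. The exact brute-force baseline below shows the task; specialized graph indexes such as the paper's RoarGraph exist precisely because this linear scan does not scale and because query vectors (one modality) and base vectors (another) follow different distributions. All shapes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
image_vecs = rng.normal(size=(100_000, 256)).astype(np.float32)
image_vecs /= np.linalg.norm(image_vecs, axis=1, keepdims=True)

def search(query: np.ndarray, k: int = 10) -> np.ndarray:
    """Exact top-k retrieval by cosine similarity (O(n * d) per query)."""
    q = query / np.linalg.norm(query)
    return np.argsort(-(image_vecs @ q))[:k]

text_query_vec = rng.normal(size=256).astype(np.float32)  # from a text encoder
print(search(text_query_vec))
```
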
25

Akalya, Devi C., Renuka D. Karthika, T. Harisudhan, V. K. Jeevanantham, J. Jhanani, and Varshini S. Kavi. "Text emotion recognition using fast text word embedding in bi-directional gated recurrent unit." i-manager's Journal on Information Technology 11, no. 4 (2022): 1. http://dx.doi.org/10.26634/jit.11.4.19119.

Full text
Abstract
Emotions are states of readiness in the mind that result from evaluations of one's own thinking or events. Although almost all of the important events in our lives are marked by emotions, the nature, causes, and effects of emotions are some of the least understood parts of the human experience. Emotion recognition is playing a promising role in the domains of human-computer interaction and artificial intelligence. A human's emotions can be detected using a variety of methods, including facial gestures, blood pressure, body movements, heart rate, and textual data. From an application standpoint…
26

Simhal, Anish K., Rena Elkin, Ross S. Firestone, Jung Hun Oh, and Joseph O. Deasy. "Abstract A031: Unsupervised graph-based visualization of variational autoencoder latent spaces reveals hidden multiple myeloma subtypes." Clinical Cancer Research 31, no. 13_Supplement (2025): A031. https://doi.org/10.1158/1557-3265.aimachine-a031.

Full text
Abstract
Latent space representations learned through variational autoencoders (VAEs) offer a powerful, unsupervised means of capturing nonlinear structure in high-dimensional oncology data. The latent embedding spaces often encode information that differs from traditional bioinformatics methods such as t-SNE or UMAP. However, a persistent challenge remains: how to meaningfully visualize and interpret these latent variables. Common dimensionality reduction techniques like UMAP and t-SNE, while effective, can obscure graph-theoretic relationships that may underlie important biological patterns.
27

Li, Yihua, Hongyue Chen, Yiqing Li, and Yetong Xin. "PoeSpin: A Human-AI Dance to Poetry System for Movement-Based Verse Generation." Proceedings of the ACM on Computer Graphics and Interactive Techniques 8, no. 3 (2025): 1–13. https://doi.org/10.1145/3736781.

Full text
Abstract
This paper presents PoeSpin, a human–AI co-writing system that transforms pole dance movements into poetry. Inspired by pole dance and computational linguistics, this project reimagines dance as a form of embodied poetic creation situated within the traditions of spatial and concrete poetry. Drawing from these traditions, PoeSpin treats physical motion as a generative force in the poetic process. We implemented three movement-to-poetry strategies. Among them, the second approach, embedding pole dance trajectories into a reduced-dimensional semantic space, served as the foundation for a real-time…
28

Hnini, Ghizlane, Jamal Riffi, Mohamed Adnane Mahraz, Ali Yahyaouy, and Hamid Tairi. "MMPC-RF: A Deep Multimodal Feature-Level Fusion Architecture for Hybrid Spam E-mail Detection." Applied Sciences 11, no. 24 (2021): 11968. http://dx.doi.org/10.3390/app112411968.

Full text
Abstract
Hybrid spam is an undesirable e-mail (electronic mail) that contains both image and text parts. It is more harmful and complex than image-based and text-based spam e-mail. Thus, an efficient and intelligent approach is required to distinguish between spam and ham. To our knowledge, a small number of studies have been aimed at detecting hybrid spam e-mails. Most of these multimodal architectures adopted the decision-level fusion method, whereby the classification scores of each modality were concatenated and fed to another classification model to make a final decision. Unfortunately…
29

Wang, Kaijie, Tiejun Wang, Xiaoran Guo, Kui Xu, and Jiao Wu. "Thangka Image–Text Matching Based on Adaptive Pooling Layer and Improved Transformer." Applied Sciences 14, no. 2 (2024): 807. http://dx.doi.org/10.3390/app14020807.

Full text
Abstract
Image–text matching is a research hotspot in the multimodal task of integrating image and text processing. In order to solve the difficult problem of associating image and text data in the multimodal knowledge graph of Thangka, we propose an image and text matching method based on the Visual Semantic Embedding (VSE) model. The method introduces an adaptive pooling layer to improve the feature extraction capability of semantic associations between Thangka images and texts. We also improved the traditional Transformer architecture by combining bidirectional residual concatenation and mask attention…
30

Zhang, Yutong, Jiantao Wu, Li Sun, and Guoan Yang. "Contrastive Learning-Based Cross-Modal Fusion for Product Form Imagery Recognition: A Case Study on New Energy Vehicle Front-End Design." Sustainability 17, no. 10 (2025): 4432. https://doi.org/10.3390/su17104432.

Full text
Abstract
Fine-grained feature extraction and affective semantic mapping remain significant challenges in product form analysis. To address these issues, this study proposes a contrastive learning-based cross-modal fusion approach for product form imagery recognition, using the front-end design of new energy vehicles (NEVs) as a case study. The proposed method first employs the Biterm Topic Model (BTM) and Analytic Hierarchy Process (AHP) to extract thematic patterns and compute weight distributions from consumer review texts, thereby identifying key imagery style labels. These labels are then leveraged…
31

Biswas, Rajarshi, Michael Barz, and Daniel Sonntag. "Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking." KI - Künstliche Intelligenz 34, no. 4 (2020): 571–84. http://dx.doi.org/10.1007/s13218-020-00679-2.

Full text
Abstract
Image captioning is a challenging multimodal task. Significant improvements could be obtained by deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting their attention mechanism using additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features…
32

Meo, Giuseppe, Pilar M. Ferraro, Marta Cillerai, et al. "MND Phenotypes Differentiation: The Role of Multimodal Characterization at the Time of Diagnosis." Life 12, no. 10 (2022): 1506. http://dx.doi.org/10.3390/life12101506.

Full text
Abstract
Pure/predominant upper motor neuron (pUMN) and lower motor neuron (pLMN) diseases have significantly better prognosis compared to amyotrophic lateral sclerosis (ALS), but their early differentiation is often challenging. We therefore tested whether a multimodal characterization approach embedding clinical, cognitive/behavioral, genetic, and neurophysiological data may improve the differentiation of pUMN and pLMN from ALS already by the time of diagnosis. Dunn's and chi-squared tests were used to compare data from 41 ALS, 34 pLMN, and 19 pUMN cases with diagnoses confirmed throughout a 2-year…
33

Malitesta, Daniele. "Graph Neural Networks for Recommendation Leveraging Multimodal Information." ACM SIGIR Forum 58, no. 1 (2024): 1–2. http://dx.doi.org/10.1145/3687273.3687295.

Full text
Abstract
Recommender systems act as filtering algorithms to provide users with items that might meet their interests according to the expressed preferences and items' characteristics. As of today, the collaborative filtering paradigm, along with deep learning techniques to learn high-quality users' and items' representations, constitute the de facto standard for personalized recommendation, showing remarkable recommendation accuracy performance. Nevertheless, recommendation remains a highly-challenging task. Among the most debated open issues in the community, this thesis considers two algorithmic and…
34

Balabin, Helena, Charles Tapley Hoyt, Colin Birkenbihl, et al. "STonKGs: a sophisticated transformer trained on biomedical text and knowledge graphs." Bioinformatics 38, no. 6 (2022): 1648–56. http://dx.doi.org/10.1093/bioinformatics/btac001.

Full text
Abstract
Motivation: The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models. However, representations based on a single modality are inherently limited. Results: To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs…
35

Yuan, Xinpan, Xinxin Mao, Wei Xia, Zhiqi Zhang, Shaojun Xie, and Chengyuan Zhang. "PTF-SimCM: A Simple Contrastive Model with Polysemous Text Fusion for Visual Similarity Metric." Complexity 2022 (September 16, 2022): 1–14. http://dx.doi.org/10.1155/2022/2343707.

Full text
Abstract
Image similarity metric, also known as metric learning (ML) in computer vision, is a significant step in various advanced image tasks. Nevertheless, existing well-performing approaches for image similarity measurement only focus on the image itself without utilizing the information of other modalities, while pictures often appear together with descriptive text. Furthermore, those methods need human supervision, yet most images are unlabeled in the real world. Considering the above problems comprehensively, we present a novel visual similarity metric model named PTF-SimCM. It adopts a self-supervised…
36

Tang, Zhenchao, Jiehui Huang, Guanxing Chen, and Calvin Yu-Chian Chen. "Comprehensive View Embedding Learning for Single-Cell Multimodal Integration." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15292–300. http://dx.doi.org/10.1609/aaai.v38i14.29453.

Full text
Abstract
Motivation: Advances in single-cell measurement techniques provide rich multimodal data, which helps us to explore the life state of cells more deeply. However, multimodal integration, or, learning joint embeddings from multimodal data remains a current challenge. The difficulty in integrating unpaired single-cell multimodal data is that different modalities have different feature spaces, which easily leads to information loss in joint embedding. And few existing methods have fully exploited and fused the information in single-cell multimodal data. Result: In this study, we propose CoVEL, a…
37

Chen, Ziwei, Shaokun An, Xiangqi Bai, Fuzhou Gong, Liang Ma, and Lin Wan. "DensityPath: an algorithm to visualize and reconstruct cell state-transition path on density landscape for single-cell RNA sequencing data." Bioinformatics 35, no. 15 (2018): 2593–601. http://dx.doi.org/10.1093/bioinformatics/bty1009.

Full text
Abstract
Motivation: Visualizing and reconstructing cell developmental trajectories intrinsically embedded in high-dimensional expression profiles of single-cell RNA sequencing (scRNA-seq) snapshot data are computationally intriguing, but challenging. Results: We propose DensityPath, an algorithm allowing (i) visualization of the intrinsic structure of scRNA-seq data on an embedded 2-d space and (ii) reconstruction of an optimal cell state-transition path on the density landscape. DensityPath powerfully handles high dimensionality and heterogeneity of scRNA-seq data by (i) revealing the intrinsic…
38

Yin, Ziyi, Muchao Ye, Tianrong Zhang, et al. "VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (2024): 6755–63. http://dx.doi.org/10.1609/aaai.v38i7.28499.

Full text
Abstract
Visual Question Answering (VQA) is a fundamental task in the computer vision and natural language processing fields. Although the "pre-training & finetuning" learning paradigm significantly improves the VQA performance, the adversarial robustness of such a learning paradigm has not been explored. In this paper, we delve into a new problem: using a pre-trained multimodal source model to create adversarial image-text pairs and then transferring them to attack the target VQA models. Correspondingly, we propose a novel VQATTACK model, which can iteratively generate both image and text perturbations…
39

Widrich, Michael, Anooj Patel, Peter Ulz, et al. "Abstract A045: Unlocking deep learning for cell-free DNA-based early colorectal cancer detection." Clinical Cancer Research 31, no. 13_Supplement (2025): A045. https://doi.org/10.1158/1557-3265.aimachine-a045.

Full text
Abstract
Introduction: Colorectal cancer (CRC) is the second most common cause of cancer-related death in the US. Screening reduces cancer mortality through early detection, but only 59% of eligible individuals are up to date with recommended CRC screening. Non-invasive and more convenient tests can increase adherence to screening guidelines, and blood tests using next generation sequencing to detect cancer-associated methylation patterns in cell-free DNA (cfDNA) have recently shown great promise and are on track to or have recently achieved FDA approval. Such tests produce data for millions of…
40

Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Full text
Abstract
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modality data. It is challenging for not only the heterogeneous distributions across different modalities, but also the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as common semantic space, and using generative adversarial network (GAN) to capture the underlying multimodal data structures, as…
41

Neagu, Maria-Ionela. "INTRODUCTION. MULTIMODAL DIMENSIONS OF IDENTITY IN CULTURE, SOCIETY, AND THE ARTS." JOURNAL OF LINGUISTIC AND INTERCULTURAL EDUCATION 17, no. 2 (2024): 7–13. https://doi.org/10.29302/jolie.2024.17.2.1.

Full text
Abstract
The concept of identity has been approached from multiple perspectives, as the self always relates to everything and everybody that surrounds it, getting adjusted by every experience it passes through, in a continuous attempt to gain self-apprehension and to recover its sense of belonging. Place, time, emotions, culture are only a few of the factors that impact upon the self, reconfiguring it, as a result of the "troubled condition of the individual, displaced and oscillating between cultures" (Dobrinescu 2017: 156). It is the quest for personal identity, against the background of social relations…
42

Xu, Xing, Jialin Tian, Kaiyi Lin, Huimin Lu, Jie Shao, and Heng Tao Shen. "Zero-shot Cross-modal Retrieval by Assembling AutoEncoder and Generative Adversarial Network." ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 1s (2021): 1–17. http://dx.doi.org/10.1145/3424341.

Full text
Abstract
Conventional cross-modal retrieval models mainly assume the same scope of the classes for both the training set and the testing set. This assumption limits their extensibility on zero-shot cross-modal retrieval (ZS-CMR), where the testing set consists of unseen classes that are disjoint with seen classes in the training set. The ZS-CMR task is more challenging due to the heterogeneous distributions of different modalities and the semantic inconsistency between seen and unseen classes. A few recently proposed approaches are inspired by zero-shot learning to estimate the distribution underlying…
43

Larin, Ilya, and Alexander Karabelsky. "Riemannian Manifolds for Biological Imaging Applications Based on Unsupervised Learning." Journal of Imaging 11, no. 4 (2025): 103. https://doi.org/10.3390/jimaging11040103.

Full text
Abstract
The development of neural networks has made the introduction of multimodal systems inevitable. Computer vision methods are still not widely used in biological research, despite their importance. It is time to recognize the significance of advances in feature extraction and real-time analysis of information from cells. Unsupervised learning for the image clustering task, in particular the clustering of single cells, is of great interest. This study will evaluate the feasibility of using latent representation and clustering of single cells in various applications in the field…
44

Joshi, Rakhi Madhukararao. "Enhancing Vehicle Tracking and Recognition Across Multiple Cameras with Multimodal Contrastive Domain Sharing GAN and Topological Embeddings." Panamerican Mathematical Journal 34, no. 1 (2024): 114–27. http://dx.doi.org/10.52783/pmj.v34.i1.910.

Full text
Abstract
Using Multimodal Contrastive Domain Sharing Generative Adversarial Networks (GAN) and topological embeddings, this study presents a new way to improve vehicle tracking and recognition across multiple camera feeds. Varying camera angles and lighting conditions can make it hard for current vehicle tracking systems to work correctly; this study addresses these problems. Common Objects in Context (COCO) and ImageNet are two datasets used in this method for training. The Multimodal Contrastive Domain Sharing GAN is used for detection and tracking. It makes cross-modal learning easier by letting…
45

Kamble, Vijaya. "Design of an Iterative Method for Enhanced Multimodal Time Series Analysis Using Graph Attention Networks, Variational Graph Autoencoders, and Transfer Learning." Journal of Electrical Systems 20, no. 5s (2024): 2579–98. http://dx.doi.org/10.52783/jes.2699.

Full text
Abstract
In the ever-evolving landscape of data analysis, the need to efficiently and accurately interpret multimodal time series data has become paramount. Traditional methods often fall short in addressing the complex dependencies and dynamics inherent in such data, limiting their effectiveness in real-world applications. This work introduces a comprehensive approach that leverages Graph Attention Networks (GATs), Variational Graph Autoencoders (VGAEs), transfer learning with pretrained transformers, and Bayesian state-space models to overcome these limitations. GATs are selected for their ability to…
46

Adams, Brittany, Nance S. Wilson, and Gillian E. Mertens. "dmQAR: Mapping Metacognition in Digital Spaces onto Question–Answer Relationship." Education Sciences 15, no. 6 (2025): 751. https://doi.org/10.3390/educsci15060751.

Full text
Abstract
This paper proposes the Digital Metacognitive Question–Answer Relationship (dmQAR) Framework, an adaptation of traditional QAR models for the complexities of digital reading environments. In response to the nonlinear, multimodal, and algorithmically curated nature of online texts, the dmQAR Framework scaffolds purposeful metacognitive questioning to support comprehension, evaluation, and critical engagement. Drawing on research in metacognition, critical literacy, and digital reading, the framework reinterprets "Right There," "Think and Search," "Author and Me," and "On My Own" question categories…
47

Zhang, Rongchao, Yu Huang, Yiwei Lou, et al. "Exploit Your Latents: Coarse-Grained Protein Backmapping with Latent Diffusion Models." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 1 (2025): 1111–19. https://doi.org/10.1609/aaai.v39i1.32098.

Full text
Abstract
Coarse-grained (CG) molecular dynamics of proteins is a preferred approach to studying large molecules on extended time scales by condensing the entire atomic model into a limited number of pseudo-atoms and preserving the thermodynamic properties of the system. However, the significantly increased efficiency impedes the analysis of substantial physicochemical information, since high-resolution atomic details are sacrificed to accelerate simulation. In this paper, we propose LatCPB, a generative approach based on diffusion that enables high-resolution backmapping of CG proteins. Specifically…
48

Han, Kezhen, Shaohang Lu, Zhengce Liu, and Zipeng Wang. "Active Fault Isolation for Multimode Fault Systems Based on a Set Separation Indicator." Entropy 25, no. 6 (2023): 876. http://dx.doi.org/10.3390/e25060876.

Full text
Abstract
This paper considers the active fault isolation problem for a class of uncertain multimode fault systems with a high-dimensional state-space model. It has been observed that the existing approaches in the literature based on a steady-state active fault isolation method are often accompanied by a large delay in making the correct isolation decision. To reduce such fault isolation latency significantly, this paper proposes a fast online active fault isolation method based on the construction of a residual transient-state reachable set and a transient-state separating hyperplane. The novelty and benefits…
49

Weiner, Pascal, Caterina Neef, Yoshihisa Shibata, Yoshihiko Nakamura, and Tamim Asfour. "An Embedded, Multi-Modal Sensor System for Scalable Robotic and Prosthetic Hand Fingers." Sensors 20, no. 1 (2019): 101. http://dx.doi.org/10.3390/s20010101.

Full text
Abstract
Grasping and manipulation with anthropomorphic robotic and prosthetic hands presents a scientific challenge regarding mechanical design, sensor system, and control. Apart from the mechanical design of such hands, embedding sensors needed for closed-loop control of grasping tasks remains a hard problem due to limited space and the required high level of integration of different components. In this paper we present a scalable design model of artificial fingers, which combines mechanical design and embedded electronics with a sophisticated multi-modal sensor system consisting of sensors for sensing…
50

Du, Zhicheng, Hui-Yan Luo, Lijin Lian, Vijay Kumar Pandey, Jiansong Ji, and Peiwu Qin. "Abstract A049: Development of a tri-modal contrast learning model integrating pathology-text and CT-text for clinical oncology tasks." Clinical Cancer Research 31, no. 13_Supplement (2025): A049. https://doi.org/10.1158/1557-3265.aimachine-a049.

Full text
Abstract
This study aims to develop a path-image-text tri-modal representation learning framework (Tri-MCR) without paired data by integrating pathology-text and CT-text models to improve the performance of clinical cancer tasks. Aiming at the challenge of scarce multi-modal data pairing in the cancer field, we project the pre-trained pathology-text and CT-text models into a shared semantic space by using text (i.e., Electronic Health Record) as an intermediate modality and optimize the cross-modal alignment by using a semantic enhancement strategy. Tri-MCR projects pre-trained pathology-text…