Academic literature on the topic 'Multimodal embedding and retrieval'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multimodal embedding and retrieval.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Multimodal embedding and retrieval"

1

Kim, Donghyun, Kuniaki Saito, Kate Saenko, Stan Sclaroff, and Bryan Plummer. "MULE: Multimodal Universal Language Embedding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11254–61. http://dx.doi.org/10.1609/aaai.v34i07.6785.

Full text
Abstract:
Existing vision-language methods typically support two languages at a time at most. In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages. We accomplish this by learning a single shared Multimodal Universal Language Embedding (MULE) which has been visually-semantically aligned across all languages. Then we learn to relate MULE to visual data as if it were a single language. Our method is not architecture specific, unlike prior work which typically learned separate branches for each language, enabling our approach to easily be adapted to many vision-language methods and tasks. Since MULE learns a single language branch in the multimodal model, we can also scale to support many languages, and languages with fewer annotations can take advantage of the good representation learned from other (more abundant) language data. We demonstrate the effectiveness of our embeddings on the bidirectional image-sentence retrieval task, supporting up to four languages in a single model. In addition, we show that Machine Translation can be used for data augmentation in multilingual learning, which, combined with MULE, improves mean recall by up to 20.2% on a single language compared to prior work, with the most significant gains seen on languages with relatively few annotations. Our code is publicly available.
APA, Harvard, Vancouver, ISO, and other styles
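To make the idea of a single shared language branch in the entry above more concrete, here is a minimal PyTorch sketch of a joint image-text space in which one text encoder is reused for every language. It is our own illustration, not the MULE implementation; all class names, dimensions and the mean-pooling choice are assumptions.

```python
# Minimal sketch of a shared multilingual-text / image embedding space.
# Illustrative simplification only, not the MULE reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedMultilingualEncoder(nn.Module):
    def __init__(self, vocab_size=50000, dim=300, joint_dim=512):
        super().__init__()
        # One vocabulary/embedding table shared across all languages (the "universal" branch).
        self.tokens = nn.Embedding(vocab_size, dim)
        self.to_joint = nn.Linear(dim, joint_dim)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        pooled = self.tokens(token_ids).mean(1)   # crude mean pooling over tokens
        return F.normalize(self.to_joint(pooled), dim=-1)

class ImageEncoder(nn.Module):
    def __init__(self, feat_dim=2048, joint_dim=512):
        super().__init__()
        self.to_joint = nn.Linear(feat_dim, joint_dim)

    def forward(self, feats):                     # pre-extracted CNN features
        return F.normalize(self.to_joint(feats), dim=-1)

def triplet_loss(img, pos_txt, neg_txt, margin=0.2):
    # Pull matching image/caption pairs together, push mismatched pairs apart.
    return F.relu(margin - (img * pos_txt).sum(-1) + (img * neg_txt).sum(-1)).mean()
```

In the paper's setting the shared text branch is first visually-semantically aligned across languages; the sketch only shows the final joint-space matching step.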
2

Kim, Jongseok, Youngjae Yu, Hoeseong Kim, and Gunhee Kim. "Dual Compositional Learning in Interactive Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1771–79. http://dx.doi.org/10.1609/aaai.v35i2.16271.

Full text
Abstract:
We present an approach named Dual Composition Network (DCNet) for interactive image retrieval that searches for the best target image for a natural language query and a reference image. To accomplish this task, existing methods have focused on learning a composite representation of the reference image and the text query to be as close to the embedding of the target image as possible. We refer to this approach as the Composition Network. In this work, we propose to close the loop with a Correction Network that models the difference between the reference and target image in the embedding space and matches it with the embedding of the text query. That is, we consider two cyclic directional mappings for triplets of (reference image, text query, target image) by using both the Composition Network and the Correction Network. We also propose a joint training loss that can further improve the robustness of multimodal representation learning. We evaluate the proposed model on three benchmark datasets for multimodal retrieval: Fashion-IQ, Shoes, and Fashion200K. Our experiments show that our DCNet achieves new state-of-the-art performance on all three datasets, and the addition of the Correction Network consistently improves multiple existing methods that are solely based on the Composition Network. Moreover, an ensemble of our models won first place in the Fashion-IQ 2020 challenge held in a CVPR 2020 workshop.
APA, Harvard, Vancouver, ISO, and other styles
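The cyclic idea described in the abstract above, composing (reference image, text) toward the target while a correction branch maps the image difference back toward the text, can be sketched as follows. This is an illustrative simplification with assumed layer names and an in-batch contrastive loss, not the DCNet code.

```python
# Sketch of the two directions: Composition maps (reference image, text) toward the
# target image; Correction maps (target - reference) toward the text query.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Composer(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, ref_img, text):
        return F.normalize(self.fuse(torch.cat([ref_img, text], dim=-1)), dim=-1)

class Corrector(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, ref_img, tgt_img):
        return F.normalize(self.proj(tgt_img - ref_img), dim=-1)

def joint_loss(ref_img, text, tgt_img, composer, corrector, tau=0.07):
    # All inputs are assumed to be L2-normalised embeddings of the same dimension.
    comp = composer(ref_img, text)        # should match the target image embedding
    corr = corrector(ref_img, tgt_img)    # should match the text query embedding
    logits_c = comp @ tgt_img.t() / tau   # in-batch contrastive logits, one positive per row
    logits_r = corr @ text.t() / tau
    labels = torch.arange(ref_img.size(0))
    return F.cross_entropy(logits_c, labels) + F.cross_entropy(logits_r, labels)
```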
3

Wang, Di, Xinbo Gao, Xiumei Wang, Lihuo He, and Bo Yuan. "Multimodal Discriminative Binary Embedding for Large-Scale Cross-Modal Retrieval." IEEE Transactions on Image Processing 25, no. 10 (October 2016): 4540–54. http://dx.doi.org/10.1109/tip.2016.2592800.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (July 2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Full text
Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves results comparable to the current state of the art on two popular image-caption retrieval benchmark datasets: Microsoft Common Objects in Context (MSCOCO) and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings using the data from the Semantic Textual Similarity (STS) benchmark task and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, is able to capture sentence level semantics. Importantly, this result shows that we do not need prior knowledge of lexical level semantics in order to model sentence level semantics. These findings demonstrate the importance of visual information in semantics.
APA, Harvard, Vancouver, ISO, and other styles
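The general recipe in the entry above, training image and caption encoders into one space with a ranking loss and then reusing the caption encoder alone as a sentence embedder, can be summarised in a short sketch. The max-margin loss below is a standard formulation chosen for illustration, and the `sentence_encoder` callable is an assumption; the paper's actual encoder architecture and objective may differ.

```python
# Sketch: joint-space retrieval training, then cosine similarity for sentence semantics.
import torch
import torch.nn.functional as F

def retrieval_loss(img_emb, cap_emb, margin=0.2):
    # Hinge loss over all in-batch negatives, in both retrieval directions.
    sims = F.normalize(img_emb, dim=-1) @ F.normalize(cap_emb, dim=-1).t()
    pos = sims.diag().unsqueeze(1)
    cost_cap = F.relu(margin + sims - pos)                        # caption retrieval per image
    cost_img = F.relu(margin + sims - sims.diag().unsqueeze(0))   # image retrieval per caption
    mask = torch.eye(sims.size(0), dtype=torch.bool)              # ignore the positives themselves
    return cost_cap.masked_fill(mask, 0).mean() + cost_img.masked_fill(mask, 0).mean()

def sts_score(sentence_encoder, tokens_a, tokens_b):
    # Semantic similarity of two sentences = cosine of their visually grounded embeddings.
    a, b = sentence_encoder(tokens_a), sentence_encoder(tokens_b)
    return F.cosine_similarity(a, b, dim=-1)
```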
5

Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao, and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings." Journal of Advanced Computational Intelligence and Intelligent Informatics 26, no. 6 (November 20, 2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.

Full text
Abstract:
In this work, we study the application of multimodal analogical reasoning to image retrieval. Multimodal analogy questions are given in a form of tuples of words and images, e.g., “cat”:“dog”::[an image of a cat sitting on a bench]:?, to search for an image of a dog sitting on a bench. Retrieving desired images given these tuples can be seen as a task of finding images whose relation between the query image is close to that of query words. One way to achieve the task is building a common vector space that exhibits analogical regularities. To learn such an embedding, we propose a quadruple neural network called multimodal siamese network. The network consists of recurrent neural networks and convolutional neural networks based on the siamese architecture. We also introduce an effective procedure to generate analogy examples from an image-caption dataset for training of our network. In our experiments, we test our model on analogy-based image retrieval tasks. The results show that our method outperforms the previous work in qualitative evaluation.
APA, Harvard, Vancouver, ISO, and other styles
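Once a joint space with analogical regularities exists, answering a query such as "cat":"dog"::[cat on a bench]:? reduces to vector arithmetic followed by nearest-neighbour search. Below is a hedged sketch of that retrieval step only; the trained quadruple "multimodal siamese" network itself is not shown, and the helper names and data layout are assumptions.

```python
# Toy analogy-based retrieval in a shared multimodal embedding space.
import numpy as np

def analogy_retrieve(word_a, word_b, image_c, image_index, embed_text, embed_image, k=5):
    """Answer word_a : word_b :: image_c : ? by vector arithmetic in the joint space.

    embed_text / embed_image are assumed callables returning L2-normalised numpy vectors;
    image_index is assumed to map image ids to their pre-computed embeddings.
    """
    query = embed_image(image_c) - embed_text(word_a) + embed_text(word_b)
    query /= np.linalg.norm(query)
    scores = {img_id: float(vec @ query) for img_id, vec in image_index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]   # top-k image ids
```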
6

Qi, Jidong. "Neurophysiological and psychophysical references for trends in supervised VQA multimodal deep learning: An interdisciplinary meta-analysis." Applied and Computational Engineering 30, no. 1 (January 22, 2024): 189–201. http://dx.doi.org/10.54254/2755-2721/30/20230096.

Full text
Abstract:
Leading trends in multimodal deep learning for visual question answering include the multimodal joint-embedding model, the multimodal attention-based model, and the multimodal external knowledge-based model. Several mechanisms and strategies are used in these models, including representation fusion methods, co-attention mechanisms, and knowledge base retrieval mechanisms. While a variety of works have comprehensively reviewed these strategies, a key gap in research is that there is no interdisciplinary analysis that connects these mechanisms with discoveries on humans. As discussions of Neuro-AI continue to thrive, it is important to consider synergies between human-level investigations and ANNs, specifically for using AI to reproduce higher-order cognitive functions such as multisensory integration. Thus, the present meta-analysis aims at reviewing and connecting neurophysiological and psychophysical references to trends in VQA multimodal deep learning, focusing on 1) providing back-up explanations for why several strategies in VQA MMDL lead to performance that is closer to human level, and 2) using VQA MMDL as an example to demonstrate how an interdisciplinary perspective may foster the development of human-level AI. The result of the meta-analysis builds connections between several sub-fields: joint embedding mechanisms and SC neurons, multimodal attention mechanisms and the retro-cue effect, and external knowledge bases and engram mechanisms.
APA, Harvard, Vancouver, ISO, and other styles
7

Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Full text
Abstract:
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modalities. It is challenging not only because of the heterogeneous distributions across different modalities, but also because of the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as a common semantic space, and using a generative adversarial network (GAN) to capture the underlying multimodal data structures, as well as to strengthen relations between input data and the semantic space in order to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN-based methods for ZS-CMR. Unlike methods that use the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space of input multimodal features and class-embeddings through modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlation associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in the latent space, which benefits retrieval efficiency and enables knowledge transfer to unseen classes. We evaluate our model using four benchmark datasets on image-text retrieval tasks and one large-scale dataset on image-sketch retrieval tasks. The experimental results show that our method establishes new state-of-the-art performance for both tasks on all datasets.
APA, Harvard, Vancouver, ISO, and other styles
8

Mithun, Niluthpol C., Juncheng Li, Florian Metze, and Amit K. Roy-Chowdhury. "Joint embeddings with multimodal cues for video-text retrieval." International Journal of Multimedia Information Retrieval 8, no. 1 (January 12, 2019): 3–18. http://dx.doi.org/10.1007/s13735-018-00166-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Yang, Bang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, and Yuexian Zou. "Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 6458–66. http://dx.doi.org/10.1609/aaai.v38i6.28466.

Full text
Abstract:
While vision-language pre-trained models (VL-PTMs) have advanced multimodal research in recent years, their mastery in a few languages like English restricts their applicability in broader communities. To this end, there is an increasing interest in developing multilingual VL models via a joint-learning setup, which, however, could be unrealistic due to expensive costs and data availability. In this work, we propose to extend VL-PTMs' language capacity by continual language learning (CLL), where a model needs to update its linguistic knowledge incrementally without suffering from catastrophic forgetting (CF). We begin our study by introducing a model dubbed CLL-CLIP, which builds upon CLIP, a prevailing VL-PTM that has acquired image-English text alignment. Specifically, CLL-CLIP contains an expandable token embedding layer to handle linguistic differences. It solely trains token embeddings to improve memory stability and is optimized under cross-modal and cross-lingual objectives to learn the alignment between images and multilingual texts. To alleviate CF raised by covariate shift and lexical overlap, we further propose a novel approach that ensures the identical distribution of all token embeddings during initialization and regularizes token embedding learning during training. We construct a CLL benchmark covering 36 languages based on MSCOCO and XM3600 datasets and then evaluate multilingual image-text retrieval performance. Extensive experiments verify the effectiveness of CLL-CLIP and show that our approach can boost CLL-CLIP, e.g., by 6.7% in text-to-image average Recall@1 on XM3600, and improve various state-of-the-art methods consistently. Our code and data are available at https://github.com/yangbang18/CLFM.
APA, Harvard, Vancouver, ISO, and other styles
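A minimal sketch of the "expandable token embedding layer, train only the token embeddings" idea from the entry above follows. It is our own simplification: the `clip` object and its `token_embedding` attribute stand in for any frozen CLIP-like model and are assumptions, not the CLL-CLIP code.

```python
# Extend a frozen CLIP-like text tower with extra token rows and train only those rows.
import torch
import torch.nn as nn

def expand_token_embeddings(old_emb: nn.Embedding, extra_tokens: int) -> nn.Embedding:
    new_emb = nn.Embedding(old_emb.num_embeddings + extra_tokens, old_emb.embedding_dim)
    with torch.no_grad():
        # Keep the pre-trained rows; initialise new rows from the same distribution as the
        # old ones so all token embeddings start identically distributed.
        new_emb.weight[: old_emb.num_embeddings] = old_emb.weight
        new_emb.weight[old_emb.num_embeddings:].normal_(
            old_emb.weight.mean().item(), old_emb.weight.std().item())
    return new_emb

def freeze_all_but_token_embeddings(clip, new_emb):
    for p in clip.parameters():
        p.requires_grad = False        # keep the pre-trained backbone fixed
    clip.token_embedding = new_emb     # assumed attribute name on the hypothetical model
    for p in new_emb.parameters():
        p.requires_grad = True         # only the token table is updated during CLL
    return clip
```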
10

Xu, Tong, Peilun Zhou, Linkang Hu, Xiangnan He, Yao Hu, and Enhong Chen. "Socializing the Videos: A Multimodal Approach for Social Relation Recognition." ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 1 (April 16, 2021): 1–23. http://dx.doi.org/10.1145/3416493.

Full text
Abstract:
As a crucial task for video analysis, social relation recognition for characters not only provides semantically rich description of video content but also supports intelligent applications, e.g., video retrieval and visual question answering. Unfortunately, due to the semantic gap between visual and semantic features, traditional solutions may fail to reveal the accurate relations among characters. At the same time, the development of social media platforms has now promoted the emergence of crowdsourced comments, which may enhance the recognition task with semantic and descriptive cues. To that end, in this article, we propose a novel multimodal-based solution to deal with the character relation recognition task. Specifically, we capture the target character pairs via a search module and then design a multistream architecture for jointly embedding the visual and textual information, in which feature fusion and attention mechanism are adapted for better integrating the multimodal inputs. Finally, supervised learning is applied to classify character relations. Experiments on real-world data sets validate that our solution outperforms several competitive baselines.
APA, Harvard, Vancouver, ISO, and other styles
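The multistream fusion-with-attention step described in the entry above can be illustrated with a small PyTorch module that weights a visual and a textual stream before classifying the relation. Layer names, feature sizes and the number of relation classes are assumptions for illustration only, not the paper's architecture.

```python
# Attention-weighted fusion of a visual and a textual stream, then relation classification.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, vis_dim=2048, txt_dim=768, hidden=512, n_relations=8):
        super().__init__()
        self.vis = nn.Linear(vis_dim, hidden)
        self.txt = nn.Linear(txt_dim, hidden)
        self.attn = nn.Linear(hidden, 1)     # scores each modality stream
        self.cls = nn.Linear(hidden, n_relations)

    def forward(self, vis_feat, txt_feat):
        streams = torch.stack([torch.tanh(self.vis(vis_feat)),
                               torch.tanh(self.txt(txt_feat))], dim=1)   # (B, 2, H)
        weights = torch.softmax(self.attn(streams), dim=1)               # (B, 2, 1)
        fused = (weights * streams).sum(dim=1)                           # (B, H)
        return self.cls(fused)                                           # relation logits
```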

Dissertations / Theses on the topic "Multimodal embedding and retrieval"

1

Rubio, Romano Antonio. "Fashion discovery : a computer vision approach." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.

Full text
Abstract:
Performing semantic interpretation of fashion images is undeniably one of the most challenging domains for computer vision. Subtle variations in color and shape might confer different meanings or interpretations to an image. Not only is it a domain tightly coupled with human understanding, but also with scene interpretation and context. Being able to extract fashion-specific information from images and interpret that information in a proper manner can be useful in many situations and help understanding the underlying information in an image. Fashion is also one of the most important businesses around the world, with an estimated value of 3 trillion dollars and a constantly growing online market, which increases the utility of image-based algorithms to search, classify or recommend garments. This doctoral thesis aims to solve specific problems related with the treatment of fashion e-commerce data, from low-level pure pixel information to high-level abstract conclusions of the garments appearing in an image, taking advantage of the multi-modality of the available data for developing some of the solutions. The contributions include: - A new superpixel extraction method focused on improving the annotation process for clothing images. - The construction of an image and text embedding for fashion data. - The application of this embedding space to the task of retrieving the main product in an image showing a complete outfit. In summary, fashion is a complex computer vision and machine learning problem at many levels, and developing specific algorithms that are able to capture essential information from pictures and text is not trivial. In order to solve some of the challenges it proposes, and taking into account that this is an Industrial Ph.D., we contribute with a variety of solutions that can boost the performance of many tasks useful for the fashion e-commerce industry.
APA, Harvard, Vancouver, ISO, and other styles
2

Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.

Full text
Abstract:
Nowadays Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has led to a significant improvement in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore different directions: we present relevant background covering image and textual representation and existing multimodal approaches; we propose novel architectures further improving the retrieval capability of VSE; and we extend VSE models to novel applications and leverage embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning a differentiable approximation of ranking-based metrics.
APA, Harvard, Vancouver, ISO, and other styles
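For the last contribution mentioned above, a differentiable approximation of rank-based evaluation functions, a generic sigmoid relaxation gives the flavour. The thesis learns its approximation with a neural network; the sketch below is only a simple hand-built surrogate under assumed conventions (higher score = better, index 0 holds the positive item).

```python
# Generic smooth-rank surrogate; not the thesis' learned approximation.
import torch

def soft_rank(scores, tau=0.01):
    """Differentiable approximation of each score's 1-based descending rank."""
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)   # diff[j, i] = scores[j] - scores[i]
    # Sum of sigmoids counts how many items beat item i; the j == i term contributes 0.5,
    # so adding 0.5 instead of 1 makes the best item's rank come out near 1.
    return 0.5 + torch.sigmoid(diff / tau).sum(dim=0)

def soft_recall_at_k(scores, positive_index=0, k=1, tau=0.01):
    # Smooth indicator that the positive item sits inside the top-k.
    ranks = soft_rank(scores, tau)
    return torch.sigmoid((k + 0.5 - ranks[positive_index]) / tau)
```

Because both functions are built from sigmoids, gradients flow through them, so a recall-style objective can be optimised directly instead of only being used for evaluation.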
3

Adebayo, Kolawole John <1986&gt. "Multimodal Legal Information Retrieval." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amsdottorato.unibo.it/8634/1/ADEBAYO-JOHN-tesi.pdf.

Full text
Abstract:
The goal of this thesis is to present a multifaceted way of inducing semantic representation from legal documents as well as accessing information in a precise and timely manner. The thesis explores approaches for semantic information retrieval (IR) in the legal context with a technique that maps specific parts of a text to the relevant concept. This technique relies on text segments, using Latent Dirichlet Allocation (LDA), a topic modeling algorithm, for performing text segmentation, expanding the concept using some Natural Language Processing techniques, and then associating the text segments to the concepts using a semi-supervised text similarity technique. This solves two problems, namely user specificity in formulating queries and information overload, since querying a large document collection with a set of concepts is more fine-grained: specific information, rather than full documents, is retrieved. The second part of the thesis describes our Neural Network Relevance Model for E-Discovery Information Retrieval. Our algorithm is essentially a feature-rich ensemble system with different component Neural Networks extracting different relevance signals. This model has been trained and evaluated on the TREC Legal track 2010 data. The performance of our models across the board shows that they capture the semantics and relatedness between query and document, which is important to the Legal Information Retrieval domain.
APA, Harvard, Vancouver, ISO, and other styles
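The first part of the pipeline described above, representing text segments with LDA topics and associating segments with concepts via text similarity, could look roughly like the scikit-learn sketch below. It is a simplification for illustration; the thesis' actual segmentation, concept-expansion and semi-supervised similarity steps are more involved.

```python
# Rough sketch: LDA topic mixtures for segments and concept descriptions, then
# segment-to-concept association by cosine similarity of topic vectors.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def associate_segments_with_concepts(segments, concept_descriptions, n_topics=20):
    """segments / concept_descriptions: lists of strings. Returns best concept index per segment."""
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(segments + concept_descriptions)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topics = lda.fit_transform(counts)                 # one topic mixture per document
    seg_topics = topics[: len(segments)]
    con_topics = topics[len(segments):]
    sims = cosine_similarity(seg_topics, con_topics)   # segment-to-concept affinity matrix
    return sims.argmax(axis=1)
```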
4

Chen, Jianan. "Deep Learning Based Multimodal Retrieval." Electronic Thesis or Diss., Rennes, INSA, 2023. http://www.theses.fr/2023ISAR0019.

Full text
Abstract:
Multimodal tasks play a crucial role in the progression towards achieving general artificial intelligence (AI). The primary goal of multimodal retrieval is to employ machine learning algorithms to extract relevant semantic information, bridging the gap between different modalities such as visual images, linguistic text, and other data sources. It is worth noting that the information entropy associated with heterogeneous data for the same high-level semantics varies significantly, posing a significant challenge for multimodal models. Deep learning-based multimodal network models provide an effective solution to tackle the difficulties arising from substantial differences in information entropy. These models exhibit impressive accuracy and stability in large-scale cross-modal information matching tasks, such as image-text retrieval. Furthermore, they demonstrate strong transfer learning capabilities, enabling a well-trained model from one multimodal task to be fine-tuned and applied to a new multimodal task, even in scenarios involving few-shot or zero-shot learning. In our research, we develop a novel generative multimodal multi-view database specifically designed for the multimodal referential segmentation task. Additionally, we establish a state-of-the-art (SOTA) benchmark and multi-view metric for referring expression segmentation models in the multimodal domain. The results of our comparative experiments are presented visually, providing clear and comprehensive insights
APA, Harvard, Vancouver, ISO, and other styles
5

Böckmann, Christine, Jens Biele, Roland Neuber, and Jenny Niebsch. "Retrieval of multimodal aerosol size distribution by inversion of multiwavelength data." Universität Potsdam, 1997. http://opus.kobv.de/ubp/volltexte/2007/1436/.

Full text
Abstract:
The ill-posed problem of aerosol size distribution determination from a small number of backscatter and extinction measurements was solved successfully with a mollifier method, which is advantageous since the ill-posed part is performed on exactly given quantities; the points r where n(r) is evaluated may be freely selected. A new two-dimensional model for the troposphere is proposed.
APA, Harvard, Vancouver, ISO, and other styles
6

Zhu, Meng. "Cross-modal semantic-associative labelling, indexing and retrieval of multimodal data." Thesis, University of Reading, 2010. http://centaur.reading.ac.uk/24828/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Kahn, Itamar. "Remembering the past : multimodal imaging of cortical contributions to episodic retrieval." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/33171.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2005.
What is the nature of the neural processes that allow humans to remember past events? The theoretical framework adopted in this thesis builds upon cognitive models that suggest that episodic retrieval can be decomposed into two classes of computations: (1) recovery processes that serve to reactivate stored memories, making information from a past episode readily available, and (2) control processes that serve to guide the retrieval attempt and monitor/evaluate information arising from the recovery processes. A multimodal imaging approach that combined fMRI and MEG was adopted to gain insight into the spatial and temporal brain mechanisms supporting episodic retrieval. Chapter 1 reviews major findings and theories in the episodic retrieval literature, grounding the open questions and controversies within the suggested framework. Chapter 2 describes an fMRI and MEG experiment that identified medial temporal cortical structures that signal item memory strength, thus supporting the perception of item familiarity. Chapter 3 describes an fMRI experiment that demonstrated that retrieval of contextual details involves reactivation of neural patterns engaged at encoding. Further, leveraging this pattern of reactivation, it was demonstrated that false recognition may be accompanied by recollection. The fMRI experiment reported in Chapter 3, when combined with an MEG experiment reported in Chapter 4, directly addressed questions regarding the control processes engaged during episodic retrieval. In particular, Chapter 3 showed that parietal and prefrontal cortices contribute to controlling the act of arriving at a retrieval decision. Chapter 4 then illuminates the temporal characteristics of parietal activation during episodic retrieval, providing novel evidence about the nature of parietal responses and thus constraints on theories of parietal involvement in episodic retrieval. The conducted research targeted distinct aspects of the multi-faceted act of remembering the past. The obtained data contribute to the building of an anatomical and temporal "blueprint" documenting the cascade of neural events that unfold during attempts to remember, as well as when such attempts are met with success or lead to memory errors. In the course of framing this research within the context of cognitive models of retrieval, the obtained neural data reflect back on and constrain these theories of remembering.
APA, Harvard, Vancouver, ISO, and other styles
8

Nag, Chowdhury Sreyasi [Verfasser]. "Text-image synergy for multimodal retrieval and annotation / Sreyasi Nag Chowdhury." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2021. http://d-nb.info/1240674139/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Luqman, Muhammad Muzzamil. "Fuzzy multilevel graph embedding for recognition, indexing and retrieval of graphic document images." Thesis, Tours, 2012. http://www.theses.fr/2012TOUR4005/document.

Full text
Abstract:
This thesis addresses the lack of efficient computational tools for graph-based structural pattern recognition approaches and proposes to exploit the computational strength of statistical pattern recognition. It makes two main contributions. The first contribution is a new method of explicit graph embedding. The proposed graph embedding method exploits multilevel analysis of graphs for extracting graph-level, structural-level and elementary-level information. It embeds this information into a numeric feature vector. The method employs fuzzy overlapping trapezoidal intervals for addressing the noise sensitivity of graph representations and for minimizing the information loss while mapping from continuous graph space to discrete vector space. The method has unsupervised learning abilities and is capable of automatically adapting its parameters to the underlying graph dataset. The second contribution is a framework for automatic indexing of graph repositories for graph retrieval and subgraph spotting. This framework exploits explicit graph embedding for representing the cliques of order 2 by numeric feature vectors, together with classification and clustering tools for automatically indexing a graph repository. It does not require a labeled learning set and can be easily deployed to a range of application domains, offering ease of query by example (QBE) and the granularity of focused retrieval.
APA, Harvard, Vancouver, ISO, and other styles
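The core trick in the entry above, capturing graph information at several levels in fuzzy histograms built from overlapping trapezoidal intervals, can be shown at a toy scale. The interval boundaries and the choice of node degree as the embedded attribute below are arbitrary illustrations, not the thesis' actual configuration.

```python
# Toy fuzzy-histogram graph embedding: global counts plus a fuzzy degree histogram.
import numpy as np

def trapezoidal_membership(x, a, b, c, d):
    """Membership in the fuzzy interval [a, b, c, d]: ramps 0 -> 1 -> 1 -> 0."""
    x = np.asarray(x, dtype=float)
    rise = np.clip((x - a) / max(b - a, 1e-9), 0, 1)
    fall = np.clip((d - x) / max(d - c, 1e-9), 0, 1)
    return np.minimum(rise, fall)

def fuzzy_histogram(values, intervals):
    # Each value contributes partially to every overlapping interval it falls into.
    return np.array([trapezoidal_membership(values, *iv).sum() for iv in intervals])

def embed_graph(adjacency):
    """adjacency: dict node -> set of neighbours (undirected). Returns a numeric signature."""
    degrees = [len(nbrs) for nbrs in adjacency.values()]
    intervals = [(-0.5, 0, 1, 2), (1, 2, 3, 4), (3, 4, 8, 10)]    # assumed, overlapping
    global_part = np.array([len(adjacency), sum(degrees) / 2.0])  # node and edge counts
    return np.concatenate([global_part, fuzzy_histogram(degrees, intervals)])
```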
10

Lolich, María, and Susana Azzollini. "Phenomenological retrieval style of autobiographical memories in a sample of major depressed individuals." Pontificia Universidad Católica del Perú, 2016. http://repositorio.pucp.edu.pe/index/handle/123456789/99894.

Full text
Abstract:
Autobiographical memory retrieval implies different phenomenological features. Given the lack of previous work in Hispanic-speaking populations, 34 in-depth interviews were carried out with individuals with and without Major Depressive Disorder in Buenos Aires, Argentina. Phenomenological components during the evocation of autobiographical memories were explored. Data were qualitatively analyzed using Grounded Theory. During the descriptive analyses, seven phenomenological categories were detected as emerging from the discourse. The axial and selective analyses revealed two main discursive axes: rhetorical-propositional and specificity-generalization. The impact on affective regulation processes, derived from the assumption of an amodal or multimodal style of processing autobiographical information, merits further attention.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Multimodal embedding and retrieval"

1

Müller, Henning, Oscar Alfonso Jimenez del Toro, Allan Hanbury, Georg Langs, and Antonio Foncubierta Rodriguez, eds. Multimodal Retrieval in the Medical Domain. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-24471-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Peters, Carol, Valentin Jijkoun, Thomas Mandl, Henning Müller, Douglas W. Oard, Anselmo Peñas, Vivien Petras, and Diana Santos, eds. Advances in Multilingual and Multimodal Information Retrieval. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-85760-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Jay, Kuo C. C., ed. Video content analysis using multimodal information: For movie content extraction, indexing, and representation. Boston, Mass: Kluwer Academic Publishers, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Ying. Video Content Analysis Using Multimodal Information: For Movie Content Extraction, Indexing and Representation. Boston, MA: Springer US, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

C, Peters, ed. Advances in multilingual and multimodal information retrieval: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary, September 19-21, 2007 : revised selected papers. Berlin: Springer, 2008.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Forner, Pamela. Multilingual and Multimodal Information Access Evaluation: Second International Conference of the Cross-Language Evaluation Forum, CLEF 2011, Amsterdam, The Netherlands, September 19-22, 2011. Proceedings. Berlin, Heidelberg: Springer-Verlag GmbH Berlin Heidelberg, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Li, Ying. Video content analysis using multimodal information: For movie content extraction, indexing, and representation. Boston, MA: Kluwer Academic Publishers, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Esposito, Anna. Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces. Theoretical and Practical Issues: Third COST 2102 International Training School, Caserta, Italy, March 15-19, 2010, Revised Selected Papers. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Gosse, Bouma, and SpringerLink (Online service), eds. Interactive Multi-modal Question-Answering. Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Andrzej, Drygajlo, Esposito Anna, Ortega-Garcia Javier, Faúndez Zanuy Marcos, and SpringerLink (Online service), eds. Biometric ID Management and Multimodal Communication: Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid, Spain, September 16-18, 2009. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Multimodal embedding and retrieval"

1

Zhou, Liting, and Cathal Gurrin. "Multimodal Embedding for Lifelog Retrieval." In MultiMedia Modeling, 416–27. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98358-1_33.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Mihajlović, Vojkan, Milan Petković, Willem Jonker, and Henk Blanken. "Multimodal Content-based Video Retrieval." In Multimedia Retrieval, 271–94. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. http://dx.doi.org/10.1007/978-3-540-72895-5_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kitanovski, Ivan, Katarina Trojacanec, Ivica Dimitrovski, and Suzana Loskovska. "Multimodal Medical Image Retrieval." In ICT Innovations 2012, 81–89. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37169-1_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Pegia, Maria, Björn Þór Jónsson, Anastasia Moumtzidou, Sotiris Diplaris, Ilias Gialampoukidis, Stefanos Vrochidis, and Ioannis Kompatsiaris. "Multimodal 3D Object Retrieval." In MultiMedia Modeling, 188–201. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-53302-0_14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Zhang, Xia, Weizheng Chen, and Hongfei Yan. "TLINE: Scalable Transductive Network Embedding." In Information Retrieval Technology, 98–110. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-48051-0_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Schedl, Markus, and Peter Knees. "Personalization in Multimodal Music Retrieval." In Adaptive Multimedia Retrieval. Large-Scale Multimedia Retrieval and Evaluation, 58–71. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37425-8_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Abdulahhad, Karam. "Concept Embedding for Information Retrieval." In Lecture Notes in Computer Science, 563–69. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-76941-7_45.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Gerritse, Emma J., Faegheh Hasibi, and Arjen P. de Vries. "Graph-Embedding Empowered Entity Retrieval." In Lecture Notes in Computer Science, 97–110. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-45439-5_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Toselli, Alejandro Héctor, Enrique Vidal, and Francisco Casacuberta. "Interactive Image Retrieval." In Multimodal Interactive Pattern Recognition and Applications, 209–26. London: Springer London, 2011. http://dx.doi.org/10.1007/978-0-85729-479-1_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Vu, Dang-Thinh, and Jason J. Jung. "Detecting Emerging Rumors by Embedding Propagation Graphs." In Information Retrieval Technology, 173–84. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-42835-8_15.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Multimodal embedding and retrieval"

1

Couairon, Guillaume, Matthijs Douze, Matthieu Cord, and Holger Schwenk. "Embedding Arithmetic of Multimodal Queries for Image Retrieval." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2022. http://dx.doi.org/10.1109/cvprw56347.2022.00542.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Huang, Feiran, Xiaoming Zhang, Chaozhuo Li, Zhoujun Li, Yueying He, and Zhonghua Zhao. "Multimodal Network Embedding via Attention based Multi-view Variational Autoencoder." In ICMR '18: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3206025.3206035.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Mithun, Niluthpol Chowdhury, Juncheng Li, Florian Metze, and Amit K. Roy-Chowdhury. "Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval." In ICMR '18: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3206025.3206064.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Huang, Fei, Yong Cheng, Cheng Jin, Yuejie Zhang, and Tao Zhang. "Deep Multimodal Embedding Model for Fine-grained Sketch-based Image Retrieval." In SIGIR '17: The 40th International ACM SIGIR conference on research and development in Information Retrieval. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3077136.3080681.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Bhattacharya, Indrani, Arkabandhu Chowdhury, and Vikas C. Raykar. "Multimodal Dialog for Browsing Large Visual Catalogs using Exploration-Exploitation Paradigm in a Joint Embedding Space." In ICMR '19: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3323873.3325036.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Neculai, Andrei, Yanbei Chen, and Zeynep Akata. "Probabilistic Compositional Embeddings for Multimodal Image Retrieval." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2022. http://dx.doi.org/10.1109/cvprw56347.2022.00501.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Parida, Kranti Kumar, Neeraj Matiyali, Tanaya Guha, and Gaurav Sharma. "Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zero-shot Classification and Retrieval of Videos." In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2020. http://dx.doi.org/10.1109/wacv45572.2020.9093438.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Dadas, Slawomir. "OPI at SemEval-2023 Task 1: Image-Text Embeddings and Multimodal Information Retrieval for Visual Word Sense Disambiguation." In Proceedings of the The 17th International Workshop on Semantic Evaluation (SemEval-2023). Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.semeval-1.22.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Sung, Raymond C. W., James M. Ritchie, Theodore Lim, Aparajithan Sivanathan, and Mike J. Chantler. "The Evaluation of a Virtual-Aided Design Engineering Review (VADER) System for Automated Knowledge Capture and Reuse." In ASME 2013 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2013. http://dx.doi.org/10.1115/detc2013-12030.

Full text
Abstract:
Conducting knowledge capture and embedding it throughout a product's lifecycle remains a key issue in engineering industries, particularly with regard to the rationale-associated knowledge emanating from formal design reviews. Manual, and often interruptive, methods with costly overheads exacerbate the already time-consuming process. As well as these disadvantages, manual methods can potentially capture the wrong data due to human error, or fail to capture all the pertinent information and associated relationships. Consequently, industries are seeking automated engineering knowledge capture and rationale that adds value to products and processes, potentially reaping benefits in time and cost. Previous work by the authors showed how user logging in virtual environments aids the unobtrusive capture of engineering knowledge and rationale in design tasks. This paper advances the work further through a Virtual Aided Design Engineering Review (VADER) system developed to automatically and unobtrusively capture both multimodal human-computer and human-human interactivity during design reviews via the synchronous, time-phased logging of software interactions, product models, audio, video and input devices. By processing the captured data, review reports and records can be automatically generated, and fast knowledge retrieval is enabled. The backbone of VADER is a multimodal device and data fusion architecture that captures and synchronises structured and unstructured data in real time. Visualisation is through a 3D virtual environment. In addition to allowing engineers to visualise and annotate 3D design models, the system provides a timeline interface to search and visualise the captured decisions from a design review. The VADER system has been put through its initial industrial trial, reported herein. Objective and subjective analysis indicates that the VADER system is intuitive to use and can lead to savings in both time and cost with regard to project reviews.
APA, Harvard, Vancouver, ISO, and other styles
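At its simplest, the synchronous, time-phased logging described above amounts to stamping every interaction, from every modality, against one shared clock. The toy logger below illustrates only that idea; the actual VADER data-fusion architecture is far richer, and the file format and field names here are assumptions.

```python
# Toy synchronised logger for heterogeneous design-review events.
import json
import time
import threading

class ReviewLogger:
    def __init__(self, path="review_log.jsonl"):
        self.path = path
        self.lock = threading.Lock()
        self.t0 = time.monotonic()   # one clock shared by every modality

    def log(self, modality, payload):
        # Append a time-stamped event so streams can later be replayed on a timeline.
        event = {"t": time.monotonic() - self.t0, "modality": modality, "data": payload}
        with self.lock, open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

# Example (hypothetical payloads):
#   logger = ReviewLogger()
#   logger.log("cad_interaction", {"model": "bracket_v3", "action": "rotate"})
#   logger.log("audio", {"file": "mic01_000123.wav"})
```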
10

Szekely, Eniko, Eric Bruno, and Stephane Marchand-Maillet. "High-Dimensional Multimodal Distribution Embedding." In 2010 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2010. http://dx.doi.org/10.1109/icdmw.2010.194.

Full text
APA, Harvard, Vancouver, ISO, and other styles