Academic literature on the topic 'Multimodal embedding and retrieval'
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multimodal embedding and retrieval.'
Journal articles on the topic "Multimodal embedding and retrieval"
Kim, Donghyun, Kuniaki Saito, Kate Saenko, Stan Sclaroff, and Bryan Plummer. "MULE: Multimodal Universal Language Embedding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11254–61. http://dx.doi.org/10.1609/aaai.v34i07.6785.
Kim, Jongseok, Youngjae Yu, Hoeseong Kim, and Gunhee Kim. "Dual Compositional Learning in Interactive Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1771–79. http://dx.doi.org/10.1609/aaai.v35i2.16271.
Wang, Di, Xinbo Gao, Xiumei Wang, Lihuo He, and Bo Yuan. "Multimodal Discriminative Binary Embedding for Large-Scale Cross-Modal Retrieval." IEEE Transactions on Image Processing 25, no. 10 (October 2016): 4540–54. http://dx.doi.org/10.1109/tip.2016.2592800.
Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (July 2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.
Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao, and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings." Journal of Advanced Computational Intelligence and Intelligent Informatics 26, no. 6 (November 20, 2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.
Qi, Jidong. "Neurophysiological and psychophysical references for trends in supervised VQA multimodal deep learning: An interdisciplinary meta-analysis." Applied and Computational Engineering 30, no. 1 (January 22, 2024): 189–201. http://dx.doi.org/10.54254/2755-2721/30/20230096.
Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.
Mithun, Niluthpol C., Juncheng Li, Florian Metze, and Amit K. Roy-Chowdhury. "Joint embeddings with multimodal cues for video-text retrieval." International Journal of Multimedia Information Retrieval 8, no. 1 (January 12, 2019): 3–18. http://dx.doi.org/10.1007/s13735-018-00166-3.
Yang, Bang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, and Yuexian Zou. "Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 6458–66. http://dx.doi.org/10.1609/aaai.v38i6.28466.
Xu, Tong, Peilun Zhou, Linkang Hu, Xiangnan He, Yao Hu, and Enhong Chen. "Socializing the Videos: A Multimodal Approach for Social Relation Recognition." ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 1 (April 16, 2021): 1–23. http://dx.doi.org/10.1145/3416493.
Dissertations / Theses on the topic "Multimodal embedding and retrieval"
Rubio, Romano Antonio. "Fashion discovery : a computer vision approach." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.
The semantic interpretation of fashion images is undoubtedly one of the most challenging domains for computer vision. Slight variations in color and shape can give an image distinct meanings or interpretations. It is a domain closely tied to subjective human understanding, but also to the interpretation and recognition of scenes and contexts. Being able to extract fashion-specific information from images and interpret it correctly can be useful in many situations and can help to understand the information underlying an image. Moreover, fashion is one of the most important businesses globally, with an estimated value of three trillion dollars and a constantly growing online market, which increases the interest in image-based algorithms to search, classify or recommend garments. This doctoral thesis aims to solve specific problems related to processing data from online fashion stores, ranging from the most basic pixel-level information to a more abstract understanding that allows conclusions to be drawn about the garments present in an image, exploiting the multimodality of the available data to develop some of the solutions. The contributions include:
- A new superpixel extraction method focused on improving the annotation process for fashion images.
- The construction of a common space to represent fashion-related images and texts.
- The application of that space to the task of identifying the main product within an image that shows a set of garments.
In summary, fashion is a complex domain at many levels in terms of computer vision and machine learning, and developing specific algorithms capable of capturing the essential information from images and texts is not a trivial task. To address some of the challenges it poses, and considering that this is an industrial doctorate, we contribute to the topic with a variety of solutions that can improve the performance of many tasks that are extremely useful for the online fashion industry.
Automàtica, robòtica i visió
Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.
Nowadays, Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has led to a significant improvement in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore different directions. We present relevant background covering image and textual representations and existing multimodal approaches. We propose novel architectures that further improve the retrieval capability of VSE, we extend VSE models to novel applications, and we leverage embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning a differentiable approximation of ranking-based metrics.
Adebayo, Kolawole John. "Multimodal Legal Information Retrieval." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amsdottorato.unibo.it/8634/1/ADEBAYO-JOHN-tesi.pdf.
Chen, Jianan. "Deep Learning Based Multimodal Retrieval." Electronic Thesis or Diss., Rennes, INSA, 2023. http://www.theses.fr/2023ISAR0019.
Multimodal tasks play a crucial role in the progress toward general artificial intelligence (AI). The primary goal of multimodal retrieval is to employ machine learning algorithms to extract relevant semantic information, bridging the gap between different modalities such as visual images, linguistic text, and other data sources. Notably, the information entropy of heterogeneous data expressing the same high-level semantics varies considerably, which poses a major challenge for multimodal models. Deep learning-based multimodal network models provide an effective way to handle these large differences in information entropy. These models exhibit high accuracy and stability in large-scale cross-modal information matching tasks, such as image-text retrieval. Furthermore, they show strong transfer learning capabilities: a model trained on one multimodal task can be fine-tuned and applied to a new multimodal task, even in few-shot or zero-shot scenarios. In our research, we develop a novel generative multimodal multi-view database specifically designed for the multimodal referential segmentation task. Additionally, we establish a state-of-the-art (SOTA) benchmark and a multi-view metric for referring expression segmentation models in the multimodal domain. The results of our comparative experiments are presented visually, providing clear and comprehensive insights.
Böckmann, Christine, Jens Biele, Roland Neuber, and Jenny Niebsch. "Retrieval of multimodal aerosol size distribution by inversion of multiwavelength data." Universität Potsdam, 1997. http://opus.kobv.de/ubp/volltexte/2007/1436/.
Zhu, Meng. "Cross-modal semantic-associative labelling, indexing and retrieval of multimodal data." Thesis, University of Reading, 2010. http://centaur.reading.ac.uk/24828/.
Kahn, Itamar. "Remembering the past : multimodal imaging of cortical contributions to episodic retrieval." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/33171.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references.
What is the nature of the neural processes that allow humans to remember past events? The theoretical framework adopted in this thesis builds upon cognitive models that suggest that episodic retrieval can be decomposed into two classes of computations: (1) recovery processes that serve to reactivate stored memories, making information from a past episode readily available, and (2) control processes that serve to guide the retrieval attempt and monitor/evaluate information arising from the recovery processes. A multimodal imaging approach that combined fMRI and MEG was adopted to gain insight into the spatial and temporal brain mechanisms supporting episodic retrieval. Chapter 1 reviews major findings and theories in the episodic retrieval literature grounding the open questions and controversies within the suggested framework. Chapter 2 describes an fMRI and MEG experiment that identified medial temporal cortical structures that signal item memory strength, thus supporting the perception of item familiarity. Chapter 3 describes an fMRI experiment that demonstrated that retrieval of contextual details involves reactivation of neural patterns engaged at encoding.
Further, leveraging this pattern of reactivation, it was demonstrated that false recognition may be accompanied by recollection. The fMRI experiment reported in Chapter 3, when combined with an MEG experiment reported in Chapter 4, directly addressed questions regarding the control processes engaged during episodic retrieval. In particular, Chapter 3 showed that parietal and prefrontal cortices contribute to controlling the act of arriving at a retrieval decision. Chapter 4 then illuminates the temporal characteristics of parietal activation during episodic retrieval, providing novel evidence about the nature of parietal responses and thus constraints on theories of parietal involvement in episodic retrieval. The conducted research targeted distinct aspects of the multi-faceted act of remembering the past. The obtained data contribute to the building of an anatomical and temporal "blueprint" documenting the cascade of neural events that unfold during attempts to remember, as well as when such attempts are met with success or lead to memory errors. In the course of framing this research within the context of cognitive models of retrieval, the obtained neural data reflect back on and constrain these theories of remembering.
by Itamar Kahn.
Ph.D.
Nag Chowdhury, Sreyasi. "Text-image synergy for multimodal retrieval and annotation." Saarbrücken: Saarländische Universitäts- und Landesbibliothek, 2021. http://d-nb.info/1240674139/34.
Luqman, Muhammad Muzzamil. "Fuzzy multilevel graph embedding for recognition, indexing and retrieval of graphic document images." Thesis, Tours, 2012. http://www.theses.fr/2012TOUR4005/document.
This thesis addresses the lack of efficient computational tools for graph-based structural pattern recognition approaches and proposes to exploit the computational strength of statistical pattern recognition. It makes two contributions. The first contribution is a new method of explicit graph embedding. The proposed graph embedding method exploits a multilevel analysis of graphs to extract graph-level, structural-level and elementary-level information from them, and embeds this information into a numeric feature vector. The method employs fuzzy overlapping trapezoidal intervals to address the noise sensitivity of graph representations and to minimize the information loss when mapping from the continuous graph space to a discrete vector space. The method has unsupervised learning abilities and can automatically adapt its parameters to the underlying graph dataset. The second contribution is a framework for the automatic indexing of graph repositories for graph retrieval and subgraph spotting. This framework uses explicit graph embedding to represent the cliques of order 2 by numeric feature vectors, together with classification and clustering tools, to automatically index a graph repository. It does not require a labeled learning set and can easily be deployed to a range of application domains, offering ease of query by example (QBE) and granularity of focused retrieval.
Lolich, María, and Susana Azzollini. "Phenomenological retrieval style of autobiographical memories in a sample of major depressed individuals." Pontificia Universidad Católica del Perú, 2016. http://repositorio.pucp.edu.pe/index/handle/123456789/99894.
The recall of autobiographical memories is characterized by distinct phenomenological components. Given the absence of previous studies in Spanish-speaking populations, 34 in-depth interviews were conducted with individuals with and without major depressive disorder in the city of Buenos Aires (Argentina). The phenomenological components present in the recall of significant autobiographical memories were explored. The data were analyzed qualitatively using Grounded Theory. During the descriptive analysis, seven phenomenological categories emerging from the discourse were detected. The axial and selective analysis identified two discursive axes: rhetorical-propositional and specificity-generality. The implications for affective regulation of adopting an amodal or multimodal style of processing autobiographical information deserve further attention.
Books on the topic "Multimodal embedding and retrieval"
Müller, Henning, Oscar Alfonso Jimenez del Toro, Allan Hanbury, Georg Langs, and Antonio Foncubierta Rodriguez, eds. Multimodal Retrieval in the Medical Domain. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-24471-6.
Peters, Carol, Valentin Jijkoun, Thomas Mandl, Henning Müller, Douglas W. Oard, Anselmo Peñas, Vivien Petras, and Diana Santos, eds. Advances in Multilingual and Multimodal Information Retrieval. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-85760-0.
Kuo, C.-C. Jay, ed. Video content analysis using multimodal information: For movie content extraction, indexing, and representation. Boston, MA: Kluwer Academic Publishers, 2003.
Li, Ying. Video Content Analysis Using Multimodal Information: For Movie Content Extraction, Indexing and Representation. Boston, MA: Springer US, 2003.
Peters, C., ed. Advances in multilingual and multimodal information retrieval: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary, September 19-21, 2007: revised selected papers. Berlin: Springer, 2008.
Forner, Pamela. Multilingual and Multimodal Information Access Evaluation: Second International Conference of the Cross-Language Evaluation Forum, CLEF 2011, Amsterdam, The Netherlands, September 19-22, 2011. Proceedings. Berlin, Heidelberg: Springer-Verlag GmbH Berlin Heidelberg, 2011.
Li, Ying. Video content analysis using multimodal information: For movie content extraction, indexing, and representation. Boston, MA: Kluwer Academic Publishers, 2003.
Esposito, Anna. Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces. Theoretical and Practical Issues: Third COST 2102 International Training School, Caserta, Italy, March 15-19, 2010, Revised Selected Papers. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011.
Bouma, Gosse, et al., eds. Interactive Multi-modal Question-Answering. Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg, 2011.
Drygajlo, Andrzej, Anna Esposito, Javier Ortega-Garcia, and Marcos Faúndez Zanuy, eds. Biometric ID Management and Multimodal Communication: Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid, Spain, September 16-18, 2009. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009.
Book chapters on the topic "Multimodal embedding and retrieval"
Zhou, Liting, and Cathal Gurrin. "Multimodal Embedding for Lifelog Retrieval." In MultiMedia Modeling, 416–27. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98358-1_33.
Mihajlović, Vojkan, Milan Petković, Willem Jonker, and Henk Blanken. "Multimodal Content-based Video Retrieval." In Multimedia Retrieval, 271–94. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. http://dx.doi.org/10.1007/978-3-540-72895-5_10.
Kitanovski, Ivan, Katarina Trojacanec, Ivica Dimitrovski, and Suzana Loskovska. "Multimodal Medical Image Retrieval." In ICT Innovations 2012, 81–89. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37169-1_8.
Pegia, Maria, Björn Þór Jónsson, Anastasia Moumtzidou, Sotiris Diplaris, Ilias Gialampoukidis, Stefanos Vrochidis, and Ioannis Kompatsiaris. "Multimodal 3D Object Retrieval." In MultiMedia Modeling, 188–201. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-53302-0_14.
Zhang, Xia, Weizheng Chen, and Hongfei Yan. "TLINE: Scalable Transductive Network Embedding." In Information Retrieval Technology, 98–110. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-48051-0_8.
Schedl, Markus, and Peter Knees. "Personalization in Multimodal Music Retrieval." In Adaptive Multimedia Retrieval. Large-Scale Multimedia Retrieval and Evaluation, 58–71. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37425-8_5.
Abdulahhad, Karam. "Concept Embedding for Information Retrieval." In Lecture Notes in Computer Science, 563–69. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-76941-7_45.
Gerritse, Emma J., Faegheh Hasibi, and Arjen P. de Vries. "Graph-Embedding Empowered Entity Retrieval." In Lecture Notes in Computer Science, 97–110. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-45439-5_7.
Toselli, Alejandro Héctor, Enrique Vidal, and Francisco Casacuberta. "Interactive Image Retrieval." In Multimodal Interactive Pattern Recognition and Applications, 209–26. London: Springer London, 2011. http://dx.doi.org/10.1007/978-0-85729-479-1_11.
Vu, Dang-Thinh, and Jason J. Jung. "Detecting Emerging Rumors by Embedding Propagation Graphs." In Information Retrieval Technology, 173–84. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-42835-8_15.
Conference papers on the topic "Multimodal embedding and retrieval"
Couairon, Guillaume, Matthijs Douze, Matthieu Cord, and Holger Schwenk. "Embedding Arithmetic of Multimodal Queries for Image Retrieval." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2022. http://dx.doi.org/10.1109/cvprw56347.2022.00542.
Huang, Feiran, Xiaoming Zhang, Chaozhuo Li, Zhoujun Li, Yueying He, and Zhonghua Zhao. "Multimodal Network Embedding via Attention based Multi-view Variational Autoencoder." In ICMR '18: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3206025.3206035.
Mithun, Niluthpol Chowdhury, Juncheng Li, Florian Metze, and Amit K. Roy-Chowdhury. "Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval." In ICMR '18: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3206025.3206064.
Huang, Fei, Yong Cheng, Cheng Jin, Yuejie Zhang, and Tao Zhang. "Deep Multimodal Embedding Model for Fine-grained Sketch-based Image Retrieval." In SIGIR '17: The 40th International ACM SIGIR conference on research and development in Information Retrieval. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3077136.3080681.
Bhattacharya, Indrani, Arkabandhu Chowdhury, and Vikas C. Raykar. "Multimodal Dialog for Browsing Large Visual Catalogs using Exploration-Exploitation Paradigm in a Joint Embedding Space." In ICMR '19: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3323873.3325036.
Neculai, Andrei, Yanbei Chen, and Zeynep Akata. "Probabilistic Compositional Embeddings for Multimodal Image Retrieval." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2022. http://dx.doi.org/10.1109/cvprw56347.2022.00501.
Parida, Kranti Kumar, Neeraj Matiyali, Tanaya Guha, and Gaurav Sharma. "Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zero-shot Classification and Retrieval of Videos." In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2020. http://dx.doi.org/10.1109/wacv45572.2020.9093438.
Dadas, Slawomir. "OPI at SemEval-2023 Task 1: Image-Text Embeddings and Multimodal Information Retrieval for Visual Word Sense Disambiguation." In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.semeval-1.22.
Sung, Raymond C. W., James M. Ritchie, Theodore Lim, Aparajithan Sivanathan, and Mike J. Chantler. "The Evaluation of a Virtual-Aided Design Engineering Review (VADER) System for Automated Knowledge Capture and Reuse." In ASME 2013 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2013. http://dx.doi.org/10.1115/detc2013-12030.
Szekely, Eniko, Eric Bruno, and Stephane Marchand-Maillet. "High-Dimensional Multimodal Distribution Embedding." In 2010 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2010. http://dx.doi.org/10.1109/icdmw.2010.194.