
Journal articles on the topic 'Multimodal retrieval'



Consult the top 50 journal articles for your research on the topic 'Multimodal retrieval.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Cui, Chenhao, and Zhoujun Li. "Prompt-Enhanced Generation for Multimodal Open Question Answering." Electronics 13, no. 8 (April 10, 2024): 1434. http://dx.doi.org/10.3390/electronics13081434.

Abstract:
Multimodal open question answering involves retrieving relevant information from both images and their corresponding texts given a question and then generating the answer. The quality of the generated answer heavily depends on the quality of the retrieved image–text pairs. Existing methods encode and retrieve images and texts, inputting the retrieved results into a language model to generate answers. These methods overlook the semantic alignment of image–text pairs within the information source, which affects the encoding and retrieval performance. Furthermore, these methods are highly dependent on retrieval performance, and poor retrieval quality can lead to poor generation performance. To address these issues, we propose a prompt-enhanced generation model, PEG, which includes generating supplementary descriptions for images to provide ample material for image–text alignment while also utilizing vision–language joint encoding to improve encoding effects and thereby enhance retrieval performance. Contrastive learning is used to enhance the model’s ability to discriminate between relevant and irrelevant information sources. Moreover, we further explore the knowledge within pre-trained model parameters through prefix-tuning to generate background knowledge relevant to the questions, offering additional input for answer generation and reducing the model’s dependency on retrieval performance. Experiments conducted on the WebQA and MultimodalQA datasets demonstrate that our model outperforms other baseline models in retrieval and generation performance.
2

Xu, Hong. "Multimodal bird information retrieval system." Applied and Computational Engineering 53, no. 1 (March 28, 2024): 96–102. http://dx.doi.org/10.54254/2755-2721/53/20241282.

Abstract:
A multimodal bird information retrieval system can help popularize bird knowledge and support bird conservation. In this paper, we use a self-built bird dataset, the ViT-B/32 variant of the CLIP model as the training model, Python as the development language, and PyQt5 for the interface development. The system mainly realizes the uploading and display of bird pictures, the multimodal retrieval of bird information, and the introduction of related bird information. Trial runs show that the system can accomplish multimodal retrieval of bird information: it can retrieve the species of a bird and other related information from pictures uploaded by the user, or retrieve the most similar bird information from text described by the user.
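CLIP-style systems like the one above rank candidates by the similarity of their embeddings in a shared vector space. A minimal sketch of that ranking step, using made-up toy vectors in place of real CLIP image/text encoder outputs (names and numbers are illustrative only):

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, gallery):
    # Rank gallery items by similarity to the query embedding.
    return sorted(gallery, key=lambda item: cosine(query_vec, item[1]), reverse=True)

# Toy embeddings standing in for CLIP image-encoder outputs.
bird_gallery = [
    ("sparrow", [0.9, 0.1, 0.2]),
    ("eagle",   [0.1, 0.9, 0.3]),
    ("penguin", [0.2, 0.2, 0.9]),
]
# Toy embedding standing in for the text encoding of a user's description.
text_query = [0.85, 0.15, 0.25]

ranked = retrieve(text_query, bird_gallery)
print([name for name, _ in ranked])  # → ['sparrow', 'penguin', 'eagle']
```

Because both modalities live in the same space, the same `retrieve` call handles text-to-image and image-to-text queries.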
3

Romberg, Stefan, Rainer Lienhart, and Eva Hörster. "Multimodal Image Retrieval." International Journal of Multimedia Information Retrieval 1, no. 1 (March 7, 2012): 31–44. http://dx.doi.org/10.1007/s13735-012-0006-4.

4

Kitanovski, Ivan, Gjorgji Strezoski, Ivica Dimitrovski, Gjorgji Madjarov, and Suzana Loskovska. "Multimodal medical image retrieval system." Multimedia Tools and Applications 76, no. 2 (January 25, 2016): 2955–78. http://dx.doi.org/10.1007/s11042-016-3261-1.

5

Singh, Kulvinder, et al. "Enhancing Multimodal Information Retrieval Through Integrating Data Mining and Deep Learning Techniques." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 9 (October 30, 2023): 560–69. http://dx.doi.org/10.17762/ijritcc.v11i9.8844.

Abstract:
Multimodal information retrieval, the task of retrieving relevant information from heterogeneous data sources such as text, images, and videos, has gained significant attention in recent years due to the proliferation of multimedia content on the internet. This paper proposes an approach to enhance multimodal information retrieval by integrating data mining and deep learning techniques. Traditional information retrieval systems often struggle to effectively handle multimodal data due to the inherent complexity and diversity of such data sources. In this study, we leverage data mining techniques to preprocess and structure multimodal data efficiently. Data mining methods enable us to extract valuable patterns, relationships, and features from different modalities, providing a solid foundation for subsequent retrieval tasks. To further enhance the performance of multimodal information retrieval, deep learning techniques are employed. Deep neural networks have demonstrated their effectiveness in various multimedia tasks, including image recognition, natural language processing, and video analysis. By integrating deep learning models into our retrieval framework, we aim to capture complex intermodal dependencies and semantically rich representations, enabling more accurate and context-aware retrieval.
6

Cao, Yu, Shawn Steffey, Jianbiao He, Degui Xiao, Cui Tao, Ping Chen, and Henning Müller. "Medical Image Retrieval: A Multimodal Approach." Cancer Informatics 13s3 (January 2014): CIN.S14053. http://dx.doi.org/10.4137/cin.s14053.

Abstract:
Medical imaging is becoming a vital component of the war on cancer. Tremendous amounts of medical image data are captured and recorded in digital format during cancer care and cancer research. Facing such an unprecedented volume of image data with heterogeneous image modalities, it is necessary to develop effective and efficient content-based medical image retrieval systems for cancer clinical practice and research. While substantial progress has been made in different areas of content-based image retrieval (CBIR) research, direct application of existing CBIR techniques to medical images has produced unsatisfactory results because of the unique characteristics of medical images. In this paper, we develop a new multimodal medical image retrieval approach based on recent advances in statistical graphical models and deep learning. Specifically, we first investigate a new extended probabilistic Latent Semantic Analysis model to integrate the visual and textual information from medical images to bridge the semantic gap. We then develop a new deep Boltzmann machine-based multimodal learning model to learn the joint density model from multimodal information in order to derive the missing modality. Experimental results with a large volume of real-world medical images have shown that our new approach is a promising solution for the next-generation medical image indexing and retrieval system.
7

Rafailidis, D., S. Manolopoulou, and P. Daras. "A unified framework for multimodal retrieval." Pattern Recognition 46, no. 12 (December 2013): 3358–70. http://dx.doi.org/10.1016/j.patcog.2013.05.023.

8

Dong, Bin, Songlei Jian, and Kai Lu. "Learning Multimodal Representations by Symmetrically Transferring Local Structures." Symmetry 12, no. 9 (September 13, 2020): 1504. http://dx.doi.org/10.3390/sym12091504.

Abstract:
Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they only focus on the object-level alignment and ignore structure-level alignment. To tackle the problem, we propose a novel symmetric multimodal representation learning framework by transferring local structures across different modalities, namely MTLS. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. The bidirectional retrieval loss based on multi-layer neural networks is utilized to align two modalities. MTLS is instantiated with image and text data and shows its superior performance on image-text retrieval and image clustering. MTLS outperforms the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering.
9

Zhang, Guihao, and Jiangzhong Cao. "Feature Fusion Based on Transformer for Cross-modal Retrieval." Journal of Physics: Conference Series 2558, no. 1 (August 1, 2023): 012012. http://dx.doi.org/10.1088/1742-6596/2558/1/012012.

Abstract:
With the popularity of the Internet and the rapid growth of multimodal data, multimodal retrieval has gradually become a hot area of research. As one of the important branches of multimodal retrieval, image-text retrieval aims to design a model to learn and align two modal data, image and text, in order to build a bridge of semantic association between the two heterogeneous data, so as to achieve unified alignment and retrieval. The current mainstream image-text cross-modal retrieval approaches have made good progress by designing a deep learning-based model to find potential associations between different modal data. In this paper, we design a transformer-based feature fusion network to fuse the information of two modalities in the feature extraction process, which can enrich the semantic connection between the modalities. Meanwhile, we conduct experiments on the benchmark dataset Flickr30k and get competitive results, where recall at 10 achieves 96.2% accuracy in image-to-text retrieval.
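Recall at K, the metric quoted above, measures how often the ground-truth match appears among the top-K retrieved items. A small sketch of its computation, with hypothetical result lists that are not taken from the paper:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of queries whose relevant item appears in the top-k results.
    hits = sum(1 for ranked, rel in zip(ranked_ids, relevant_ids)
               if rel in ranked[:k])
    return hits / len(relevant_ids)

# Hypothetical ranked result lists for three text queries,
# each with one ground-truth matching image id.
results = [
    ["img3", "img1", "img7"],   # query 1
    ["img5", "img2", "img9"],   # query 2
    ["img8", "img4", "img6"],   # query 3
]
ground_truth = ["img1", "img5", "img0"]

print(recall_at_k(results, ground_truth, k=1))   # 1/3
print(recall_at_k(results, ground_truth, k=3))   # 2/3
```

R@10 = 96.2% on Flickr30k would mean that for 96.2% of image queries, the correct caption appears somewhere in the top 10 results.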
10

Kompus, Kristiina, Tom Eichele, Kenneth Hugdahl, and Lars Nyberg. "Multimodal Imaging of Incidental Retrieval: The Low Route to Memory." Journal of Cognitive Neuroscience 23, no. 4 (April 2011): 947–60. http://dx.doi.org/10.1162/jocn.2010.21494.

Abstract:
Memories of past episodes frequently come to mind incidentally, without directed search. It has remained unclear how incidental retrieval processes are initiated in the brain. Here we used fMRI and ERP recordings to find brain activity that specifically correlates with incidental retrieval, as compared to intentional retrieval. Intentional retrieval was associated with increased activation in dorsolateral prefrontal cortex. By contrast, incidental retrieval was associated with a reduced fMRI signal in posterior brain regions, including extrastriate and parahippocampal cortex, and a modulation of a posterior ERP component 170 msec after the onset of visual retrieval cues. Successful retrieval under both intentional and incidental conditions was associated with increased activation in the hippocampus, precuneus, and ventrolateral prefrontal cortex, as well as increased amplitude of the P600 ERP component. These results demonstrate how early bottom–up signals from posterior cortex can lead to reactivation of episodic memories in the absence of strategic retrieval attempts.
11

Bokhari, Mohammad Ubaidullah, and Faraz Hasan. "Multimodal Information Retrieval: Challenges and Future Trends." International Journal of Computer Applications 74, no. 14 (July 26, 2013): 9–12. http://dx.doi.org/10.5120/12951-9967.

12

Yamaguchi, Masataka. "2. Multimodal Retrieval between Vision and Language." Journal of The Institute of Image Information and Television Engineers 72, no. 9 (2018): 655–58. http://dx.doi.org/10.3169/itej.72.655.

13

Calumby, Rodrigo Tripodi. "Diversity-oriented Multimodal and Interactive Information Retrieval." ACM SIGIR Forum 50, no. 1 (June 27, 2016): 86. http://dx.doi.org/10.1145/2964797.2964811.

14

Jin, Lu, Kai Li, Hao Hu, Guo-Jun Qi, and Jinhui Tang. "Semantic Neighbor Graph Hashing for Multimodal Retrieval." IEEE Transactions on Image Processing 27, no. 3 (March 2018): 1405–17. http://dx.doi.org/10.1109/tip.2017.2776745.

15

Peng, Yang, Xiaofeng Zhou, Daisy Zhe Wang, Ishan Patwa, Dihong Gong, and Chunsheng Victor Fang. "Multimodal Ensemble Fusion for Disambiguation and Retrieval." IEEE MultiMedia 23, no. 2 (April 2016): 42–52. http://dx.doi.org/10.1109/mmul.2016.26.

16

Hu, Peng, Dezhong Peng, Xu Wang, and Yong Xiang. "Multimodal adversarial network for cross-modal retrieval." Knowledge-Based Systems 180 (September 2019): 38–50. http://dx.doi.org/10.1016/j.knosys.2019.05.017.

17

Waykar, Sanjay B., and C. R. Bharathi. "Multimodal Features and Probability Extended Nearest Neighbor Classification for Content-Based Lecture Video Retrieval." Journal of Intelligent Systems 26, no. 3 (July 26, 2017): 585–99. http://dx.doi.org/10.1515/jisys-2016-0041.

Abstract:
Due to the ever-increasing number of digital lecture libraries and lecture video portals, the challenge of retrieving lecture videos has become a very significant and demanding task in recent years. Accordingly, the literature presents different techniques for video retrieval by considering video contents as well as signal data. Here, we propose a lecture video retrieval system using multimodal features and probability extended nearest neighbor (PENN) classification. There are two modalities utilized for feature extraction. One is textual information, which is determined from the lecture video using optical character recognition. The second modality utilized to preserve video content is local vector pattern. These two modal features are extracted, and the retrieval of videos is performed using the proposed PENN classifier, which is the extension of the extended nearest neighbor classifier, by considering the different weightages for the first-level and second-level neighbors. The performance of the proposed video retrieval is evaluated using precision, recall, and F-measure, which are computed by matching the retrieved videos and the manually classified videos. From the experimentation, we proved that the average precision of the proposed PENN+VQ is 78.3%, which is higher than that of the existing methods.
18

Rebstock, Alicia M., and Sarah E. Wallace. "Effects of a Combined Semantic Feature Analysis and Multimodal Treatment for Primary Progressive Aphasia: Pilot Study." Communication Disorders Quarterly 41, no. 2 (September 10, 2018): 71–85. http://dx.doi.org/10.1177/1525740118794399.

Abstract:
Primary progressive aphasia (PPA) is a neurodegenerative condition characterized by language and cognitive decline. Word-retrieval deficits are the most common PPA symptom and contribute to impaired spoken expression. Intense semantic interventions show promise for improving word retrieval in people with PPA. In addition, people with PPA may learn to use alternative communication modalities when they are unable to retrieve a word. However, executive function impairments can cause people to struggle to switch among modalities to repair communication breakdowns. This study examined the effects of a combined semantic feature analysis and multimodal communication program (SFA+MCP) on word-retrieval accuracy, switching among modalities, and overall communicative effectiveness in a person with PPA. An adult female with PPA completed SFA+MCP. Baseline, probe, intervention, and postintervention sessions were completed to measure word-retrieval accuracy and switching between communication modalities. A postintervention listener task was completed to measure communicative effectiveness. Changes in word-retrieval accuracy and switching were minimal. However, the listeners’ identification of the participant’s communication attempts was more accurate following treatment, suggesting increased overall communicative effectiveness. Further investigations of SFA+MCP, specifically relative to timing, intensity, and appropriate modifications for people with cognitive impairments associated with PPA are warranted.
19

He, Chao, Dalin Wang, Zefu Tan, Liming Xu, and Nina Dai. "Cross-Modal Discrimination Hashing Retrieval Using Variable Length." Security and Communication Networks 2022 (September 9, 2022): 1–12. http://dx.doi.org/10.1155/2022/9638683.

Abstract:
Fast cross-modal retrieval based on hash coding has become a hot topic for rich multimodal data (text, image, audio, etc.), especially given the security and privacy challenges in the Internet of Things and mobile edge computing. However, most hash-coding methods map all modalities to a common hash space and relax the binary (two-value) constraints of hash coding. As a result, the learned multimodal hash codes may not express the original multimodal data sufficiently and effectively, making the hash codes less discriminative. To address these problems, this paper proposes a method that maps each modality to a hash space of its own optimal code length, and then solves for the hash codes of each modality with a discrete cross-modal hashing algorithm under binary constraints. Finally, the similarity of the multimodal data is compared in the latent space. The experimental results of cross-modal retrieval based on variable-length hash coding are better than those of the comparison methods on the WIKI, NUS-WIDE, and MIRFlickr datasets, demonstrating that the proposed method is feasible and effective.
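Hash-based cross-modal retrieval of the kind discussed above compares compact binary codes by Hamming distance. A minimal sketch of that comparison step, assuming (unlike the paper's variable-length scheme) that both modalities are mapped to codes of the same length; all codes here are toy values, not learned:

```python
def hamming(a, b):
    # Number of differing bits between two equal-length binary codes.
    return sum(x != y for x, y in zip(a, b))

def hash_retrieve(query_code, database):
    # Rank database items by Hamming distance to the query's hash code.
    return sorted(database, key=lambda item: hamming(query_code, item[1]))

# Toy 8-bit codes; a real system learns them per modality.
db = [
    ("doc_a", [1, 0, 1, 1, 0, 0, 1, 0]),
    ("doc_b", [0, 1, 0, 0, 1, 1, 0, 1]),
    ("doc_c", [1, 0, 1, 0, 0, 0, 1, 0]),
]
# Hypothetical hash code produced from an image query.
image_query = [1, 0, 1, 1, 0, 0, 1, 1]

print([name for name, _ in hash_retrieve(image_query, db)])
# → ['doc_a', 'doc_c', 'doc_b']
```

Hamming distance on binary codes is what makes hashing-based retrieval fast: it reduces to XOR and popcount operations in practice.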
20

Chávez, Ricardo Omar, Hugo Jair Escalante, Manuel Montes-y-Gómez, and Luis Enrique Sucar. "Multimodal Markov Random Field for Image Reranking Based on Relevance Feedback." ISRN Machine Vision 2013 (February 11, 2013): 1–16. http://dx.doi.org/10.1155/2013/428746.

Abstract:
This paper introduces a multimodal approach for reranking of image retrieval results based on relevance feedback. We consider the problem of reordering the ranked list of images returned by an image retrieval system, in such a way that images relevant to a query are moved to the first positions of the list. We propose a Markov random field (MRF) model that aims at classifying the images in the initial retrieval-result list as relevant or irrelevant; the output of the MRF is used to generate a new list of ranked images. The MRF takes into account (1) the rank information provided by the initial retrieval system, (2) similarities among images in the list, and (3) relevance feedback information. Hence, the problem of image reranking is reduced to that of minimizing an energy function that represents a trade-off between image relevance and interimage similarity. The proposed MRF is multimodal, as it can take advantage of both the visual and textual information with which images are described. We report experimental results in the IAPR TC12 collection using visual and textual features to represent images. Experimental results show that our method is able to improve the ranking provided by the base retrieval system. Also, the multimodal MRF outperforms unimodal (i.e., either text-based or image-based) MRFs that we have developed in previous work. Furthermore, the proposed MRF outperforms baseline multimodal methods that combine information from unimodal MRFs.
21

Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Abstract:
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modality data. It is challenging due not only to the heterogeneous distributions across different modalities, but also to the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as the common semantic space, and using a generative adversarial network (GAN) to capture the underlying multimodal data structures, as well as to strengthen relations between input data and the semantic space to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN-based methods for ZS-CMR. Unlike using the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space of input multimodal features and class-embeddings by modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlation associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in the latent space, which benefits retrieval efficiency and enables knowledge transfer to unseen classes. We evaluate our model using four benchmark datasets on image-text retrieval tasks and one large-scale dataset on image-sketch retrieval tasks. The experimental results show that our method establishes the new state-of-the-art performance for both tasks on all datasets.
22

Schöpper, Lars-Michael, Tarini Singh, and Christian Frings. "The official soundtrack to “Five shades of grey”: Generalization in multimodal distractor-based retrieval." Attention, Perception, & Psychophysics 82, no. 7 (June 12, 2020): 3479–89. http://dx.doi.org/10.3758/s13414-020-02057-4.

Abstract:
When responding to two events in a sequence, the repetition or change of stimuli and the accompanying response can benefit or interfere with response execution: Full repetition leads to benefits in performance while partial repetition leads to costs. Additionally, even distractor stimuli can be integrated with a response, and can, upon repetition, lead to benefits or interference. Recently it has been suggested that not only identical, but also perceptually similar distractors retrieve a previous response (Singh et al., Attention, Perception, & Psychophysics, 78(8), 2307-2312, 2016): Participants discriminated four visual shapes appearing in five different shades of grey, the latter being irrelevant for task execution. Exact distractor repetitions yielded the strongest distractor-based retrieval effect, which decreased with increasing dissimilarity between shades of grey. In the current study, we expand these findings by conceptually replicating Singh et al. (2016) using multimodal stimuli. In Experiment 1 (N=31), participants discriminated four visual targets accompanied by five auditory distractors. In Experiment 2 (N=32), participants discriminated four auditory targets accompanied by five visual distractors. We replicated the generalization of distractor-based retrieval – that is, the distractor-based retrieval effect decreased with increasing distractor-dissimilarity. These results not only show that generalization in distractor-based retrieval occurs in multimodal feature processing, but also that these processes can occur for distractors perceived in a different modality to that of the target.
23

Murrugarra-Llerena, Nils, and Adriana Kovashka. "Image retrieval with mixed initiative and multimodal feedback." Computer Vision and Image Understanding 207 (June 2021): 103204. http://dx.doi.org/10.1016/j.cviu.2021.103204.

24

Ismail, Nor Azman, and Ann O'Brien. "Web-Based Personal Digital Photo Collections: Multimodal Retrieval." IIUM Engineering Journal 10, no. 1 (September 29, 2010): 49–57. http://dx.doi.org/10.31436/iiumej.v10i1.104.

Abstract:
When personal photo collections get large, retrieval of specific photos or sets of photos becomes difficult, mainly due to the fairly primitive means by which they are organised. Commercial photo handling systems help but often have only elementary searching features. In this paper, we describe an interactive web-based photo retrieval system that enables personal digital photo users to accomplish photo browsing by using multimodal interaction. This system not only enables users to use mouse click input modalities but also a speech input modality to browse their personal digital photos in the World Wide Web (WWW) environment. The prototype system and its architecture utilise web technology, built with web scripting languages (JavaScript, XHTML, ASP, and an XML-based mark-up language) and an image database, in order to achieve its objective. All prototype programs and data files, including the user’s photo repository, profiles, dialogues, grammars, prompts, and retrieval engine, are stored on the web server. Our approach also includes a human-computer speech dialogue for browsing photos by four main categories (Who? What? When? and Where?). Our user study with 20 digital photo users showed that the participants reacted positively to their experience with the system interactions.
25

Zhang, Jing. "Video retrieval model based on multimodal information fusion." Journal of Computer Applications 28, no. 1 (July 10, 2008): 199–201. http://dx.doi.org/10.3724/sp.j.1087.2008.00199.

26

Cao, Wenming, Wenshuo Feng, Qiubin Lin, Guitao Cao, and Zhihai He. "A Review of Hashing Methods for Multimodal Retrieval." IEEE Access 8 (2020): 15377–91. http://dx.doi.org/10.1109/access.2020.2968154.

27

Zhang, Yu, Ye Yuan, Yishu Wang, and Guoren Wang. "A novel multimodal retrieval model based on ELM." Neurocomputing 277 (February 2018): 65–77. http://dx.doi.org/10.1016/j.neucom.2017.03.095.

28

Mourão, André, Flávio Martins, and João Magalhães. "Multimodal medical information retrieval with unsupervised rank fusion." Computerized Medical Imaging and Graphics 39 (January 2015): 35–45. http://dx.doi.org/10.1016/j.compmedimag.2014.05.006.

29

Revuelta-Martínez, Alejandro, Luis Rodríguez, Ismael García-Varea, and Francisco Montero. "Multimodal interaction for information retrieval using natural language." Computer Standards & Interfaces 35, no. 5 (September 2013): 428–41. http://dx.doi.org/10.1016/j.csi.2012.11.002.

30

Liu, Anan, Wenhui Li, Weizhi Nie, and Yuting Su. "3D models retrieval algorithm based on multimodal data." Neurocomputing 259 (October 2017): 176–82. http://dx.doi.org/10.1016/j.neucom.2016.06.087.

31

Daras, Petros, and Apostolos Axenopoulos. "A 3D Shape Retrieval Framework Supporting Multimodal Queries." International Journal of Computer Vision 89, no. 2-3 (July 30, 2009): 229–47. http://dx.doi.org/10.1007/s11263-009-0277-2.

32

Chen, Xu, Alfred O. Hero, III, and Silvio Savarese. "Multimodal Video Indexing and Retrieval Using Directed Information." IEEE Transactions on Multimedia 14, no. 1 (February 2012): 3–16. http://dx.doi.org/10.1109/tmm.2011.2167223.

33

Pang, Lei, Shiai Zhu, and Chong-Wah Ngo. "Deep Multimodal Learning for Affective Analysis and Retrieval." IEEE Transactions on Multimedia 17, no. 11 (November 2015): 2008–20. http://dx.doi.org/10.1109/tmm.2015.2482228.

34

Sperandio, Ricardo C., Zenilton K. G. Patrocínio, Hugo B. de Paula, and Silvio J. F. Guimarães. "An efficient access method for multimodal video retrieval." Multimedia Tools and Applications 74, no. 4 (April 11, 2014): 1357–75. http://dx.doi.org/10.1007/s11042-014-1917-2.

35

Hubert, Gilles, and Josiane Mothe. "An adaptable search engine for multimodal information retrieval." Journal of the American Society for Information Science and Technology 60, no. 8 (August 2009): 1625–34. http://dx.doi.org/10.1002/asi.21091.

36

陈, 佳芸. "Multimodal Fashion Style Retrieval Based on Deep Learning." Computer Science and Application 13, no. 03 (2023): 492–501. http://dx.doi.org/10.12677/csa.2023.133048.

37

Gomathy, S., K. P. Deepa, T. Revathi, and L. Maria Michael Visuwasam. "Genre Specific Classification for Information Search and Multimodal Semantic Indexing for Data Retrieval." SIJ Transactions on Computer Science Engineering & its Applications (CSEA) 01, no. 01 (April 5, 2013): 10–15. http://dx.doi.org/10.9756/sijcsea/v1i1/01010159.

38

Qian, Shengsheng, Dizhan Xue, Huaiwen Zhang, Quan Fang, and Changsheng Xu. "Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (May 18, 2021): 2440–48. http://dx.doi.org/10.1609/aaai.v35i3.16345.

Abstract:
Cross-modal retrieval has become an active study field with the expanding scale of multimodal data. To date, most existing methods transform multimodal data into a common representation space where semantic similarities between items can be directly measured across different modalities. However, these methods typically suffer from the following limitations: 1) They usually attempt to bridge the modality gap by designing losses in the common representation space, which may not be sufficient to eliminate the potential heterogeneity of different modalities in the common space. 2) They typically treat labels as independent individuals and ignore label relationships, which are important for constructing semantic links between multimodal data. In this work, we propose novel Dual Adversarial Graph Neural Networks (DAGNN), composed of dual generative adversarial networks and multi-hop graph neural networks, which learn modality-invariant and discriminative common representations for cross-modal retrieval. Firstly, we construct the dual generative adversarial networks to project multimodal data into a common representation space. Secondly, we leverage the multi-hop graph neural networks, in which a layer aggregation mechanism is proposed to exploit multi-hop propagation information, to capture the label correlation dependency and learn inter-dependent classifiers. Comprehensive experiments conducted on two cross-modal retrieval benchmark datasets, NUS-WIDE and MIRFlickr, indicate the superiority of DAGNN.
APA, Harvard, Vancouver, ISO, and other styles
39

Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao, and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings." Journal of Advanced Computational Intelligence and Intelligent Informatics 26, no. 6 (November 20, 2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.

Full text
Abstract:
In this work, we study the application of multimodal analogical reasoning to image retrieval. Multimodal analogy questions are given in the form of tuples of words and images, e.g., “cat”:“dog”::[an image of a cat sitting on a bench]:?, to search for an image of a dog sitting on a bench. Retrieving the desired images given these tuples can be seen as the task of finding images whose relation to the query image is close to the relation between the query words. One way to achieve this is to build a common vector space that exhibits analogical regularities. To learn such an embedding, we propose a quadruple neural network called the multimodal siamese network. The network consists of recurrent neural networks and convolutional neural networks based on the siamese architecture. We also introduce an effective procedure to generate analogy examples from an image-caption dataset for training our network. In our experiments, we test our model on analogy-based image retrieval tasks. The results show that our method outperforms the previous work in qualitative evaluation.
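In an embedding space with analogical regularities, the a:b::c:? query is classically answered by the vector offset c + (b − a), then taking the nearest candidate. A toy sketch of that retrieval step with made-up 2-D embeddings (the paper learns the space with a quadruple siamese network; only the query arithmetic is shown here):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy_query(v_a, v_b, v_img, candidates):
    """Answer a:b::img:? by ranking candidates against v_img + (v_b - v_a)."""
    target = [i + (b - a) for a, b, i in zip(v_a, v_b, v_img)]
    return max(candidates, key=lambda name: cosine(candidates[name], target))

# Hypothetical embeddings: axis 0 ~ species (cat=0, dog=1), axis 1 ~ "on a bench".
v_cat, v_dog = [0.0, 0.0], [1.0, 0.0]
img_cat_on_bench = [0.0, 1.0]
gallery = {
    "dog_on_bench": [1.0, 1.0],
    "cat_on_bench": [0.0, 1.0],
    "dog_indoors": [1.0, 0.0],
}
print(analogy_query(v_cat, v_dog, img_cat_on_bench, gallery))  # dog_on_bench
```

The interesting part of the paper is learning an embedding in which this simple arithmetic actually holds across the word and image modalities.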
APA, Harvard, Vancouver, ISO, and other styles
40

Li, Ruxuan, Jingyi Wang, and Xuedong Tian. "A Multi-Modal Retrieval Model for Mathematical Expressions Based on ConvNeXt and Hesitant Fuzzy Set." Electronics 12, no. 20 (October 20, 2023): 4363. http://dx.doi.org/10.3390/electronics12204363.

Full text
Abstract:
Mathematical expression retrieval is an essential component of mathematical information retrieval. Current mathematical expression retrieval research primarily targets single modalities, particularly text, which can lead to the loss of structural information. On the other hand, multimodal research has demonstrated promising outcomes across different domains, and mathematical expressions in image format are adept at preserving their structural characteristics. We therefore propose a multi-modal retrieval model for mathematical expressions based on ConvNeXt and HFS to address the limitations of single-modal retrieval. For the image modality, mathematical expression retrieval is based on the similarity of image features and symbol-level features of the expression, where image features of the expression image are extracted by ConvNeXt, while symbol-level features are obtained by the Symbol Level Features Extraction (SLFE) module. For the text modality, the Formula Description Structure (FDS) is employed to analyze expressions and extract their attributes. Additionally, the application of Hesitant Fuzzy Set (HFS) theory facilitates the computation of hesitant fuzzy similarity between mathematical queries and candidate expressions. Finally, Reciprocal Rank Fusion (RRF) is employed to integrate the rankings from image-modal and text-modal retrieval, yielding the ultimate retrieval list. The experiment was conducted on the publicly accessible ArXiv dataset (containing 592,345 mathematical expressions) and the NTCIR-mair-wikipedia-corpus (NTCIR) dataset. The MAP@10 value for the multimodal RRF fusion approach reaches 0.774. These results substantiate the efficacy of the multi-modal mathematical expression retrieval approach based on ConvNeXt and HFS.
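Reciprocal Rank Fusion, the final step of the pipeline above, is a standard rank-aggregation method: each document scores 1/(k + rank) in every input ranking, with k commonly set to 60 per Cormack et al.'s original formulation, and documents are re-sorted by their summed scores. A minimal sketch (illustrative, not the paper's implementation):

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists with Reciprocal Rank Fusion.

    Each document d receives score sum_i 1 / (k + rank_i(d)), where
    rank_i(d) is its 1-based position in ranking i (absent = no contribution).
    Returns documents sorted by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" is ranked highly by both lists, so it wins the fusion.
image_ranking = ["a", "b", "c"]
text_ranking = ["b", "c", "a"]
print(rrf_fuse([image_ranking, text_ranking]))  # ['b', 'a', 'c']
```

RRF is attractive here precisely because the image-modal and text-modal scores live on incomparable scales; fusing by rank sidesteps any score normalization.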
APA, Harvard, Vancouver, ISO, and other styles
41

李, 劼博. "Video Speech Retrieval Model Based on Multimodal Feature Memory." Computer Science and Application 12, no. 07 (2022): 1747–55. http://dx.doi.org/10.12677/csa.2022.127176.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

LIU, Zhi, Fangyuan ZHAO, and Mengmeng ZHANG. "An Efficient Multimodal Aggregation Network for Video-Text Retrieval." IEICE Transactions on Information and Systems E105.D, no. 10 (October 1, 2022): 1825–28. http://dx.doi.org/10.1587/transinf.2022edl8018.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Bu, Shuhui, Shaoguang Cheng, Zhenbao Liu, and Junwei Han. "Multimodal Feature Fusion for 3D Shape Recognition and Retrieval." IEEE MultiMedia 21, no. 4 (October 2014): 38–46. http://dx.doi.org/10.1109/mmul.2014.52.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Tang, Jinhui, and Zechao Li. "Weakly Supervised Multimodal Hashing for Scalable Social Image Retrieval." IEEE Transactions on Circuits and Systems for Video Technology 28, no. 10 (October 2018): 2730–41. http://dx.doi.org/10.1109/tcsvt.2017.2715227.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Figueroa, Cristhian, Hugo Ordoñez, Juan-Carlos Corrales, Carlos Cobos, Leandro Krug Wives, and Enrique Herrera-Viedma. "Improving business process retrieval using categorization and multimodal search." Knowledge-Based Systems 110 (October 2016): 49–59. http://dx.doi.org/10.1016/j.knosys.2016.07.014.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Datta, Deepanwita, Shubham Varma, Ravindranath Chowdary C., and Sanjay K. Singh. "Multimodal Retrieval using Mutual Information based Textual Query Reformulation." Expert Systems with Applications 68 (February 2017): 81–92. http://dx.doi.org/10.1016/j.eswa.2016.09.039.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Escalante, Hugo Jair, Manuel Montes, and Enrique Sucar. "Multimodal indexing based on semantic cohesion for image retrieval." Information Retrieval 15, no. 1 (June 5, 2011): 1–32. http://dx.doi.org/10.1007/s10791-011-9170-z.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Markonis, Dimitrios, Roger Schaer, and Henning Müller. "Evaluating multimodal relevance feedback techniques for medical image retrieval." Information Retrieval Journal 19, no. 1-2 (August 1, 2015): 100–112. http://dx.doi.org/10.1007/s10791-015-9260-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Imhof, Melanie, and Martin Braschler. "A study of untrained models for multimodal information retrieval." Information Retrieval Journal 21, no. 1 (November 3, 2017): 81–106. http://dx.doi.org/10.1007/s10791-017-9322-x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Soni, Ankita, and Richa Chouhan. "Multimodal Information Retrieval by using Visual and Textual Query." International Journal of Computer Applications 137, no. 1 (March 17, 2016): 6–10. http://dx.doi.org/10.5120/ijca2016908637.

Full text
APA, Harvard, Vancouver, ISO, and other styles