
Journal articles on the topic "Multimodal retrieval"

Familiarize yourself with the top 50 journal articles for research on the topic "Multimodal retrieval".

Next to each work in the bibliography there is an "Add to bibliography" option. Use it, and your bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication as a PDF and read an online abstract of the work, if the relevant parameters are available in the metadata.

Browse journal articles from a wide range of disciplines and compile your bibliography correctly.

1

Cui, Chenhao, and Zhoujun Li. "Prompt-Enhanced Generation for Multimodal Open Question Answering". Electronics 13, no. 8 (10.04.2024): 1434. http://dx.doi.org/10.3390/electronics13081434.

Full text of the source
Abstract:
Multimodal open question answering involves retrieving relevant information from both images and their corresponding texts given a question and then generating the answer. The quality of the generated answer heavily depends on the quality of the retrieved image–text pairs. Existing methods encode and retrieve images and texts, inputting the retrieved results into a language model to generate answers. These methods overlook the semantic alignment of image–text pairs within the information source, which affects the encoding and retrieval performance. Furthermore, these methods are highly dependent on retrieval performance, and poor retrieval quality can lead to poor generation performance. To address these issues, we propose a prompt-enhanced generation model, PEG, which includes generating supplementary descriptions for images to provide ample material for image–text alignment while also utilizing vision–language joint encoding to improve encoding effects and thereby enhance retrieval performance. Contrastive learning is used to enhance the model’s ability to discriminate between relevant and irrelevant information sources. Moreover, we further explore the knowledge within pre-trained model parameters through prefix-tuning to generate background knowledge relevant to the questions, offering additional input for answer generation and reducing the model’s dependency on retrieval performance. Experiments conducted on the WebQA and MultimodalQA datasets demonstrate that our model outperforms other baseline models in retrieval and generation performance.
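The contrastive objective mentioned in this abstract can be illustrated with a standard InfoNCE-style loss over a query embedding and candidate source embeddings. This is a generic NumPy sketch of the technique, not the authors' PEG implementation; the array names and dimensions are hypothetical:

```python
import numpy as np

def info_nce_loss(query_emb, source_embs, positive_idx, temperature=0.07):
    """Generic InfoNCE-style contrastive loss: pull the relevant source
    toward the query, push irrelevant sources away."""
    # Cosine similarity between the query and every candidate source.
    q = query_emb / np.linalg.norm(query_emb)
    s = source_embs / np.linalg.norm(source_embs, axis=1, keepdims=True)
    logits = s @ q / temperature
    # Softmax cross-entropy with the relevant source as the target class.
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[positive_idx])

# Toy data: source 3 is made nearly identical to the query, i.e. relevant.
rng = np.random.default_rng(0)
query = rng.normal(size=64)
sources = rng.normal(size=(8, 64))
sources[3] = query + 0.05 * rng.normal(size=64)
loss = info_nce_loss(query, sources, positive_idx=3)
```

A model trained to minimize this loss learns exactly the discrimination between relevant and irrelevant information sources that the abstract describes.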
APA, Harvard, Vancouver, ISO and other citation styles
2

Xu, Hong. "Multimodal bird information retrieval system". Applied and Computational Engineering 53, no. 1 (28.03.2024): 96–102. http://dx.doi.org/10.54254/2755-2721/53/20241282.

Abstract:
A multimodal bird information retrieval system can help popularize bird knowledge and support bird conservation. In this paper, we use a self-built bird dataset, the ViT-B/32 model from CLIP as the training model, Python as the development language, and PyQt5 for the interface development. The system mainly realizes the uploading and displaying of bird pictures, the multimodal retrieval of bird information, and the introduction of related bird information. The results of a trial run show that the system can accomplish multimodal retrieval of bird information: it can retrieve the species of birds and other related information from pictures uploaded by the user, or retrieve the most similar bird information from the text description provided by the user.
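At query time, a CLIP-based system like the one described reduces text-to-image retrieval to ranking image embeddings by cosine similarity against a text embedding. A minimal sketch with placeholder vectors (a real system would obtain them from the ViT-B/32 image and text encoders):

```python
import numpy as np

def retrieve(query_emb, gallery_embs, top_k=3):
    """Rank gallery items (e.g., bird images) by cosine similarity
    to a query embedding (e.g., an encoded text description)."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q
    order = np.argsort(-sims)[:top_k]  # indices of the best matches, best first
    return order, sims[order]

# Placeholder 512-d embeddings standing in for CLIP encoder outputs.
rng = np.random.default_rng(1)
gallery = rng.normal(size=(100, 512))
query = gallery[42] + 0.1 * rng.normal(size=512)  # query close to item 42
top_idx, top_sims = retrieve(query, gallery)
```

The same ranking function works unchanged for image-to-image queries, which is what makes the retrieval "multimodal": both modalities live in one embedding space.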
3

Romberg, Stefan, Rainer Lienhart, and Eva Hörster. "Multimodal Image Retrieval". International Journal of Multimedia Information Retrieval 1, no. 1 (07.03.2012): 31–44. http://dx.doi.org/10.1007/s13735-012-0006-4.

4

Kitanovski, Ivan, Gjorgji Strezoski, Ivica Dimitrovski, Gjorgji Madjarov, and Suzana Loskovska. "Multimodal medical image retrieval system". Multimedia Tools and Applications 76, no. 2 (25.01.2016): 2955–78. http://dx.doi.org/10.1007/s11042-016-3261-1.

5

Kulvinder Singh, et al. "Enhancing Multimodal Information Retrieval Through Integrating Data Mining and Deep Learning Techniques". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 9 (30.10.2023): 560–69. http://dx.doi.org/10.17762/ijritcc.v11i9.8844.

Abstract:
Multimodal information retrieval, the task of retrieving relevant information from heterogeneous data sources such as text, images, and videos, has gained significant attention in recent years due to the proliferation of multimedia content on the internet. This paper proposes an approach to enhance multimodal information retrieval by integrating data mining and deep learning techniques. Traditional information retrieval systems often struggle to effectively handle multimodal data due to the inherent complexity and diversity of such data sources. In this study, we leverage data mining techniques to preprocess and structure multimodal data efficiently. Data mining methods enable us to extract valuable patterns, relationships, and features from different modalities, providing a solid foundation for subsequent retrieval tasks. To further enhance the performance of multimodal information retrieval, deep learning techniques are employed. Deep neural networks have demonstrated their effectiveness in various multimedia tasks, including image recognition, natural language processing, and video analysis. By integrating deep learning models into our retrieval framework, we aim to capture complex intermodal dependencies and semantically rich representations, enabling more accurate and context-aware retrieval.
6

Cao, Yu, Shawn Steffey, Jianbiao He, Degui Xiao, Cui Tao, Ping Chen, and Henning Müller. "Medical Image Retrieval: A Multimodal Approach". Cancer Informatics 13s3 (January 2014): CIN.S14053. http://dx.doi.org/10.4137/cin.s14053.

Abstract:
Medical imaging is becoming a vital component of the war on cancer. Tremendous amounts of medical image data are captured and recorded in a digital format during cancer care and cancer research. Facing such an unprecedented volume of image data with heterogeneous image modalities, it is necessary to develop effective and efficient content-based medical image retrieval systems for cancer clinical practice and research. While substantial progress has been made in different areas of content-based image retrieval (CBIR) research, direct applications of existing CBIR techniques to medical images have produced unsatisfactory results because of the unique characteristics of medical images. In this paper, we develop a new multimodal medical image retrieval approach based on recent advances in statistical graphical models and deep learning. Specifically, we first investigate a new extended probabilistic Latent Semantic Analysis model to integrate the visual and textual information from medical images to bridge the semantic gap. We then develop a new deep Boltzmann machine-based multimodal learning model to learn the joint density model from multimodal information in order to derive the missing modality. Experimental results with a large volume of real-world medical images have shown that our new approach is a promising solution for the next-generation medical imaging indexing and retrieval system.
7

Rafailidis, D., S. Manolopoulou, and P. Daras. "A unified framework for multimodal retrieval". Pattern Recognition 46, no. 12 (December 2013): 3358–70. http://dx.doi.org/10.1016/j.patcog.2013.05.023.

8

Dong, Bin, Songlei Jian, and Kai Lu. "Learning Multimodal Representations by Symmetrically Transferring Local Structures". Symmetry 12, no. 9 (13.09.2020): 1504. http://dx.doi.org/10.3390/sym12091504.

Abstract:
Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they only focus on the object-level alignment and ignore structure-level alignment. To tackle the problem, we propose a novel symmetric multimodal representation learning framework by transferring local structures across different modalities, namely MTLS. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. The bidirectional retrieval loss based on multi-layer neural networks is utilized to align two modalities. MTLS is instantiated with image and text data and shows its superior performance on image-text retrieval and image clustering. MTLS outperforms the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering.
9

Zhang, Guihao, and Jiangzhong Cao. "Feature Fusion Based on Transformer for Cross-modal Retrieval". Journal of Physics: Conference Series 2558, no. 1 (01.08.2023): 012012. http://dx.doi.org/10.1088/1742-6596/2558/1/012012.

Abstract:
With the popularity of the Internet and the rapid growth of multimodal data, multimodal retrieval has gradually become a hot area of research. As one of the important branches of multimodal retrieval, image-text retrieval aims to design a model to learn and align two modal data, image and text, in order to build a bridge of semantic association between the two heterogeneous data, so as to achieve unified alignment and retrieval. The current mainstream image-text cross-modal retrieval approaches have made good progress by designing a deep learning-based model to find potential associations between different modal data. In this paper, we design a transformer-based feature fusion network to fuse the information of two modalities in the feature extraction process, which can enrich the semantic connection between the modalities. Meanwhile, we conduct experiments on the benchmark dataset Flickr30k and obtain competitive results, with recall at 10 reaching 96.2% in image-to-text retrieval.
10

Kompus, Kristiina, Tom Eichele, Kenneth Hugdahl, and Lars Nyberg. "Multimodal Imaging of Incidental Retrieval: The Low Route to Memory". Journal of Cognitive Neuroscience 23, no. 4 (April 2011): 947–60. http://dx.doi.org/10.1162/jocn.2010.21494.

Abstract:
Memories of past episodes frequently come to mind incidentally, without directed search. It has remained unclear how incidental retrieval processes are initiated in the brain. Here we used fMRI and ERP recordings to find brain activity that specifically correlates with incidental retrieval, as compared to intentional retrieval. Intentional retrieval was associated with increased activation in dorsolateral prefrontal cortex. By contrast, incidental retrieval was associated with a reduced fMRI signal in posterior brain regions, including extrastriate and parahippocampal cortex, and a modulation of a posterior ERP component 170 msec after the onset of visual retrieval cues. Successful retrieval under both intentional and incidental conditions was associated with increased activation in the hippocampus, precuneus, and ventrolateral prefrontal cortex, as well as increased amplitude of the P600 ERP component. These results demonstrate how early bottom–up signals from posterior cortex can lead to reactivation of episodic memories in the absence of strategic retrieval attempts.
11

UbaidullahBokhari, Mohammad, and Faraz Hasan. "Multimodal Information Retrieval: Challenges and Future Trends". International Journal of Computer Applications 74, no. 14 (26.07.2013): 9–12. http://dx.doi.org/10.5120/12951-9967.

12

Yamaguchi, Masataka. "2. Multimodal Retrieval between Vision and Language". Journal of The Institute of Image Information and Television Engineers 72, no. 9 (2018): 655–58. http://dx.doi.org/10.3169/itej.72.655.

13

Calumby, Rodrigo Tripodi. "Diversity-oriented Multimodal and Interactive Information Retrieval". ACM SIGIR Forum 50, no. 1 (27.06.2016): 86. http://dx.doi.org/10.1145/2964797.2964811.

14

Jin, Lu, Kai Li, Hao Hu, Guo-Jun Qi, and Jinhui Tang. "Semantic Neighbor Graph Hashing for Multimodal Retrieval". IEEE Transactions on Image Processing 27, no. 3 (March 2018): 1405–17. http://dx.doi.org/10.1109/tip.2017.2776745.

15

Peng, Yang, Xiaofeng Zhou, Daisy Zhe Wang, Ishan Patwa, Dihong Gong, and Chunsheng Victor Fang. "Multimodal Ensemble Fusion for Disambiguation and Retrieval". IEEE MultiMedia 23, no. 2 (April 2016): 42–52. http://dx.doi.org/10.1109/mmul.2016.26.

16

Hu, Peng, Dezhong Peng, Xu Wang, and Yong Xiang. "Multimodal adversarial network for cross-modal retrieval". Knowledge-Based Systems 180 (September 2019): 38–50. http://dx.doi.org/10.1016/j.knosys.2019.05.017.

17

Waykar, Sanjay B., and C. R. Bharathi. "Multimodal Features and Probability Extended Nearest Neighbor Classification for Content-Based Lecture Video Retrieval". Journal of Intelligent Systems 26, no. 3 (26.07.2017): 585–99. http://dx.doi.org/10.1515/jisys-2016-0041.

Abstract:
Due to the ever-increasing number of digital lecture libraries and lecture video portals, the challenge of retrieving lecture videos has become a very significant and demanding task in recent years. Accordingly, the literature presents different techniques for video retrieval by considering video contents as well as signal data. Here, we propose a lecture video retrieval system using multimodal features and probability extended nearest neighbor (PENN) classification. There are two modalities utilized for feature extraction. One is textual information, which is determined from the lecture video using optical character recognition. The second modality utilized to preserve video content is local vector pattern. These two modal features are extracted, and the retrieval of videos is performed using the proposed PENN classifier, which is the extension of the extended nearest neighbor classifier, by considering the different weightages for the first-level and second-level neighbors. The performance of the proposed video retrieval is evaluated using precision, recall, and F-measure, which are computed by matching the retrieved videos and the manually classified videos. From the experimentation, we show that the average precision of the proposed PENN+VQ is 78.3%, which is higher than that of the existing methods.
18

Rebstock, Alicia M., and Sarah E. Wallace. "Effects of a Combined Semantic Feature Analysis and Multimodal Treatment for Primary Progressive Aphasia: Pilot Study". Communication Disorders Quarterly 41, no. 2 (10.09.2018): 71–85. http://dx.doi.org/10.1177/1525740118794399.

Abstract:
Primary progressive aphasia (PPA) is a neurodegenerative condition characterized by language and cognitive decline. Word-retrieval deficits are the most common PPA symptom and contribute to impaired spoken expression. Intense semantic interventions show promise for improving word retrieval in people with PPA. In addition, people with PPA may learn to use alternative communication modalities when they are unable to retrieve a word. However, executive function impairments can cause people to struggle to switch among modalities to repair communication breakdowns. This study examined the effects of a combined semantic feature analysis and multimodal communication program (SFA+MCP) on word-retrieval accuracy, switching among modalities, and overall communicative effectiveness in a person with PPA. An adult female with PPA completed SFA+MCP. Baseline, probe, intervention, and postintervention sessions were completed to measure word-retrieval accuracy and switching between communication modalities. A postintervention listener task was completed to measure communicative effectiveness. Changes in word-retrieval accuracy and switching were minimal. However, the listeners' identification of the participant's communication attempts was more accurate following treatment, suggesting increased overall communicative effectiveness. Further investigations of SFA+MCP, specifically relative to timing, intensity, and appropriate modifications for people with cognitive impairments associated with PPA, are warranted.
19

He, Chao, Dalin Wang, Zefu Tan, Liming Xu, and Nina Dai. "Cross-Modal Discrimination Hashing Retrieval Using Variable Length". Security and Communication Networks 2022 (09.09.2022): 1–12. http://dx.doi.org/10.1155/2022/9638683.

Abstract:
Fast cross-modal retrieval technology based on hash coding has become a hot topic for the rich multimodal data (text, image, audio, etc.), especially given security and privacy challenges in the Internet of Things and mobile edge computing. However, most methods based on hash coding only map to a common hash coding space and relax the binary constraints of hash coding. Therefore, the learned multimodal hash coding may not be sufficient and effective to express the original multimodal data, and the hash encoding categories become less discriminative. To solve these problems, this paper proposes mapping each modality's data to a hash coding space of optimal length; the hash encoding of each modality is then solved by a discrete cross-modal hash algorithm with binary constraints. Finally, the similarity of multimodal data is compared in the latent space. The experimental results of cross-modal retrieval based on variable-length hash coding are better than those of the comparison methods on the WIKI, NUS-WIDE, and MIRFlickr data sets, and the proposed method is shown to be feasible and effective.
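At retrieval time, hashing-based methods like the one in this abstract compare items by Hamming distance between binary codes, which is cheap bitwise arithmetic. A generic sketch of that ranking step (random placeholder codes, not the paper's variable-length algorithm, which would learn the codes first):

```python
import numpy as np

def hamming_rank(query_code, gallery_codes):
    """Rank gallery hash codes by Hamming distance to the query code.
    Codes are 0/1 arrays; XOR counts the differing bits."""
    dists = np.bitwise_xor(gallery_codes, query_code).sum(axis=1)
    order = np.argsort(dists, kind="stable")  # nearest codes first
    return order, dists

# Placeholder 64-bit codes; a trained hash function would produce these.
rng = np.random.default_rng(2)
gallery = rng.integers(0, 2, size=(1000, 64), dtype=np.uint8)
query = gallery[7].copy()
query[:3] ^= 1  # simulate a query whose code differs from item 7 by 3 bits

order, dists = hamming_rank(query, gallery)
```

The appeal of hashing for cross-modal retrieval is exactly this step: once both modalities are encoded into the same binary space, comparing a query against millions of items is a vectorized XOR-and-popcount.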
20

Chávez, Ricardo Omar, Hugo Jair Escalante, Manuel Montes-y-Gómez, and Luis Enrique Sucar. "Multimodal Markov Random Field for Image Reranking Based on Relevance Feedback". ISRN Machine Vision 2013 (11.02.2013): 1–16. http://dx.doi.org/10.1155/2013/428746.

Abstract:
This paper introduces a multimodal approach for reranking of image retrieval results based on relevance feedback. We consider the problem of reordering the ranked list of images returned by an image retrieval system, in such a way that relevant images to a query are moved to the first positions of the list. We propose a Markov random field (MRF) model that aims at classifying the images in the initial retrieval-result list as relevant or irrelevant; the output of the MRF is used to generate a new list of ranked images. The MRF takes into account (1) the rank information provided by the initial retrieval system, (2) similarities among images in the list, and (3) relevance feedback information. Hence, the problem of image reranking is reduced to that of minimizing an energy function that represents a trade-off between image relevance and interimage similarity. The proposed MRF is multimodal, as it can take advantage of both the visual and textual information by which images are described. We report experimental results on the IAPR TC12 collection using visual and textual features to represent images. Experimental results show that our method is able to improve the ranking provided by the base retrieval system. Also, the multimodal MRF outperforms unimodal (i.e., either text-based or image-based) MRFs that we have developed in previous work. Furthermore, the proposed MRF outperforms baseline multimodal methods that combine information from unimodal MRFs.
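The relevance/similarity trade-off described above can be made concrete with a toy energy function over binary relevance labels, minimized greedily. This is an ICM-style illustration under made-up scores and similarities, not the authors' actual MRF or inference procedure:

```python
import numpy as np

def energy(labels, rank_score, sim, lam=0.5):
    """Toy MRF energy: the unary term favours labelling high-scored images
    as relevant; the pairwise term penalises similar images that receive
    different labels."""
    unary = np.sum(np.where(labels == 1, 1.0 - rank_score, rank_score))
    diff = labels[:, None] != labels[None, :]
    pairwise = 0.5 * np.sum(sim * diff)  # each unordered pair counted once
    return unary + lam * pairwise

def icm(rank_score, sim, iters=10):
    """Iterated conditional modes: flip each label if that lowers the energy."""
    labels = (rank_score > 0.5).astype(int)
    for _ in range(iters):
        for i in range(len(labels)):
            for v in (0, 1):
                trial = labels.copy()
                trial[i] = v
                if energy(trial, rank_score, sim) < energy(labels, rank_score, sim):
                    labels = trial
    return labels

# Four images: 0 and 1 are high-scored and mutually similar; 2 and 3 are not.
scores = np.array([0.9, 0.8, 0.45, 0.2])
sim = np.array([[0.0, 0.9, 0.1, 0.1],
                [0.9, 0.0, 0.1, 0.1],
                [0.1, 0.1, 0.0, 0.8],
                [0.1, 0.1, 0.8, 0.0]])
labels = icm(scores, sim)
```

The multimodal version of such a model simply feeds both visual and textual similarities into the pairwise term, which is the core idea of the paper.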
21

Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (03.04.2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Abstract:
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modality data. It is challenging not only because of the heterogeneous distributions across different modalities, but also because of the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as a common semantic space, and using a generative adversarial network (GAN) to capture the underlying multimodal data structures, as well as strengthen relations between input data and the semantic space to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN-based methods for ZS-CMR. Unlike using the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space of input multimodal features and class-embeddings by modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlation associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in the latent space, which benefits retrieval efficiency and enables knowledge transfer to unseen classes. We evaluate our model using four benchmark datasets on image-text retrieval tasks and one large-scale dataset on image-sketch retrieval tasks. The experimental results show that our method establishes the new state-of-the-art performance for both tasks on all datasets.
22

Schöpper, Lars-Michael, Tarini Singh, and Christian Frings. "The official soundtrack to "Five shades of grey": Generalization in multimodal distractor-based retrieval". Attention, Perception, & Psychophysics 82, no. 7 (12.06.2020): 3479–89. http://dx.doi.org/10.3758/s13414-020-02057-4.

Abstract:
Abstract When responding to two events in a sequence, the repetition or change of stimuli and the accompanying response can benefit or interfere with response execution: Full repetition leads to benefits in performance while partial repetition leads to costs. Additionally, even distractor stimuli can be integrated with a response, and can, upon repetition, lead to benefits or interference. Recently it has been suggested that not only identical, but also perceptually similar distractors retrieve a previous response (Singh et al., Attention, Perception, & Psychophysics, 78(8), 2307-2312, 2016): Participants discriminated four visual shapes appearing in five different shades of grey, the latter being irrelevant for task execution. Exact distractor repetitions yielded the strongest distractor-based retrieval effect, which decreased with increasing dissimilarity between shades of grey. In the current study, we expand these findings by conceptually replicating Singh et al. (2016) using multimodal stimuli. In Experiment 1 (N=31), participants discriminated four visual targets accompanied by five auditory distractors. In Experiment 2 (N=32), participants discriminated four auditory targets accompanied by five visual distractors. We replicated the generalization of distractor-based retrieval – that is, the distractor-based retrieval effect decreased with increasing distractor-dissimilarity. These results not only show that generalization in distractor-based retrieval occurs in multimodal feature processing, but also that these processes can occur for distractors perceived in a different modality to that of the target.
23

Murrugarra-Llerena, Nils, and Adriana Kovashka. "Image retrieval with mixed initiative and multimodal feedback". Computer Vision and Image Understanding 207 (June 2021): 103204. http://dx.doi.org/10.1016/j.cviu.2021.103204.

24

Ismail, Nor Azman, and Ann O'Brien. "WEB-BASED PERSONAL DIGITAL PHOTO COLLECTIONS: MULTIMODAL RETRIEVAL". IIUM Engineering Journal 10, no. 1 (29.09.2010): 49–57. http://dx.doi.org/10.31436/iiumej.v10i1.104.

Abstract:
When personal photo collections get large, retrieval of specific photos or sets of photos becomes difficult, mainly due to the fairly primitive means by which they are organised. Commercial photo handling systems help but often have only elementary searching features. In this paper, we describe an interactive web-based photo retrieval system that enables personal digital photo users to accomplish photo browsing by using multimodal interaction. This system enables users to browse their personal digital photos in the World Wide Web (WWW) environment not only through mouse-click input but also through speech input. The prototype system and its architecture utilise web technology, built using web scripting (JavaScript, XHTML, ASP, and an XML-based mark-up language) and an image database. All prototype programs and data files, including the user's photo repository, profiles, dialogues, grammars, prompts, and retrieval engine, are stored and located on the web server. Our approach also consists of a human-computer speech dialogue based on browsing photo content by four main categories (Who? What? When? and Where?). Our user study with 20 digital photo users showed that the participants reacted positively to their experience with the system interactions.
25

ZHANG, Jing. "Video retrieval model based on multimodal information fusion". Journal of Computer Applications 28, no. 1 (10.07.2008): 199–201. http://dx.doi.org/10.3724/sp.j.1087.2008.00199.

26

Cao, Wenming, Wenshuo Feng, Qiubin Lin, Guitao Cao, and Zhihai He. "A Review of Hashing Methods for Multimodal Retrieval". IEEE Access 8 (2020): 15377–91. http://dx.doi.org/10.1109/access.2020.2968154.

27

Zhang, Yu, Ye Yuan, Yishu Wang, and Guoren Wang. "A novel multimodal retrieval model based on ELM". Neurocomputing 277 (February 2018): 65–77. http://dx.doi.org/10.1016/j.neucom.2017.03.095.

28

Mourão, André, Flávio Martins, and João Magalhães. "Multimodal medical information retrieval with unsupervised rank fusion". Computerized Medical Imaging and Graphics 39 (January 2015): 35–45. http://dx.doi.org/10.1016/j.compmedimag.2014.05.006.

29

Revuelta-Martínez, Alejandro, Luis Rodríguez, Ismael García-Varea, and Francisco Montero. "Multimodal interaction for information retrieval using natural language". Computer Standards & Interfaces 35, no. 5 (September 2013): 428–41. http://dx.doi.org/10.1016/j.csi.2012.11.002.

30

Liu, Anan, Wenhui Li, Weizhi Nie, and Yuting Su. "3D models retrieval algorithm based on multimodal data". Neurocomputing 259 (October 2017): 176–82. http://dx.doi.org/10.1016/j.neucom.2016.06.087.

31

Daras, Petros, and Apostolos Axenopoulos. "A 3D Shape Retrieval Framework Supporting Multimodal Queries". International Journal of Computer Vision 89, no. 2-3 (30.07.2009): 229–47. http://dx.doi.org/10.1007/s11263-009-0277-2.

32

Chen, Xu, Alfred O. Hero, III, and Silvio Savarese. "Multimodal Video Indexing and Retrieval Using Directed Information". IEEE Transactions on Multimedia 14, no. 1 (February 2012): 3–16. http://dx.doi.org/10.1109/tmm.2011.2167223.

33

Pang, Lei, Shiai Zhu, and Chong-Wah Ngo. "Deep Multimodal Learning for Affective Analysis and Retrieval". IEEE Transactions on Multimedia 17, no. 11 (November 2015): 2008–20. http://dx.doi.org/10.1109/tmm.2015.2482228.

34

Sperandio, Ricardo C., Zenilton K. G. Patrocínio, Hugo B. de Paula, and Silvio J. F. Guimarães. "An efficient access method for multimodal video retrieval". Multimedia Tools and Applications 74, no. 4 (11.04.2014): 1357–75. http://dx.doi.org/10.1007/s11042-014-1917-2.

35

Hubert, Gilles, and Josiane Mothe. "An adaptable search engine for multimodal information retrieval". Journal of the American Society for Information Science and Technology 60, no. 8 (August 2009): 1625–34. http://dx.doi.org/10.1002/asi.21091.

36

陈, 佳芸. "Multimodal Fashion Style Retrieval Based on Deep Learning". Computer Science and Application 13, no. 03 (2023): 492–501. http://dx.doi.org/10.12677/csa.2023.133048.

37

S. Gomathy, K. P. Deepa, T. Revathi, and L. Maria Michael Visuwasam. "Genre Specific Classification for Information Search and Multimodal Semantic Indexing for Data Retrieval". SIJ Transactions on Computer Science Engineering & its Applications (CSEA) 01, no. 01 (05.04.2013): 10–15. http://dx.doi.org/10.9756/sijcsea/v1i1/01010159.

38

Qian, Shengsheng, Dizhan Xue, Huaiwen Zhang, Quan Fang, and Changsheng Xu. "Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (18.05.2021): 2440–48. http://dx.doi.org/10.1609/aaai.v35i3.16345.

Abstract:
Cross-modal retrieval has become an active study field with the expanding scale of multimodal data. To date, most existing methods transform multimodal data into a common representation space where semantic similarities between items can be directly measured across different modalities. However, these methods typically suffer from the following limitations: 1) They usually attempt to bridge the modality gap by designing losses in the common representation space, which may not be sufficient to eliminate potential heterogeneity of different modalities in the common space. 2) They typically treat labels as independent individuals and ignore label relationships, which are important for constructing semantic links between multimodal data. In this work, we propose a novel Dual Adversarial Graph Neural Network (DAGNN), composed of dual generative adversarial networks and multi-hop graph neural networks, which learns modality-invariant and discriminative common representations for cross-modal retrieval. Firstly, we construct the dual generative adversarial networks to project multimodal data into a common representation space. Secondly, we leverage the multi-hop graph neural networks, in which a layer aggregation mechanism is proposed to exploit multi-hop propagation information, to capture the label correlation dependency and learn inter-dependent classifiers. Comprehensive experiments conducted on two cross-modal retrieval benchmark datasets, NUS-WIDE and MIRFlickr, indicate the superiority of DAGNN.
39

Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings". Journal of Advanced Computational Intelligence and Intelligent Informatics 26, No. 6 (20.11.2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.

Abstract:
In this work, we study the application of multimodal analogical reasoning to image retrieval. Multimodal analogy questions are given in the form of tuples of words and images, e.g., "cat":"dog"::[an image of a cat sitting on a bench]:?, to search for an image of a dog sitting on a bench. Retrieving the desired images given these tuples can be seen as the task of finding images whose relation to the query image is close to the relation between the query words. One way to achieve this is to build a common vector space that exhibits analogical regularities. To learn such an embedding, we propose a quadruple neural network called the multimodal siamese network. The network consists of recurrent neural networks and convolutional neural networks based on the siamese architecture. We also introduce an effective procedure to generate analogy examples from an image-caption dataset for training our network. In our experiments, we test our model on analogy-based image retrieval tasks. The results show that our method outperforms previous work in qualitative evaluation.
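The analogy-retrieval idea (query image embedding shifted by the word offset, then nearest-neighbour search) can be illustrated with toy vectors. Every embedding below is made up for the sketch; the paper instead learns the common space with its multimodal siamese network:

```python
import numpy as np

def retrieve_by_analogy(word_a, word_b, image_a, candidates):
    """Index of the candidate closest (cosine) to image_a + (word_b - word_a)."""
    query = image_a + (word_b - word_a)
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return int(np.argmax([cos(query, c) for c in candidates]))

# Toy embeddings: dims loosely read as (cat-ness, dog-ness, bench-ness).
cat = np.array([1.0, 0.0, 0.2])
dog = np.array([0.0, 1.0, 0.2])
cat_on_bench = np.array([1.0, 0.0, 1.0])
candidates = [
    np.array([1.0, 0.0, 1.0]),  # another cat on a bench
    np.array([0.0, 1.0, 1.0]),  # a dog on a bench  <- the analogy's answer
    np.array([0.0, 1.0, 0.0]),  # a dog, no bench
]
best = retrieve_by_analogy(cat, dog, cat_on_bench, candidates)
```

In a space with analogical regularities, the offset "dog" − "cat" moves the query image embedding onto the dog-on-a-bench candidate, which is exactly the regularity the siamese training is meant to induce.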
40

Li, Ruxuan, Jingyi Wang and Xuedong Tian. "A Multi-Modal Retrieval Model for Mathematical Expressions Based on ConvNeXt and Hesitant Fuzzy Set". Electronics 12, No. 20 (20.10.2023): 4363. http://dx.doi.org/10.3390/electronics12204363.

Abstract:
Mathematical expression retrieval is an essential component of mathematical information retrieval. Current research on mathematical expression retrieval primarily targets single modalities, particularly text, which can lead to the loss of structural information. Multimodal research, on the other hand, has demonstrated promising outcomes across different domains, and mathematical expressions in image format are adept at preserving their structural characteristics. We therefore propose a multi-modal retrieval model for mathematical expressions based on ConvNeXt and hesitant fuzzy sets to address the limitations of single-modal retrieval. For the image modality, retrieval is based on the similarity of image features and symbol-level features of the expression: image features are extracted by ConvNeXt, while symbol-level features are obtained by the Symbol Level Features Extraction (SLFE) module. For the text modality, the Formula Description Structure (FDS) is employed to analyze expressions and extract their attributes, and Hesitant Fuzzy Set (HFS) theory is applied to compute the hesitant fuzzy similarity between mathematical queries and candidate expressions. Finally, Reciprocal Rank Fusion (RRF) integrates the rankings from image-modal and text-modal retrieval, yielding the final retrieval list. Experiments were conducted on the publicly accessible ArXiv dataset (containing 592,345 mathematical expressions) and the NTCIR-mair-wikipedia-corpus (NTCIR) dataset. The MAP@10 value for the multimodal RRF fusion approach is 0.774, which substantiates the efficacy of the multi-modal mathematical expression retrieval approach based on ConvNeXt and HFS.
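Reciprocal Rank Fusion, the final step above, is a standard rank-merging formula: score(d) = Σ_i 1/(k + rank_i(d)) over the input rankings. A minimal sketch, with illustrative expression IDs and the commonly used smoothing constant k = 60 (the paper's own settings may differ):

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists with Reciprocal Rank Fusion: sum of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Higher fused score first; items in only one list still get credit.
    return sorted(scores, key=scores.get, reverse=True)

image_ranking = ["e3", "e1", "e2"]  # e.g. from ConvNeXt image similarity
text_ranking  = ["e3", "e4", "e1"]  # e.g. from hesitant fuzzy similarity
fused = rrf_fuse([image_ranking, text_ranking])  # ["e3", "e1", "e4", "e2"]
```

RRF needs only the ranks, not the raw scores, which is why it is a convenient way to combine two retrieval channels whose similarity scales (image features vs. fuzzy similarity) are not directly comparable.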
41

李, 劼博. "Video Speech Retrieval Model Based on Multimodal Feature Memory". Computer Science and Application 12, No. 07 (2022): 1747–55. http://dx.doi.org/10.12677/csa.2022.127176.

42

LIU, Zhi, Fangyuan ZHAO and Mengmeng ZHANG. "An Efficient Multimodal Aggregation Network for Video-Text Retrieval". IEICE Transactions on Information and Systems E105.D, No. 10 (01.10.2022): 1825–28. http://dx.doi.org/10.1587/transinf.2022edl8018.

43

Bu, Shuhui, Shaoguang Cheng, Zhenbao Liu and Junwei Han. "Multimodal Feature Fusion for 3D Shape Recognition and Retrieval". IEEE MultiMedia 21, No. 4 (October 2014): 38–46. http://dx.doi.org/10.1109/mmul.2014.52.

44

Tang, Jinhui, and Zechao Li. "Weakly Supervised Multimodal Hashing for Scalable Social Image Retrieval". IEEE Transactions on Circuits and Systems for Video Technology 28, No. 10 (October 2018): 2730–41. http://dx.doi.org/10.1109/tcsvt.2017.2715227.

45

Figueroa, Cristhian, Hugo Ordoñez, Juan-Carlos Corrales, Carlos Cobos, Leandro Krug Wives and Enrique Herrera-Viedma. "Improving business process retrieval using categorization and multimodal search". Knowledge-Based Systems 110 (October 2016): 49–59. http://dx.doi.org/10.1016/j.knosys.2016.07.014.

46

Datta, Deepanwita, Shubham Varma, Ravindranath Chowdary C. and Sanjay K. Singh. "Multimodal Retrieval using Mutual Information based Textual Query Reformulation". Expert Systems with Applications 68 (February 2017): 81–92. http://dx.doi.org/10.1016/j.eswa.2016.09.039.

47

Escalante, Hugo Jair, Manuel Montes and Enrique Sucar. "Multimodal indexing based on semantic cohesion for image retrieval". Information Retrieval 15, No. 1 (05.06.2011): 1–32. http://dx.doi.org/10.1007/s10791-011-9170-z.

48

Markonis, Dimitrios, Roger Schaer and Henning Müller. "Evaluating multimodal relevance feedback techniques for medical image retrieval". Information Retrieval Journal 19, No. 1-2 (01.08.2015): 100–112. http://dx.doi.org/10.1007/s10791-015-9260-4.

49

Imhof, Melanie, and Martin Braschler. "A study of untrained models for multimodal information retrieval". Information Retrieval Journal 21, No. 1 (03.11.2017): 81–106. http://dx.doi.org/10.1007/s10791-017-9322-x.

50

Soni, Ankita, and Richa Chouhan. "Multimodal Information Retrieval by using Visual and Textual Query". International Journal of Computer Applications 137, No. 1 (17.03.2016): 6–10. http://dx.doi.org/10.5120/ijca2016908637.
