To view other types of publications on this topic, follow this link: Multimodal Information Retrieval.

Journal articles on the topic "Multimodal Information Retrieval"

Browse the top 50 journal articles for your research on the topic "Multimodal Information Retrieval".

Next to every entry in the bibliography you will find the option "Add to bibliography". Use it, and the bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication as a PDF and read an online abstract of the work, provided the relevant parameters are available in its metadata.

Browse journal articles from a wide range of disciplines and organize your bibliography correctly.

1

Xu, Hong. "Multimodal bird information retrieval system". Applied and Computational Engineering 53, no. 1 (28.03.2024): 96–102. http://dx.doi.org/10.54254/2755-2721/53/20241282.

Abstract:
A multimodal bird information retrieval system can help popularize bird knowledge and support bird conservation. In this paper, we use a self-built bird dataset, the ViT-B/32 model from CLIP as the training model, Python as the development language, and PyQt5 for the interface development. The system mainly realizes the uploading and display of bird pictures, the multimodal retrieval of bird information, and the introduction of related bird information. Trial runs show that the system can accomplish multimodal retrieval of bird information: it retrieves the species of a bird and other related information from pictures uploaded by the user, or retrieves the most similar bird information from a text description provided by the user.
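The CLIP-style retrieval loop the abstract describes, embedding both images and text into a shared space and ranking by similarity, can be sketched in a few lines. This is a minimal illustration with made-up three-dimensional vectors standing in for ViT-B/32 embeddings; it is not the authors' system.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, gallery):
    # Rank gallery items (name, embedding) by similarity to the query.
    return sorted(gallery, key=lambda item: cosine(query_vec, item[1]), reverse=True)

# Toy embeddings standing in for CLIP features of bird images.
gallery = [
    ("sparrow", [0.9, 0.1, 0.0]),
    ("eagle",   [0.1, 0.9, 0.2]),
    ("penguin", [0.0, 0.2, 0.9]),
]
# Hypothetical text embedding for a query like "small brown bird".
text_query = [0.85, 0.15, 0.05]
ranked = retrieve(text_query, gallery)
print([name for name, _ in ranked])  # most similar species first
```

Because both modalities live in the same space, the same `retrieve` call serves image-to-text and text-to-image queries.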
2

Cui, Chenhao, and Zhoujun Li. "Prompt-Enhanced Generation for Multimodal Open Question Answering". Electronics 13, no. 8 (10.04.2024): 1434. http://dx.doi.org/10.3390/electronics13081434.

Abstract:
Multimodal open question answering involves retrieving relevant information from both images and their corresponding texts given a question and then generating the answer. The quality of the generated answer heavily depends on the quality of the retrieved image–text pairs. Existing methods encode and retrieve images and texts, inputting the retrieved results into a language model to generate answers. These methods overlook the semantic alignment of image–text pairs within the information source, which affects the encoding and retrieval performance. Furthermore, these methods are highly dependent on retrieval performance, and poor retrieval quality can lead to poor generation performance. To address these issues, we propose a prompt-enhanced generation model, PEG, which includes generating supplementary descriptions for images to provide ample material for image–text alignment while also utilizing vision–language joint encoding to improve encoding effects and thereby enhance retrieval performance. Contrastive learning is used to enhance the model’s ability to discriminate between relevant and irrelevant information sources. Moreover, we further explore the knowledge within pre-trained model parameters through prefix-tuning to generate background knowledge relevant to the questions, offering additional input for answer generation and reducing the model’s dependency on retrieval performance. Experiments conducted on the WebQA and MultimodalQA datasets demonstrate that our model outperforms other baseline models in retrieval and generation performance.
3

Singh, Kulvinder, et al. "Enhancing Multimodal Information Retrieval Through Integrating Data Mining and Deep Learning Techniques". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 9 (30.10.2023): 560–69. http://dx.doi.org/10.17762/ijritcc.v11i9.8844.

Abstract:
Multimodal information retrieval, the task of retrieving relevant information from heterogeneous data sources such as text, images, and videos, has gained significant attention in recent years due to the proliferation of multimedia content on the internet. This paper proposes an approach to enhance multimodal information retrieval by integrating data mining and deep learning techniques. Traditional information retrieval systems often struggle to handle multimodal data effectively due to the inherent complexity and diversity of such data sources. In this study, we leverage data mining techniques to preprocess and structure multimodal data efficiently. Data mining methods enable us to extract valuable patterns, relationships, and features from different modalities, providing a solid foundation for subsequent retrieval tasks. To further enhance the performance of multimodal information retrieval, deep learning techniques are employed. Deep neural networks have demonstrated their effectiveness in various multimedia tasks, including image recognition, natural language processing, and video analysis. By integrating deep learning models into our retrieval framework, we aim to capture complex intermodal dependencies and semantically rich representations, enabling more accurate and context-aware retrieval.
4

Ubaidullah Bokhari, Mohammad, and Faraz Hasan. "Multimodal Information Retrieval: Challenges and Future Trends". International Journal of Computer Applications 74, no. 14 (26.07.2013): 9–12. http://dx.doi.org/10.5120/12951-9967.

5

Calumby, Rodrigo Tripodi. "Diversity-oriented Multimodal and Interactive Information Retrieval". ACM SIGIR Forum 50, no. 1 (27.06.2016): 86. http://dx.doi.org/10.1145/2964797.2964811.

6

S. Gomathy, K. P. Deepa, T. Revathi, and L. Maria Michael Visuwasam. "Genre Specific Classification for Information Search and Multimodal Semantic Indexing for Data Retrieval". SIJ Transactions on Computer Science Engineering & its Applications (CSEA) 01, no. 01 (05.04.2013): 10–15. http://dx.doi.org/10.9756/sijcsea/v1i1/01010159.

7

Zhang, Jing. "Video retrieval model based on multimodal information fusion". Journal of Computer Applications 28, no. 1 (10.07.2008): 199–201. http://dx.doi.org/10.3724/sp.j.1087.2008.00199.

8

Mourão, André, Flávio Martins, and João Magalhães. "Multimodal medical information retrieval with unsupervised rank fusion". Computerized Medical Imaging and Graphics 39 (January 2015): 35–45. http://dx.doi.org/10.1016/j.compmedimag.2014.05.006.

9

Revuelta-Martínez, Alejandro, Luis Rodríguez, Ismael García-Varea, and Francisco Montero. "Multimodal interaction for information retrieval using natural language". Computer Standards & Interfaces 35, no. 5 (September 2013): 428–41. http://dx.doi.org/10.1016/j.csi.2012.11.002.

10

Chen, Xu, Alfred O. Hero, III, and Silvio Savarese. "Multimodal Video Indexing and Retrieval Using Directed Information". IEEE Transactions on Multimedia 14, no. 1 (February 2012): 3–16. http://dx.doi.org/10.1109/tmm.2011.2167223.

11

Hubert, Gilles, and Josiane Mothe. "An adaptable search engine for multimodal information retrieval". Journal of the American Society for Information Science and Technology 60, no. 8 (August 2009): 1625–34. http://dx.doi.org/10.1002/asi.21091.

12

Cao, Yu, Shawn Steffey, Jianbiao He, Degui Xiao, Cui Tao, Ping Chen, and Henning Müller. "Medical Image Retrieval: A Multimodal Approach". Cancer Informatics 13s3 (January 2014): CIN.S14053. http://dx.doi.org/10.4137/cin.s14053.

Abstract:
Medical imaging is becoming a vital component of the war on cancer. Tremendous amounts of medical image data are captured and recorded in digital format during cancer care and cancer research. Facing such an unprecedented volume of image data with heterogeneous image modalities, it is necessary to develop effective and efficient content-based medical image retrieval systems for cancer clinical practice and research. While substantial progress has been made in different areas of content-based image retrieval (CBIR) research, direct application of existing CBIR techniques to medical images has produced unsatisfactory results, because of the unique characteristics of medical images. In this paper, we develop a new multimodal medical image retrieval approach based on recent advances in statistical graphical models and deep learning. Specifically, we first investigate a new extended probabilistic Latent Semantic Analysis model to integrate the visual and textual information from medical images to bridge the semantic gap. We then develop a new deep Boltzmann machine-based multimodal learning model to learn the joint density model from multimodal information in order to derive the missing modality. Experimental results with a large volume of real-world medical images have shown that our new approach is a promising solution for the next generation of medical imaging indexing and retrieval systems.
13

Hu, Peng, Dezhong Peng, Xu Wang, and Yong Xiang. "Multimodal adversarial network for cross-modal retrieval". Knowledge-Based Systems 180 (September 2019): 38–50. http://dx.doi.org/10.1016/j.knosys.2019.05.017.

14

Datta, Deepanwita, Shubham Varma, Ravindranath Chowdary C., and Sanjay K. Singh. "Multimodal Retrieval using Mutual Information based Textual Query Reformulation". Expert Systems with Applications 68 (February 2017): 81–92. http://dx.doi.org/10.1016/j.eswa.2016.09.039.

15

Imhof, Melanie, and Martin Braschler. "A study of untrained models for multimodal information retrieval". Information Retrieval Journal 21, no. 1 (03.11.2017): 81–106. http://dx.doi.org/10.1007/s10791-017-9322-x.

16

Soni, Ankita, and Richa Chouhan. "Multimodal Information Retrieval by using Visual and Textual Query". International Journal of Computer Applications 137, no. 1 (17.03.2016): 6–10. http://dx.doi.org/10.5120/ijca2016908637.

17

Sattari, Saeid, and Adnan Yazici. "Multimodal query-level fusion for efficient multimedia information retrieval". International Journal of Intelligent Systems 33, no. 10 (31.05.2018): 2019–37. http://dx.doi.org/10.1002/int.21920.

18

Vitay, Julien, and Fred H. Hamker. "Sustained Activities and Retrieval in a Computational Model of the Perirhinal Cortex". Journal of Cognitive Neuroscience 20, no. 11 (November 2008): 1993–2005. http://dx.doi.org/10.1162/jocn.2008.20147.

Abstract:
The perirhinal cortex is involved not only in object recognition and novelty detection but also in multimodal integration, reward association, and visual working memory. We propose a computational model that focuses on the role of the perirhinal cortex in working memory, particularly with respect to sustained activities and memory retrieval. This model describes how different pieces of partial information are integrated into assemblies of neurons that represent the identity of an object. Through dopaminergic modulation, the resulting clusters can retrieve the global information with recurrent interactions between neurons. Dopamine leads to sustained activities after stimulus disappearance that form the basis of the involvement of the perirhinal cortex in visual working memory processes. The information carried by a cluster can also be retrieved by a partial thalamic or prefrontal stimulation. Thus, we suggest that areas involved in planning and memory coordination encode a pointer to access the detailed information encoded in associative cortex such as the perirhinal cortex.
19

Chávez, Ricardo Omar, Hugo Jair Escalante, Manuel Montes-y-Gómez, and Luis Enrique Sucar. "Multimodal Markov Random Field for Image Reranking Based on Relevance Feedback". ISRN Machine Vision 2013 (11.02.2013): 1–16. http://dx.doi.org/10.1155/2013/428746.

Abstract:
This paper introduces a multimodal approach for reranking image retrieval results based on relevance feedback. We consider the problem of reordering the ranked list of images returned by an image retrieval system, in such a way that images relevant to a query are moved to the first positions of the list. We propose a Markov random field (MRF) model that aims at classifying the images in the initial retrieval-result list as relevant or irrelevant; the output of the MRF is used to generate a new list of ranked images. The MRF takes into account (1) the rank information provided by the initial retrieval system, (2) similarities among images in the list, and (3) relevance feedback information. Hence, the problem of image reranking is reduced to that of minimizing an energy function that represents a trade-off between image relevance and inter-image similarity. The proposed MRF is multimodal, as it can take advantage of both the visual and textual information by which images are described. We report experimental results in the IAPR TC12 collection using visual and textual features to represent images. Experimental results show that our method is able to improve the ranking provided by the base retrieval system. Also, the multimodal MRF outperforms unimodal (i.e., either text-based or image-based) MRFs that we have developed in previous work. Furthermore, the proposed MRF outperforms baseline multimodal methods that combine information from unimodal MRFs.
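The energy-minimization idea behind such reranking, unary terms from the base ranking plus relevance feedback and pairwise terms from inter-image similarity, can be sketched with a toy iterated-conditional-modes (ICM) optimizer. All scores, similarities, and weights below are invented for illustration; the paper's actual MRF formulation and inference differ in detail.

```python
def icm_rerank(scores, sim, feedback, lam=0.5, iters=10):
    # Toy MRF reranking: binary relevance labels, minimized with ICM.
    # scores[i]: relevance score from the base system (rank information).
    # sim[i][j]: similarity between images i and j.
    # feedback: {index: 0/1} relevance-feedback labels, held fixed.
    n = len(scores)
    labels = [1 if s >= 0.5 else 0 for s in scores]
    for i, v in feedback.items():
        labels[i] = v
    for _ in range(iters):
        changed = False
        for i in range(n):
            if i in feedback:
                continue
            best_label, best_energy = labels[i], float("inf")
            for cand in (0, 1):
                # Unary term: cost of disagreeing with the base score.
                unary = (1 - scores[i]) if cand == 1 else scores[i]
                # Pairwise term: penalty for differing from similar images.
                pairwise = lam * sum(sim[i][j] for j in range(n)
                                     if j != i and labels[j] != cand)
                if unary + pairwise < best_energy:
                    best_label, best_energy = cand, unary + pairwise
            if best_label != labels[i]:
                labels[i], changed = best_label, True
        if not changed:
            break
    # Relevant images first; ties broken by the original scores.
    order = sorted(range(n), key=lambda i: (-labels[i], -scores[i]))
    return labels, order

scores = [0.9, 0.4, 0.85, 0.2]          # invented base-system scores
sim = [[0.0, 0.9, 0.8, 0.1],            # invented similarity matrix
       [0.9, 0.0, 0.9, 0.1],
       [0.8, 0.9, 0.0, 0.1],
       [0.1, 0.1, 0.1, 0.0]]
labels, order = icm_rerank(scores, sim, feedback={3: 0})
```

Note how item 1, weakly scored by the base system, is pulled into the relevant class because it is highly similar to two confidently relevant items, which is exactly the reranking effect the model is after.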
20

Waykar, Sanjay B., and C. R. Bharathi. "Multimodal Features and Probability Extended Nearest Neighbor Classification for Content-Based Lecture Video Retrieval". Journal of Intelligent Systems 26, no. 3 (26.07.2017): 585–99. http://dx.doi.org/10.1515/jisys-2016-0041.

Abstract:
Due to the ever-increasing number of digital lecture libraries and lecture video portals, the challenge of retrieving lecture videos has become a very significant and demanding task in recent years. Accordingly, the literature presents different techniques for video retrieval that consider video contents as well as signal data. Here, we propose a lecture video retrieval system using multimodal features and probability extended nearest neighbor (PENN) classification. Two modalities are utilized for feature extraction. One is textual information, which is extracted from the lecture video using optical character recognition. The second modality, utilized to preserve video content, is the local vector pattern. These two modal features are extracted, and the retrieval of videos is performed using the proposed PENN classifier, an extension of the extended nearest neighbor classifier that assigns different weightings to first-level and second-level neighbors. The performance of the proposed video retrieval is evaluated using precision, recall, and F-measure, which are computed by matching the retrieved videos against manually classified videos. From the experimentation, we show that the average precision of the proposed PENN+VQ is 78.3%, which is higher than that of the existing methods.
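The core idea of weighting first-level and second-level neighbors differently can be sketched as a two-level weighted vote. The features, weights, and labels below are invented, and the real PENN classifier is more involved than this sketch.

```python
def dist(a, b):
    # Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn(point, data, k, exclude=()):
    # Indices of the k nearest items in data, skipping excluded indices.
    idx = [i for i in range(len(data)) if i not in exclude]
    idx.sort(key=lambda i: dist(point, data[i][0]))
    return idx[:k]

def two_level_vote(query, data, k=2, w1=1.0, w2=0.5):
    # data: list of (feature_vector, label).
    # First-level neighbors vote with weight w1; their own neighbors
    # (second level) vote with the smaller weight w2.
    votes = {}
    for i in knn(query, data, k):
        label = data[i][1]
        votes[label] = votes.get(label, 0.0) + w1
        for j in knn(data[i][0], data, k, exclude={i}):
            votes[data[j][1]] = votes.get(data[j][1], 0.0) + w2
    return max(votes, key=votes.get)

# Invented toy features standing in for OCR-text + local-vector-pattern features.
data = [([0, 0], "math"), ([0, 1], "math"), ([5, 5], "bio"), ([5, 6], "bio")]
predicted = two_level_vote([0, 0.5], data)
```

Lowering `w2` toward zero recovers plain k-nearest-neighbor voting, so the second-level weight controls how much neighborhood structure influences the decision.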
21

Wang, Xurui. "The application of NLP in information retrieval". Applied and Computational Engineering 42, no. 1 (23.02.2024): 290–97. http://dx.doi.org/10.54254/2755-2721/42/20230795.

Abstract:
The field of Natural Language Processing (NLP) has experienced impressive advancements and has found diverse applications. This paper presents a comprehensive review of the development of NLP in the field of information retrieval. It explores different stages of NLP techniques and methods, including keyword matching, rule-based approaches, statistical methods, and the utilization of machine learning and deep learning technologies. Furthermore, the paper provides detailed insights into the specific applications of NLP in domains such as academic information retrieval, medical information retrieval, travel information retrieval, and e-commerce information retrieval. It analyzes the current state of NLP applications in these domains, highlights their advantages, and discusses their associated limitations. Finally, the paper emphasizes the continuous advancement of the NLP field, with a particular focus on semantic understanding, personalized retrieval, and multimodal information retrieval, to better adapt to diverse data types and user requirements. The paper concludes by summarizing the main points discussed and providing future directions.
22

Escalante, Hugo Jair, Manuel Montes, and Enrique Sucar. "Multimodal indexing based on semantic cohesion for image retrieval". Information Retrieval 15, no. 1 (05.06.2011): 1–32. http://dx.doi.org/10.1007/s10791-011-9170-z.

23

Dong, Bin, Songlei Jian, and Kai Lu. "Learning Multimodal Representations by Symmetrically Transferring Local Structures". Symmetry 12, no. 9 (13.09.2020): 1504. http://dx.doi.org/10.3390/sym12091504.

Abstract:
Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they only focus on object-level alignment and ignore structure-level alignment. To tackle the problem, we propose a novel symmetric multimodal representation learning framework, named MTLS, that transfers local structures across different modalities. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. A bidirectional retrieval loss based on multi-layer neural networks is utilized to align the two modalities. MTLS is instantiated with image and text data and shows superior performance on image-text retrieval and image clustering. MTLS outperforms the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering.
24

Zhang, Guihao, and Jiangzhong Cao. "Feature Fusion Based on Transformer for Cross-modal Retrieval". Journal of Physics: Conference Series 2558, no. 1 (01.08.2023): 012012. http://dx.doi.org/10.1088/1742-6596/2558/1/012012.

Abstract:
With the popularity of the Internet and the rapid growth of multimodal data, multimodal retrieval has gradually become a hot area of research. As one of the important branches of multimodal retrieval, image-text retrieval aims to design a model to learn and align two modalities, image and text, in order to build a bridge of semantic association between the two heterogeneous data types, so as to achieve unified alignment and retrieval. The current mainstream image-text cross-modal retrieval approaches have made good progress by designing deep learning-based models to find potential associations between different modal data. In this paper, we design a transformer-based feature fusion network to fuse the information of the two modalities during feature extraction, which enriches the semantic connection between the modalities. We conduct experiments on the benchmark dataset Flickr30k and obtain competitive results, with recall at 10 achieving 96.2% accuracy in image-to-text retrieval.
25

Demner-Fushman, Dina, Sameer Antani, Matthew Simpson, and George R. Thoma. "Design and Development of a Multimodal Biomedical Information Retrieval System". Journal of Computing Science and Engineering 6, no. 2 (30.06.2012): 168–77. http://dx.doi.org/10.5626/jcse.2012.6.2.168.

26

Liang, Qi, Ning Xu, Weijie Wang, and Xingjian Long. "Multimodal information fusion based on LSTM for 3D model retrieval". Multimedia Tools and Applications 79, no. 45-46 (11.04.2020): 33943–56. http://dx.doi.org/10.1007/s11042-020-08817-6.

27

Zhang, Hongli. "Voice Keyword Retrieval Method Using Attention Mechanism and Multimodal Information Fusion". Scientific Programming 2021 (23.01.2021): 1–11. http://dx.doi.org/10.1155/2021/6662841.

Abstract:
A cross-modal speech-text retrieval method using an interactive learning convolutional autoencoder (CAE) is proposed. First, an interactive learning autoencoder structure is proposed, with two inputs, speech and text, and processing stages such as encoding, hidden-layer interaction, and decoding, to complete the modeling of cross-modal speech-text retrieval. Then, the original audio signal is preprocessed and Mel-frequency cepstral coefficient (MFCC) features are extracted. In addition, a bag-of-words model is used to extract the text features, and an attention mechanism is used to combine the text and speech features. Through the interactive learning CAE, the shared features of the speech and text modalities are obtained and then sent to a modality classifier to identify modal information, thereby realizing cross-modal speech-text retrieval. Finally, experiments show that the performance of the proposed algorithm is better than that of the baseline algorithms in terms of recall rate, accuracy rate, and false recognition rate.
28

He, Chao, Dalin Wang, Zefu Tan, Liming Xu, and Nina Dai. "Cross-Modal Discrimination Hashing Retrieval Using Variable Length". Security and Communication Networks 2022 (09.09.2022): 1–12. http://dx.doi.org/10.1155/2022/9638683.

Abstract:
Fast cross-modal retrieval based on hash coding has become a hot topic given the wealth of multimodal data (text, images, audio, etc.) and the security and privacy challenges of the Internet of Things and mobile edge computing. However, most hash-coding methods map all modalities into a single common hash-coding space and relax the binary constraints of hash coding. As a result, the learned multimodal hash codes may not express the original multimodal data sufficiently and effectively, and the resulting codes become less discriminative across categories. To solve these problems, this paper proposes a method that maps each modality's data to a hash-coding space of its own optimal length and then solves for the hash codes of each modality with a discrete cross-modal hashing algorithm under binary constraints. Finally, the similarity of multimodal data is compared in the latent space. The experimental results of cross-modal retrieval based on variable-length hash coding are better than those of the comparison methods on the WIKI, NUS-WIDE, and MIRFlickr datasets, and the proposed method is shown to be feasible and effective.
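Once binary codes are learned, the retrieval step itself is a Hamming-distance ranking, which can be sketched in a few lines. The 4-bit codes below are toy values; the paper's actual contribution, learning optimal-length codes per modality under binary constraints and comparing them in a latent space, is not reproduced here.

```python
def hamming(a, b):
    # Hamming distance between two equal-length binary codes stored as ints:
    # XOR the codes, then count differing bits.
    return bin(a ^ b).count("1")

def hash_retrieve(query_code, database):
    # database: list of (item, code); smaller distance = more similar.
    return sorted(database, key=lambda entry: hamming(query_code, entry[1]))

# Toy 4-bit codes; a real system would learn a separate optimal length
# per modality and map both into a comparable space.
database = [("a", 0b1010), ("b", 0b0101), ("c", 0b1000)]
ranked = hash_retrieve(0b1010, database)
```

The appeal of hashing is precisely this step: Hamming distance over packed bits is far cheaper than floating-point similarity over dense embeddings.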
29

Mithun, Niluthpol C., Juncheng Li, Florian Metze, and Amit K. Roy-Chowdhury. "Joint embeddings with multimodal cues for video-text retrieval". International Journal of Multimedia Information Retrieval 8, no. 1 (12.01.2019): 3–18. http://dx.doi.org/10.1007/s13735-018-00166-3.

30

Gonsior, Barbara, Christian Landsiedel, Nicole Mirnig, Stefan Sosnowski, Ewald Strasser, Jakub Złotowski, Martin Buss, et al. "Impacts of Multimodal Feedback on Efficiency of Proactive Information Retrieval from Task-Related HRI". Journal of Advanced Computational Intelligence and Intelligent Informatics 16, no. 2 (20.03.2012): 313–26. http://dx.doi.org/10.20965/jaciii.2012.p0313.

Abstract:
This work is a first step towards an integration of multimodality with the aim of making efficient use of both human-like and non-human-like feedback modalities in order to optimize proactive information retrieval from task-related Human-Robot Interaction (HRI) in human environments. The presented approach combines the human-like modalities of speech and emotional facial mimicry with non-human-like modalities. The proposed non-human-like modalities are a screen displaying the robot's retrieved knowledge to the human and a pointer mounted above the robot's head for indicating directions and referring to objects in shared visual space, as an equivalent for arm and hand gestures. Initially, pre-interaction feedback is explored in an experiment investigating different approach behaviors in order to find socially acceptable trajectories that increase the success of interactions and thus the efficiency of information retrieval. Secondly, pre-evaluated human-like modalities are introduced. First results of a multimodal feedback study are presented in the context of the IURO (Interactive Urban Robot, http://www.iuro-project.eu) project, where a robot asks for its way to a predefined goal location.
31

Figueroa, Cristhian, Hugo Ordoñez, Juan-Carlos Corrales, Carlos Cobos, Leandro Krug Wives, and Enrique Herrera-Viedma. "Improving business process retrieval using categorization and multimodal search". Knowledge-Based Systems 110 (October 2016): 49–59. http://dx.doi.org/10.1016/j.knosys.2016.07.014.

32

Boughanem, M., C. Chrisment, and L. Tamine. "On using genetic algorithms for multimodal relevance optimization in information retrieval". Journal of the American Society for Information Science and Technology 53, no. 11 (2002): 934–42. http://dx.doi.org/10.1002/asi.10119.

33

Rahman, Md Mahmudur, Daekeun You, Matthew S. Simpson, Sameer K. Antani, Dina Demner-Fushman, and George R. Thoma. "Multimodal biomedical image retrieval using hierarchical classification and modality fusion". International Journal of Multimedia Information Retrieval 2, no. 3 (04.07.2013): 159–73. http://dx.doi.org/10.1007/s13735-013-0038-4.

34

Wang, Qiang, Wei Zheng, Fan Wu, Huizhong Zhu, Aigong Xu, Yifan Shen, and Yelong Zhao. "Information Fusion for Spaceborne GNSS-R Sea Surface Height Retrieval Using Modified Residual Multimodal Deep Learning Method". Remote Sensing 15, no. 6 (07.03.2023): 1481. http://dx.doi.org/10.3390/rs15061481.

Abstract:
Traditional spaceborne Global Navigation Satellite Systems Reflectometry (GNSS-R) sea surface height (SSH) retrieval methods have the disadvantages of complicated error models, low retrieval accuracy, and difficulty using the full DDM information. To compensate for these deficiencies while considering the heterogeneity of the input data, this paper proposes an end-to-end Modified Residual Multimodal Deep Learning (MRMDL) method that can utilize the entire range of DDM information. First, the MRMDL method is constructed from a modified Residual Net (MResNet) and a Multi-Hidden-layer neural network (MHL-NN). The MResNet, adapted to DDM structures, adaptively captures productive features of the full DDM and converts the two-dimensional DDM data into one-dimensional numerical form. The extracted features and auxiliary parameters are then fused as the input for the MHL-NN to retrieve the SSH. Second, the reliability of the model is verified against SSH from the tide-corrected DTU Sea Surface Height 18 (DTU18) model and spaceborne radar altimeters (Jason3, HY-2C, HY-2B). Compared to the SSH provided by the DTU18 validation model and the spaceborne radar altimeters, the Pearson correlation coefficients (PCC) are 0.98 and 0.97, respectively; since the CYGNSS satellite is not primarily employed for ocean altimetry, the mean absolute differences (MAD) are 3.92 m and 4.32 m, respectively. Finally, this study also implements the HALF retracking algorithm to derive the SSH, and the results are compared with those computed by the MRMDL method. The MRMDL method is more accurate than the HALF retracking approach in terms of MAD, Root-Mean-Square Error (RMSE), and PCC, with improvements of 35.21%, 17.25%, and 2.08%, respectively. The MRMDL method will contribute a new theoretical and methodological reference for future GNSS-R altimetry satellites with high spatiotemporal SSH retrieval.
35

Li, Ruxuan, Jingyi Wang, and Xuedong Tian. "A Multi-Modal Retrieval Model for Mathematical Expressions Based on ConvNeXt and Hesitant Fuzzy Set". Electronics 12, no. 20 (20.10.2023): 4363. http://dx.doi.org/10.3390/electronics12204363.

Abstract:
Mathematical expression retrieval is an essential component of mathematical information retrieval. Current mathematical expression retrieval research primarily targets single modalities, particularly text, which can lead to the loss of structural information. On the other hand, multimodal research has demonstrated promising outcomes across different domains, and mathematical expressions in image format are adept at preserving their structural characteristics. We therefore propose a multi-modal retrieval model for mathematical expressions based on ConvNeXt and HFS to address the limitations of single-modal retrieval. For the image modality, mathematical expression retrieval is based on the similarity of image features and symbol-level features of the expression, where the image features of the expression image are extracted by ConvNeXt, while the symbol-level features are obtained by the Symbol Level Features Extraction (SLFE) module. For the text modality, the Formula Description Structure (FDS) is employed to analyze expressions and extract their attributes. Additionally, the Hesitant Fuzzy Set (HFS) theory facilitates the computation of hesitant fuzzy similarity between mathematical queries and candidate expressions. Finally, Reciprocal Rank Fusion (RRF) is employed to integrate the rankings from image-modal and text-modal retrieval, yielding the final retrieval list. Experiments were conducted on the publicly accessible ArXiv dataset (containing 592,345 mathematical expressions) and the NTCIR-mair-wikipedia-corpus (NTCIR) dataset. The MAP@10 value for the multimodal RRF fusion approach is 0.774. This substantiates the efficacy of the multi-modal mathematical expression retrieval approach based on ConvNeXt and HFS.
APA, Harvard, Vancouver, ISO und andere Zitierweisen
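The Reciprocal Rank Fusion step described in this abstract admits a compact illustration. The sketch below is not the authors' implementation; the document ids, the two example rankings, and the smoothing constant k = 60 (the value commonly used for RRF) are assumptions for illustration only:

```python
def rrf_fuse(rankings, k=60):
    """Combine several ranked lists with Reciprocal Rank Fusion.

    rankings: list of ranked lists of document ids (best first).
    k: smoothing constant that damps the influence of top ranks.
    Each document's fused score is the sum of 1 / (k + rank) over
    all lists in which it appears.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

image_ranking = ["e1", "e2", "e3"]  # hypothetical image-modal result
text_ranking = ["e1", "e3", "e2"]   # hypothetical text-modal result
fused = rrf_fuse([image_ranking, text_ranking])
```

A design point worth noting: because RRF operates on ranks rather than raw scores, no score normalization across the image and text modalities is needed before fusion.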
36

Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (03.04.2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Full text of the source
Annotation:
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of unseen classes across different modalities. It is challenging due not only to the heterogeneous distributions across modalities but also to the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (class-embeddings) as a common semantic space and using a generative adversarial network (GAN) to capture the underlying multimodal data structures and strengthen the relations between input data and the semantic space, in order to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN-based methods for ZS-CMR. Instead of using the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space for input multimodal features and class-embeddings via modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlation associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in the latent space, which benefits retrieval efficiency and enables knowledge transfer to unseen classes. We evaluate our model using four benchmark datasets on image-text retrieval tasks and one large-scale dataset on image-sketch retrieval tasks. The experimental results show that our method establishes new state-of-the-art performance for both tasks on all datasets.
APA, Harvard, Vancouver, ISO and other citation styles
37

Qian, Shengsheng, Dizhan Xue, Huaiwen Zhang, Quan Fang and Changsheng Xu. "Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (18.05.2021): 2440–48. http://dx.doi.org/10.1609/aaai.v35i3.16345.

Full text of the source
Annotation:
Cross-modal retrieval has become an active field of study with the expanding scale of multimodal data. To date, most existing methods transform multimodal data into a common representation space where semantic similarities between items can be measured directly across different modalities. However, these methods typically suffer from the following limitations: 1) they usually attempt to bridge the modality gap by designing losses in the common representation space, which may not be sufficient to eliminate the potential heterogeneity of different modalities in that space; 2) they typically treat labels as independent individuals and ignore label relationships, which are important for constructing semantic links between multimodal data. In this work, we propose novel Dual Adversarial Graph Neural Networks (DAGNN), composed of dual generative adversarial networks and multi-hop graph neural networks, which learn modality-invariant and discriminative common representations for cross-modal retrieval. First, we construct the dual generative adversarial networks to project multimodal data into a common representation space. Second, we leverage the multi-hop graph neural networks, in which a layer aggregation mechanism is proposed to exploit multi-hop propagation information, to capture the label correlation dependency and learn inter-dependent classifiers. Comprehensive experiments conducted on two cross-modal retrieval benchmark datasets, NUS-WIDE and MIRFlickr, indicate the superiority of DAGNN.
APA, Harvard, Vancouver, ISO and other citation styles
38

Liljedahl, Mats, Stefan Lindberg, Katarina Delsing, Mikko Polojärvi, Timo Saloranta and Ismo Alakärppä. "Testing Two Tools for Multimodal Navigation". Advances in Human-Computer Interaction 2012 (2012): 1–10. http://dx.doi.org/10.1155/2012/251384.

Full text of the source
Annotation:
The latest smartphones with GPS, electronic compasses, directional audio, touch screens, and so forth, hold potential for location-based services that are easier to use and that let users focus on their activities and the environment around them. Rather than interpreting maps, users can search for information by pointing in a direction, and database queries can be created from GPS location and compass data. Users can also get guidance to locations through point and sweep gestures, spatial sound, and simple graphics. This paper describes two studies testing two applications with multimodal user interfaces for navigation and information retrieval. The applications allow users to search for information and get navigation support using combinations of point and sweep gestures, nonspeech audio, graphics, and text. Tests show that users appreciated both applications for their ease of use and for allowing them to interact directly with the surrounding environment.
APA, Harvard, Vancouver, ISO and other citation styles
39

Zhu, Qing, Junxiao Zhang, Yulin Ding, Mingwei Liu, Yun Li, Bin Feng, Shuangxi Miao, Weijun Yang, Huagui He and Jun Zhu. "Semantics-Constrained Advantageous Information Selection of Multimodal Spatiotemporal Data for Landslide Disaster Assessment". ISPRS International Journal of Geo-Information 8, no. 2 (30.01.2019): 68. http://dx.doi.org/10.3390/ijgi8020068.

Full text of the source
Annotation:
Although abundant spatiotemporal data are collected before and after landslides, the volume, variety, intercorrelation, and heterogeneity of multimodal data complicate disaster assessment, so it is challenging to select the information from multimodal spatiotemporal data that is advantageous for credible and comprehensive assessment. In disaster scenarios, multimodal data exhibit intrinsic relationships, and their interactions can greatly influence selection results. Previous data retrieval methods have mainly focused on candidate ranking while ignoring the generation and evaluation of candidate subsets. In this paper, a semantics-constrained data selection approach is proposed. First, multitype relationships are defined and reasoned over through a heterogeneous information network. Then, relevance, redundancy, and complementarity are redefined to evaluate data sets in terms of semantic proximity and similarity. Finally, the approach is tested using Mao County (China) landslide data. The proposed method can automatically and effectively generate suitable datasets for specific tasks rather than simply ranking by similarity, and the selection results are compared with manual results to verify their effectiveness.
APA, Harvard, Vancouver, ISO and other citation styles
40

Simpson, Matthew S., Dina Demner-Fushman, Sameer K. Antani and George R. Thoma. "Multimodal biomedical image indexing and retrieval using descriptive text and global feature mapping". Information Retrieval 17, no. 3 (13.11.2013): 229–64. http://dx.doi.org/10.1007/s10791-013-9235-2.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
41

Díaz-Galiano, M. C., M. T. Martín-Valdivia and L. A. Ureña-López. "Query expansion with a medical ontology to improve a multimodal information retrieval system". Computers in Biology and Medicine 39, no. 4 (April 2009): 396–403. http://dx.doi.org/10.1016/j.compbiomed.2009.01.012.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
42

Bouslimi, Riadh, Mouhamed Gaith Ayadi and Jalel Akaichi. "Medical Image Retrieval in Healthcare Social Networks". International Journal of Healthcare Information Systems and Informatics 13, no. 2 (April 2018): 13–28. http://dx.doi.org/10.4018/ijhisi.2018040102.

Full text of the source
Annotation:
In this article, the authors present a multimodal retrieval model for medical images based on multimedia information extracted from a collaborative radiological social network. The opinions shared on a medical image in a medico-social network are textual descriptions that in most cases require cleaning with a medical thesaurus. The authors represent both the textual description and the medical image as TF-IDF weight vectors using a "bag-of-words" approach. They then use latent semantic analysis to establish relationships between textual terms and visual terms in the opinions shared on a medical image. The model is evaluated against the ImageCLEFmed baseline, which serves as the ground truth for the experiments. The authors conducted numerous experiments with different descriptors and many combinations of modalities. The analysis of the results shows that combining the two modalities increases the performance of a retrieval system based on a single modality, whether visual or textual.
APA, Harvard, Vancouver, ISO and other citation styles
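The TF-IDF "bag-of-words" representation and latent semantic analysis mentioned in this abstract can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the toy documents and the choice of k = 2 latent dimensions are assumptions:

```python
import math
from collections import Counter

import numpy as np

# Hypothetical textual opinions attached to medical images
docs = [
    "chest x ray shows pneumonia",
    "x ray of fractured wrist",
    "mri scan of brain tumor",
]

# Build TF-IDF "bag-of-words" vectors over the shared vocabulary
vocab = sorted({w for d in docs for w in d.split()})
df = Counter(w for d in docs for w in set(d.split()))  # document frequency
N = len(docs)

def tfidf(doc):
    tf = Counter(doc.split())
    return np.array([tf[w] * math.log(N / df[w]) for w in vocab])

X = np.stack([tfidf(d) for d in docs])  # documents x terms

# Latent semantic analysis: truncated SVD of the term-document matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_latent = U[:, :k] * s[:k]  # documents projected into the latent space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

In the full TF-IDF space, the two x-ray opinions share terms and thus have nonzero cosine similarity, while the MRI opinion is orthogonal to the first; the truncated SVD additionally lets related terms that never co-occur be associated through the latent dimensions.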
43

Hu, Yue. "Music Emotion Research Based on Reinforcement Learning and Multimodal Information". Journal of Mathematics 2022 (09.02.2022): 1–9. http://dx.doi.org/10.1155/2022/2446399.

Full text of the source
Annotation:
Music is an important carrier of emotion and an indispensable part of people's daily life. With the rapid growth of digital music, people's demand for music emotion analysis and retrieval is also increasing, and automatic recognition of music emotion has become a main research focus. For music, emotion is its most essential feature and deepest inner meaning. In a ubiquitous information environment, revealing the deep semantic information of multimodal information resources and providing users with integrated information services has important research and application value. In this paper, a multimodal fusion algorithm for music emotion analysis is proposed, and a dynamic model based on reinforcement learning is constructed to improve analysis accuracy. The model dynamically adjusts the emotion analysis results by learning the user's behavior, thereby personalizing results to the user's emotional preferences.
APA, Harvard, Vancouver, ISO and other citation styles
44

Yan, Meichao, Yu Wen, Qingxuan Shi and Xuedong Tian. "A Multimodal Retrieval and Ranking Method for Scientific Documents Based on HFS and XLNet". Scientific Programming 2022 (04.01.2022): 1–11. http://dx.doi.org/10.1155/2022/5373531.

Full text of the source
Annotation:
To address the shortcomings of traditional full-text retrieval models in handling mathematical expressions, which are special objects distinct from ordinary text, a multimodal retrieval and ranking method for scientific documents based on hesitant fuzzy sets (HFS) and XLNet is proposed. The method integrates multimodal information, such as mathematical expression images and context text, as keywords to retrieve scientific documents. In the image modality, images of mathematical expressions are recognized, and hesitant fuzzy set theory is introduced to calculate the hesitant fuzzy similarity between mathematical query expressions and the expressions in candidate scientific documents. Meanwhile, in the text modality, XLNet is used to generate word vectors for the mathematical expression context to obtain the similarity between the query text and the expression context of the candidate documents. Finally, the multimodal evaluations are integrated by constructing a hesitant fuzzy set at the document level to obtain the final scores of the scientific documents and the corresponding ranked output. The experimental results show that the recall and precision of this method are 0.774 and 0.663 on the NTCIR dataset, respectively, and the average normalized discounted cumulative gain (NDCG) value of the top-10 ranking results is 0.880 on the Chinese scientific document (CSD) dataset.
APA, Harvard, Vancouver, ISO and other citation styles
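The NDCG metric reported in this abstract can be computed as follows. This is the standard textbook formulation (graded relevance with a log2 rank discount), not code from the paper, and the example relevance judgments are assumptions:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: graded relevance discounted by
    # log2(rank + 1), with ranks starting at 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = sorted(relevances, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(relevances[:k]) / denom if denom > 0 else 0.0

# Hypothetical graded judgments for one query's top-5 results
scores = ndcg_at_k([3, 2, 3, 0, 1], 5)
```

An NDCG@10 of 0.880, as reported for the CSD dataset, therefore means the returned top-10 ordering achieves 88% of the gain of a perfect ordering of those relevance grades.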
45

Zhen, Yi, Yue Gao, Dit-Yan Yeung, Hongyuan Zha and Xuelong Li. "Spectral Multimodal Hashing and Its Application to Multimedia Retrieval". IEEE Transactions on Cybernetics 46, no. 1 (January 2016): 27–38. http://dx.doi.org/10.1109/tcyb.2015.2392052.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
46

Liu, Li, Xiao Dong and Tianshi Wang. "Semi-Supervised Cross-Modal Retrieval Based on Discriminative Comapping". Complexity 2020 (18.07.2020): 1–13. http://dx.doi.org/10.1155/2020/1462429.

Full text of the source
Annotation:
Most cross-modal retrieval methods based on subspace learning focus only on learning the projection matrices that map different modalities to a common subspace, paying less attention to retrieval task specificity and class information. To address these two limitations and make full use of unlabelled data, we propose a novel semi-supervised method for cross-modal retrieval named modal-related retrieval based on discriminative comapping (MRRDC). Projection matrices are obtained to map multimodal data into a common subspace for different tasks. In the process of projection matrix learning, a linear discriminant constraint is introduced to preserve the original class information in the different modal spaces. An iterative optimization algorithm based on label propagation is presented to solve the proposed joint learning formulation. Experimental results on several datasets demonstrate the superiority of our method compared with state-of-the-art subspace methods.
APA, Harvard, Vancouver, ISO and other citation styles
47

Ackerley, Katherine, and Francesca Coccetta. "Enriching language learning through a multimedia corpus". ReCALL 19, no. 3 (24.08.2007): 351–70. http://dx.doi.org/10.1017/s0958344007000730.

Full text of the source
Annotation:
Until recently, use has been made almost exclusively of text-based concordancers in the analysis of spoken corpora. This article discusses research being carried out on Padua University's Multimedia English Corpus (Padova MEC) using the multimodal concordancer MCA (Multimodal Corpus Authoring System, Baldry, 2005). This highly innovative concordancer enables the retrieval of parts of video and audio from a tagged corpus and access to examples of language in context, thereby providing non-verbal information about the environment, the participants and their moods: details that can be gleaned from a combination of word, sound, image and movement. This is of use to language learners of all levels because, if "communication is to be successful, a relevant context has to be constructed by the discourse participants" (Braun, 2005: 52). In other words, transcripts alone are not sufficient if learners are to have anything like participant knowledge and comprehend spoken language. The article demonstrates how language functions expressed in the multimedia corpus of spoken English are retrieved using MCA. Online learning materials based on the multimodal concordances take into consideration not only language but also the way in which it co-patterns with other semiotic resources, thereby raising the issue of the importance of learner awareness of the multimodal nature of communication.
APA, Harvard, Vancouver, ISO and other citation styles
48

El Sayad, Ismail, Samih Abdul Nabi, Hussien Kassem, Georges Moubarak and Ahmad Saleh. "A New Weighting Scheme for Content-Based Image Retrieval in the Multimodal Information Spaces". International Journal of Advanced Research 6, no. 7 (31.07.2018): 104–14. http://dx.doi.org/10.21474/ijar01/7342.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
49

Chu, Hanlu, Haien Zeng, Hanjiang Lai and Yong Tang. "Efficient modal-aware feature learning with application in multimodal hashing". Intelligent Data Analysis 26, no. 2 (14.03.2022): 345–60. http://dx.doi.org/10.3233/ida-215780.

Full text of the source
Annotation:
Many retrieval applications can benefit from multiple modalities, for which how to represent multimodal data is the critical component. Most deep multimodal learning methods typically involve two steps to construct the joint representations: 1) learning multiple intermediate features, with each intermediate feature corresponding to a modality, using separate and independent deep models; 2) merging the intermediate features into a joint representation using a fusion strategy. However, in the first step, these intermediate features have no prior knowledge of each other and cannot fully exploit the information contained in the other modalities. In this paper, we present a modal-aware operation as a generic building block to capture the non-linear dependencies among the heterogeneous intermediate features, which allows the underlying correlation structures in the other modalities to be learned as early as possible. The modal-aware operation consists of a kernel network and an attention network. The kernel network is utilized to learn the non-linear relationships with other modalities. The attention network finds the informative regions of these modal-aware features that are favorable for retrieval. We verify the proposed modal-aware feature learning in the multimodal hashing task. The experiments conducted on three public benchmark datasets demonstrate significant improvements in the performance of our method relative to state-of-the-art methods.
APA, Harvard, Vancouver, ISO and other citation styles
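Once joint representations are learned, the retrieval stage of multimodal hashing reduces to Hamming-distance ranking over binary codes. A minimal sketch of that stage, assuming hypothetical learned representations (random vectors here, purely for illustration) and simple sign binarization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned joint representations: 5 database items, 8 dimensions
joint = rng.normal(size=(5, 8))
query = rng.normal(size=(8,))

def to_code(x):
    # Binarize a real-valued representation into a hash code (sign function)
    return (x > 0).astype(np.uint8)

def hamming(a, b):
    # Number of differing bits between two codes
    return int(np.count_nonzero(a != b))

codes = np.array([to_code(v) for v in joint])
q = to_code(query)

# Rank database items by Hamming distance to the query code
ranking = sorted(range(len(codes)), key=lambda i: hamming(codes[i], q))
```

The appeal of hashing-based retrieval is that Hamming distance over compact binary codes can be computed with bitwise operations, making nearest-neighbor search fast and memory-efficient at scale.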
50

Qi, Yudan, and Huaxiang Zhang. "Joint Graph Regularization in a Homogeneous Subspace for Cross-Media Retrieval". Journal of Advanced Computational Intelligence and Intelligent Informatics 23, no. 5 (20.09.2019): 939–46. http://dx.doi.org/10.20965/jaciii.2019.p0939.

Full text of the source
Annotation:
The heterogeneity of multimodal data is the main challenge in cross-media retrieval, and many methods have been developed to address the problem. At present, subspace learning is one of the mainstream approaches for cross-media retrieval; its aim is to learn a latent shared subspace in which similarities between cross-modal data can be measured. However, most existing subspace learning algorithms focus only on supervised information, using labeled data for training to obtain one pair of mapping matrices. In this paper, we propose joint graph regularization based on semi-supervised learning for cross-media retrieval (JGRHS), which makes full use of labeled and unlabeled data. We jointly consider correlation analysis and semantic information when learning projection matrices to maintain the closeness of pairwise data and semantic consistency; graph regularization is used to make the learned transformations consistent with similarity constraints in both modalities. Retrieval results on three datasets indicate that the proposed method achieves good performance in both theoretical research and practical applications.
APA, Harvard, Vancouver, ISO and other citation styles