Journal articles on the topic "Multimodal retrieval"

Follow this link to see other types of publications on the topic: Multimodal retrieval.

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles.


Consult the top 50 journal articles for your research on the topic "Multimodal retrieval".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Cui, Chenhao, and Zhoujun Li. "Prompt-Enhanced Generation for Multimodal Open Question Answering". Electronics 13, no. 8 (April 10, 2024): 1434. http://dx.doi.org/10.3390/electronics13081434.

Abstract
Multimodal open question answering involves retrieving relevant information from both images and their corresponding texts given a question and then generating the answer. The quality of the generated answer heavily depends on the quality of the retrieved image–text pairs. Existing methods encode and retrieve images and texts, inputting the retrieved results into a language model to generate answers. These methods overlook the semantic alignment of image–text pairs within the information source, which affects the encoding and retrieval performance. Furthermore, these methods are highly dependent on retrieval performance, and poor retrieval quality can lead to poor generation performance. To address these issues, we propose a prompt-enhanced generation model, PEG, which includes generating supplementary descriptions for images to provide ample material for image–text alignment while also utilizing vision–language joint encoding to improve encoding effects and thereby enhance retrieval performance. Contrastive learning is used to enhance the model’s ability to discriminate between relevant and irrelevant information sources. Moreover, we further explore the knowledge within pre-trained model parameters through prefix-tuning to generate background knowledge relevant to the questions, offering additional input for answer generation and reducing the model’s dependency on retrieval performance. Experiments conducted on the WebQA and MultimodalQA datasets demonstrate that our model outperforms other baseline models in retrieval and generation performance.
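The contrastive learning the abstract mentions, which pulls matched image–text pairs together while pushing mismatched ones apart, is commonly realized as a symmetric InfoNCE loss over a batch of paired embeddings. A minimal NumPy sketch of that generic objective (an illustration, not the authors' PEG code; the `temperature` value is an assumption):

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Row i of each matrix is a matching pair; every other row in the batch
    serves as a negative for it.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(logits))      # the i-th text matches the i-th image

    def cross_entropy(l):
        # numerically stable log-softmax over each row
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Perfectly aligned embeddings drive the loss toward zero, while shuffled pairings drive it up, which is exactly the discrimination between relevant and irrelevant sources the abstract describes.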
2

Xu, Hong. "Multimodal bird information retrieval system". Applied and Computational Engineering 53, no. 1 (March 28, 2024): 96–102. http://dx.doi.org/10.54254/2755-2721/53/20241282.

Abstract
A multimodal bird information retrieval system can help popularize bird knowledge and support bird conservation. In this paper, we use a self-built bird dataset, the ViT-B/32 model from CLIP as the training model, Python as the development language, and PyQt5 to complete the interface development. The system mainly realizes the uploading and displaying of bird pictures, multimodal retrieval of bird information, and the introduction of related bird information. The results of the trial run show that the system can accomplish multimodal retrieval of bird information: it can retrieve the species of a bird and other related information from a picture uploaded by the user, or retrieve the most similar bird information from the text content the user describes.
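At retrieval time, a CLIP-style system like this reduces to ranking indexed records by cosine similarity between a query embedding (from an uploaded picture or a text description) and precomputed gallery embeddings. A minimal sketch of that ranking step (the encoder itself is omitted; embeddings are assumed to come from a model such as ViT-B/32, and the labels below are made-up examples):

```python
import numpy as np

def rank_by_similarity(query_emb, gallery_embs, labels, top_k=3):
    """Return the top_k gallery labels ranked by cosine similarity.

    query_emb: (d,) embedding of the query image or text.
    gallery_embs: (n, d) embeddings of the indexed bird records.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                        # cosine similarity per gallery item
    order = np.argsort(-sims)[:top_k]   # highest similarity first
    return [(labels[i], float(sims[i])) for i in order]
```

Because image and text embeddings live in the same space, the same function serves both image-to-info and text-to-info retrieval.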
3

Romberg, Stefan, Rainer Lienhart, and Eva Hörster. "Multimodal Image Retrieval". International Journal of Multimedia Information Retrieval 1, no. 1 (March 7, 2012): 31–44. http://dx.doi.org/10.1007/s13735-012-0006-4.
4

Kitanovski, Ivan, Gjorgji Strezoski, Ivica Dimitrovski, Gjorgji Madjarov, and Suzana Loskovska. "Multimodal medical image retrieval system". Multimedia Tools and Applications 76, no. 2 (January 25, 2016): 2955–78. http://dx.doi.org/10.1007/s11042-016-3261-1.
5

Kulvinder Singh, et al. "Enhancing Multimodal Information Retrieval Through Integrating Data Mining and Deep Learning Techniques". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 9 (October 30, 2023): 560–69. http://dx.doi.org/10.17762/ijritcc.v11i9.8844.

Abstract
Multimodal information retrieval, the task of retrieving relevant information from heterogeneous data sources such as text, images, and videos, has gained significant attention in recent years due to the proliferation of multimedia content on the internet. This paper proposes an approach to enhance multimodal information retrieval by integrating data mining and deep learning techniques. Traditional information retrieval systems often struggle to effectively handle multimodal data due to the inherent complexity and diversity of such data sources. In this study, we leverage data mining techniques to preprocess and structure multimodal data efficiently. Data mining methods enable us to extract valuable patterns, relationships, and features from different modalities, providing a solid foundation for subsequent retrieval tasks. To further enhance the performance of multimodal information retrieval, deep learning techniques are employed. Deep neural networks have demonstrated their effectiveness in various multimedia tasks, including image recognition, natural language processing, and video analysis. By integrating deep learning models into our retrieval framework, we aim to capture complex intermodal dependencies and semantically rich representations, enabling more accurate and context-aware retrieval.
6

Cao, Yu, Shawn Steffey, Jianbiao He, Degui Xiao, Cui Tao, Ping Chen, and Henning Müller. "Medical Image Retrieval: A Multimodal Approach". Cancer Informatics 13s3 (January 2014): CIN.S14053. http://dx.doi.org/10.4137/cin.s14053.

Abstract
Medical imaging is becoming a vital component of the war on cancer. Tremendous amounts of medical image data are captured and recorded in digital format during cancer care and cancer research. Facing such an unprecedented volume of image data with heterogeneous image modalities, it is necessary to develop effective and efficient content-based medical image retrieval systems for cancer clinical practice and research. While substantial progress has been made in different areas of content-based image retrieval (CBIR) research, direct application of existing CBIR techniques to medical images has produced unsatisfactory results because of the unique characteristics of medical images. In this paper, we develop a new multimodal medical image retrieval approach based on recent advances in statistical graphical models and deep learning. Specifically, we first investigate a new extended probabilistic Latent Semantic Analysis model to integrate the visual and textual information from medical images to bridge the semantic gap. We then develop a new deep Boltzmann machine-based multimodal learning model to learn the joint density model from multimodal information in order to derive the missing modality. Experimental results with a large volume of real-world medical images have shown that our new approach is a promising solution for the next-generation medical imaging indexing and retrieval system.
7

Rafailidis, D., S. Manolopoulou, and P. Daras. "A unified framework for multimodal retrieval". Pattern Recognition 46, no. 12 (December 2013): 3358–70. http://dx.doi.org/10.1016/j.patcog.2013.05.023.
8

Dong, Bin, Songlei Jian, and Kai Lu. "Learning Multimodal Representations by Symmetrically Transferring Local Structures". Symmetry 12, no. 9 (September 13, 2020): 1504. http://dx.doi.org/10.3390/sym12091504.

Abstract
Multimodal representations play an important role in multimodal learning tasks, including cross-modal retrieval and intra-modal clustering. However, existing multimodal representation learning approaches focus on building one common space by aligning different modalities and ignore the complementary information across the modalities, such as the intra-modal local structures. In other words, they only focus on the object-level alignment and ignore structure-level alignment. To tackle the problem, we propose a novel symmetric multimodal representation learning framework by transferring local structures across different modalities, namely MTLS. A customized soft metric learning strategy and an iterative parameter learning process are designed to symmetrically transfer local structures and enhance the cluster structures in intra-modal representations. The bidirectional retrieval loss based on multi-layer neural networks is utilized to align two modalities. MTLS is instantiated with image and text data and shows its superior performance on image-text retrieval and image clustering. MTLS outperforms the state-of-the-art multimodal learning methods by up to 32% in terms of R@1 on text-image retrieval and 16.4% in terms of AMI on clustering.
9

Zhang, Guihao, and Jiangzhong Cao. "Feature Fusion Based on Transformer for Cross-modal Retrieval". Journal of Physics: Conference Series 2558, no. 1 (August 1, 2023): 012012. http://dx.doi.org/10.1088/1742-6596/2558/1/012012.

Abstract
With the popularity of the Internet and the rapid growth of multimodal data, multimodal retrieval has gradually become a hot area of research. As one of the important branches of multimodal retrieval, image-text retrieval aims to design a model to learn and align two modal data, image and text, in order to build a bridge of semantic association between the two heterogeneous data, so as to achieve unified alignment and retrieval. The current mainstream image-text cross-modal retrieval approaches have made good progress by designing a deep learning-based model to find potential associations between different modal data. In this paper, we design a transformer-based feature fusion network to fuse the information of two modalities in the feature extraction process, which can enrich the semantic connection between the modalities. Meanwhile, we conduct experiments on the benchmark dataset Flickr30k and get competitive results, where recall at 10 achieves 96.2% accuracy in image-to-text retrieval.
10

Kompus, Kristiina, Tom Eichele, Kenneth Hugdahl, and Lars Nyberg. "Multimodal Imaging of Incidental Retrieval: The Low Route to Memory". Journal of Cognitive Neuroscience 23, no. 4 (April 2011): 947–60. http://dx.doi.org/10.1162/jocn.2010.21494.

Abstract
Memories of past episodes frequently come to mind incidentally, without directed search. It has remained unclear how incidental retrieval processes are initiated in the brain. Here we used fMRI and ERP recordings to find brain activity that specifically correlates with incidental retrieval, as compared to intentional retrieval. Intentional retrieval was associated with increased activation in dorsolateral prefrontal cortex. By contrast, incidental retrieval was associated with a reduced fMRI signal in posterior brain regions, including extrastriate and parahippocampal cortex, and a modulation of a posterior ERP component 170 msec after the onset of visual retrieval cues. Successful retrieval under both intentional and incidental conditions was associated with increased activation in the hippocampus, precuneus, and ventrolateral prefrontal cortex, as well as increased amplitude of the P600 ERP component. These results demonstrate how early bottom–up signals from posterior cortex can lead to reactivation of episodic memories in the absence of strategic retrieval attempts.
11

UbaidullahBokhari, Mohammad, and Faraz Hasan. "Multimodal Information Retrieval: Challenges and Future Trends". International Journal of Computer Applications 74, no. 14 (July 26, 2013): 9–12. http://dx.doi.org/10.5120/12951-9967.
12

Yamaguchi, Masataka. "2. Multimodal Retrieval between Vision and Language". Journal of The Institute of Image Information and Television Engineers 72, no. 9 (2018): 655–58. http://dx.doi.org/10.3169/itej.72.655.
13

Calumby, Rodrigo Tripodi. "Diversity-oriented Multimodal and Interactive Information Retrieval". ACM SIGIR Forum 50, no. 1 (June 27, 2016): 86. http://dx.doi.org/10.1145/2964797.2964811.
14

Jin, Lu, Kai Li, Hao Hu, Guo-Jun Qi, and Jinhui Tang. "Semantic Neighbor Graph Hashing for Multimodal Retrieval". IEEE Transactions on Image Processing 27, no. 3 (March 2018): 1405–17. http://dx.doi.org/10.1109/tip.2017.2776745.
15

Peng, Yang, Xiaofeng Zhou, Daisy Zhe Wang, Ishan Patwa, Dihong Gong, and Chunsheng Victor Fang. "Multimodal Ensemble Fusion for Disambiguation and Retrieval". IEEE MultiMedia 23, no. 2 (April 2016): 42–52. http://dx.doi.org/10.1109/mmul.2016.26.
16

Hu, Peng, Dezhong Peng, Xu Wang, and Yong Xiang. "Multimodal adversarial network for cross-modal retrieval". Knowledge-Based Systems 180 (September 2019): 38–50. http://dx.doi.org/10.1016/j.knosys.2019.05.017.
17

Waykar, Sanjay B., and C. R. Bharathi. "Multimodal Features and Probability Extended Nearest Neighbor Classification for Content-Based Lecture Video Retrieval". Journal of Intelligent Systems 26, no. 3 (July 26, 2017): 585–99. http://dx.doi.org/10.1515/jisys-2016-0041.

Abstract
Due to the ever-increasing number of digital lecture libraries and lecture video portals, the challenge of retrieving lecture videos has become a very significant and demanding task in recent years. Accordingly, the literature presents different techniques for video retrieval by considering video contents as well as signal data. Here, we propose a lecture video retrieval system using multimodal features and probability extended nearest neighbor (PENN) classification. There are two modalities utilized for feature extraction. One is textual information, which is determined from the lecture video using optical character recognition. The second modality utilized to preserve video content is local vector pattern. These two modal features are extracted, and the retrieval of videos is performed using the proposed PENN classifier, which is the extension of the extended nearest neighbor classifier, by considering the different weightages for the first-level and second-level neighbors. The performance of the proposed video retrieval is evaluated using precision, recall, and F-measure, which are computed by matching the retrieved videos and the manually classified videos. From the experimentation, we proved that the average precision of the proposed PENN+VQ is 78.3%, which is higher than that of the existing methods.
18

Rebstock, Alicia M., and Sarah E. Wallace. "Effects of a Combined Semantic Feature Analysis and Multimodal Treatment for Primary Progressive Aphasia: Pilot Study". Communication Disorders Quarterly 41, no. 2 (September 10, 2018): 71–85. http://dx.doi.org/10.1177/1525740118794399.

Abstract
Primary progressive aphasia (PPA) is a neurodegenerative condition characterized by language and cognitive decline. Word-retrieval deficits are the most common PPA symptom and contribute to impaired spoken expression. Intense semantic interventions show promise for improving word retrieval in people with PPA. In addition, people with PPA may learn to use alternative communication modalities when they are unable to retrieve a word. However, executive function impairments can cause people to struggle to switch among modalities to repair communication breakdowns. This study examined the effects of a combined semantic feature analysis and multimodal communication program (SFA+MCP) on word-retrieval accuracy, switching among modalities, and overall communicative effectiveness in a person with PPA. An adult female with PPA completed SFA+MCP. Baseline, probe, intervention, and postintervention sessions were completed to measure word-retrieval accuracy and switching between communication modalities. A postintervention listener task was completed to measure communicative effectiveness. Changes in word-retrieval accuracy and switching were minimal. However, the listeners' identification of the participant's communication attempts was more accurate following treatment, suggesting increased overall communicative effectiveness. Further investigations of SFA+MCP, specifically relative to timing, intensity, and appropriate modifications for people with cognitive impairments associated with PPA, are warranted.
19

He, Chao, Dalin Wang, Zefu Tan, Liming Xu, and Nina Dai. "Cross-Modal Discrimination Hashing Retrieval Using Variable Length". Security and Communication Networks 2022 (September 9, 2022): 1–12. http://dx.doi.org/10.1155/2022/9638683.

Abstract
Fast cross-modal retrieval technology based on hash coding has become a hot topic given the wealth of multimodal data (text, image, audio, etc.), especially under the security and privacy challenges of the Internet of Things and mobile edge computing. However, most methods based on hash coding only map to a common hash coding space, and they relax the binary-value constraints of hash coding. Therefore, the learned multimodal hash coding may not be sufficient and effective to express the original multimodal data, and it can cause the hash encoding categories to be less discriminative. To solve these problems, this paper proposes a method of mapping each modality's data to a hash coding space of its own optimal length, after which the hash encoding of each modality is solved by a discrete cross-modal hash algorithm with binary-value constraints. Finally, the similarity of multimodal data is compared in the latent space. The experimental results of cross-modal retrieval based on variable-length hash coding are better than those of the comparison methods on the WIKI, NUS-WIDE, and MIRFlickr datasets, and the proposed method is shown to be feasible and effective.
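Once text and images have been mapped to binary codes, retrieval itself reduces to ranking database codes by Hamming distance to the query code. A minimal sketch of that final step (the hash learning and variable-length projection are the paper's contribution and are not reproduced here):

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database hash codes by Hamming distance to a query code.

    Codes are 0/1 integer arrays of equal length (codes from different
    modalities are assumed to have been projected to a shared comparison
    space upstream). Returns database indices from nearest to farthest,
    plus the distances themselves.
    """
    # Hamming distance = number of bit positions where the codes differ
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists
```

Because comparisons are bitwise, this lookup stays cheap even for very large databases, which is the appeal of hashing-based retrieval.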
20

Chávez, Ricardo Omar, Hugo Jair Escalante, Manuel Montes-y-Gómez, and Luis Enrique Sucar. "Multimodal Markov Random Field for Image Reranking Based on Relevance Feedback". ISRN Machine Vision 2013 (February 11, 2013): 1–16. http://dx.doi.org/10.1155/2013/428746.

Abstract
This paper introduces a multimodal approach for reranking image retrieval results based on relevance feedback. We consider the problem of reordering the ranked list of images returned by an image retrieval system in such a way that images relevant to a query are moved to the first positions of the list. We propose a Markov random field (MRF) model that aims at classifying the images in the initial retrieval-result list as relevant or irrelevant; the output of the MRF is used to generate a new list of ranked images. The MRF takes into account (1) the rank information provided by the initial retrieval system, (2) similarities among images in the list, and (3) relevance feedback information. Hence, the problem of image reranking is reduced to that of minimizing an energy function that represents a trade-off between image relevance and interimage similarity. The proposed MRF is multimodal, as it can take advantage of both the visual and textual information by which images are described. We report experimental results in the IAPR TC12 collection using visual and textual features to represent images. Experimental results show that our method is able to improve the ranking provided by the base retrieval system. Also, the multimodal MRF outperforms the unimodal (i.e., either text-based or image-based) MRFs that we have developed in previous work. Furthermore, the proposed MRF outperforms baseline multimodal methods that combine information from unimodal MRFs.
21

Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Abstract
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modalities. It is challenging due not only to the heterogeneous distributions across different modalities, but also to the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as a common semantic space, and using a generative adversarial network (GAN) to capture the underlying multimodal data structures, as well as to strengthen relations between input data and the semantic space to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN-based methods for ZS-CMR. Rather than using the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space of input multimodal features and class-embeddings via modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlation associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in the latent space, which benefits retrieval efficiency and enables knowledge transfer to unseen classes. We evaluate our model using four benchmark datasets on image-text retrieval tasks and one large-scale dataset on image-sketch retrieval tasks. The experimental results show that our method establishes the new state-of-the-art performance for both tasks on all datasets.
22

Schöpper, Lars-Michael, Tarini Singh, and Christian Frings. "The official soundtrack to "Five shades of grey": Generalization in multimodal distractor-based retrieval". Attention, Perception, & Psychophysics 82, no. 7 (June 12, 2020): 3479–89. http://dx.doi.org/10.3758/s13414-020-02057-4.

Abstract
Abstract When responding to two events in a sequence, the repetition or change of stimuli and the accompanying response can benefit or interfere with response execution: Full repetition leads to benefits in performance while partial repetition leads to costs. Additionally, even distractor stimuli can be integrated with a response, and can, upon repetition, lead to benefits or interference. Recently it has been suggested that not only identical, but also perceptually similar distractors retrieve a previous response (Singh et al., Attention, Perception, & Psychophysics, 78(8), 2307-2312, 2016): Participants discriminated four visual shapes appearing in five different shades of grey, the latter being irrelevant for task execution. Exact distractor repetitions yielded the strongest distractor-based retrieval effect, which decreased with increasing dissimilarity between shades of grey. In the current study, we expand these findings by conceptually replicating Singh et al. (2016) using multimodal stimuli. In Experiment 1 (N=31), participants discriminated four visual targets accompanied by five auditory distractors. In Experiment 2 (N=32), participants discriminated four auditory targets accompanied by five visual distractors. We replicated the generalization of distractor-based retrieval – that is, the distractor-based retrieval effect decreased with increasing distractor-dissimilarity. These results not only show that generalization in distractor-based retrieval occurs in multimodal feature processing, but also that these processes can occur for distractors perceived in a different modality to that of the target.
23

Murrugarra-Llerena, Nils, and Adriana Kovashka. "Image retrieval with mixed initiative and multimodal feedback". Computer Vision and Image Understanding 207 (June 2021): 103204. http://dx.doi.org/10.1016/j.cviu.2021.103204.
24

Ismail, Nor Azman, and Ann O'Brien. "Web-based Personal Digital Photo Collections: Multimodal Retrieval". IIUM Engineering Journal 10, no. 1 (September 29, 2010): 49–57. http://dx.doi.org/10.31436/iiumej.v10i1.104.

Abstract
When personal photo collections get large, retrieval of specific photos or sets of photos becomes difficult, mainly due to the fairly primitive means by which they are organised. Commercial photo handling systems help but often have only elementary searching features. In this paper, we describe an interactive web-based photo retrieval system that enables personal digital photo users to accomplish photo browsing through multimodal interaction. This system enables users to browse their personal digital photos in the World Wide Web (WWW) environment not only via mouse-click input but also via speech input. The prototype system and its architecture utilise web technology, built using web programming and scripting (JavaScript, XHTML, ASP, and an XML-based mark-up language) together with an image database, in order to achieve its objective. All prototype programs and data files, including the user's photo repository, profiles, dialogues, grammars, prompts, and retrieval engine, are stored in the web server. Our approach also consists of human-computer speech dialogue for browsing photo content by four main categories (Who? What? When? and Where?). Our user study with 20 digital photo users showed that the participants reacted positively to their experience with the system interactions.
25

ZHANG, Jing. "Video retrieval model based on multimodal information fusion". Journal of Computer Applications 28, no. 1 (July 10, 2008): 199–201. http://dx.doi.org/10.3724/sp.j.1087.2008.00199.
26

Cao, Wenming, Wenshuo Feng, Qiubin Lin, Guitao Cao, and Zhihai He. "A Review of Hashing Methods for Multimodal Retrieval". IEEE Access 8 (2020): 15377–91. http://dx.doi.org/10.1109/access.2020.2968154.
27

Zhang, Yu, Ye Yuan, Yishu Wang, and Guoren Wang. "A novel multimodal retrieval model based on ELM". Neurocomputing 277 (February 2018): 65–77. http://dx.doi.org/10.1016/j.neucom.2017.03.095.
28

Mourão, André, Flávio Martins, and João Magalhães. "Multimodal medical information retrieval with unsupervised rank fusion". Computerized Medical Imaging and Graphics 39 (January 2015): 35–45. http://dx.doi.org/10.1016/j.compmedimag.2014.05.006.
29

Revuelta-Martínez, Alejandro, Luis Rodríguez, Ismael García-Varea, and Francisco Montero. "Multimodal interaction for information retrieval using natural language". Computer Standards & Interfaces 35, no. 5 (September 2013): 428–41. http://dx.doi.org/10.1016/j.csi.2012.11.002.
30

Liu, Anan, Wenhui Li, Weizhi Nie, and Yuting Su. "3D models retrieval algorithm based on multimodal data". Neurocomputing 259 (October 2017): 176–82. http://dx.doi.org/10.1016/j.neucom.2016.06.087.
31

Daras, Petros, and Apostolos Axenopoulos. "A 3D Shape Retrieval Framework Supporting Multimodal Queries". International Journal of Computer Vision 89, no. 2-3 (July 30, 2009): 229–47. http://dx.doi.org/10.1007/s11263-009-0277-2.
32

Chen, Xu, Alfred O. Hero, III, and Silvio Savarese. "Multimodal Video Indexing and Retrieval Using Directed Information". IEEE Transactions on Multimedia 14, no. 1 (February 2012): 3–16. http://dx.doi.org/10.1109/tmm.2011.2167223.
33

Pang, Lei, Shiai Zhu, and Chong-Wah Ngo. "Deep Multimodal Learning for Affective Analysis and Retrieval". IEEE Transactions on Multimedia 17, no. 11 (November 2015): 2008–20. http://dx.doi.org/10.1109/tmm.2015.2482228.
34

Sperandio, Ricardo C., Zenilton K. G. Patrocínio, Hugo B. de Paula, and Silvio J. F. Guimarães. "An efficient access method for multimodal video retrieval". Multimedia Tools and Applications 74, no. 4 (April 11, 2014): 1357–75. http://dx.doi.org/10.1007/s11042-014-1917-2.
35

Hubert, Gilles, and Josiane Mothe. "An adaptable search engine for multimodal information retrieval". Journal of the American Society for Information Science and Technology 60, no. 8 (August 2009): 1625–34. http://dx.doi.org/10.1002/asi.21091.
36

陈, 佳芸. "Multimodal Fashion Style Retrieval Based on Deep Learning". Computer Science and Application 13, no. 03 (2023): 492–501. http://dx.doi.org/10.12677/csa.2023.133048.
37

S. Gomathy, K. P. Deepa, T. Revathi, and L. Maria Michael Visuwasam. "Genre Specific Classification for Information Search and Multimodal Semantic Indexing for Data Retrieval". SIJ Transactions on Computer Science Engineering & its Applications (CSEA) 01, no. 01 (April 5, 2013): 10–15. http://dx.doi.org/10.9756/sijcsea/v1i1/01010159.
38

Qian, Shengsheng, Dizhan Xue, Huaiwen Zhang, Quan Fang, and Changsheng Xu. "Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 3 (May 18, 2021): 2440–48. http://dx.doi.org/10.1609/aaai.v35i3.16345.

Abstract
Cross-modal retrieval has become an active study field with the expanding scale of multimodal data. To date, most existing methods transform multimodal data into a common representation space where semantic similarities between items can be directly measured across different modalities. However, these methods typically suffer from the following limitations: 1) They usually attempt to bridge the modality gap by designing losses in the common representation space, which may not be sufficient to eliminate the potential heterogeneity of different modalities in the common space. 2) They typically treat labels as independent individuals and ignore label relationships, which are important for constructing semantic links between multimodal data. In this work, we propose novel Dual Adversarial Graph Neural Networks (DAGNN), composed of dual generative adversarial networks and multi-hop graph neural networks, which learn modality-invariant and discriminative common representations for cross-modal retrieval. Firstly, we construct the dual generative adversarial networks to project multimodal data into a common representation space. Secondly, we leverage the multi-hop graph neural networks, in which a layer aggregation mechanism is proposed to exploit multi-hop propagation information, to capture the label correlation dependency and learn inter-dependent classifiers. Comprehensive experiments conducted on two cross-modal retrieval benchmark datasets, NUS-WIDE and MIRFlickr, indicate the superiority of DAGNN.
APA, Harvard, Vancouver, ISO, and other styles
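The common-representation-space retrieval described in the abstract above reduces, once items of both modalities have been embedded in a shared space, to nearest-neighbour search under cosine similarity. A minimal sketch of that retrieval step follows; the embeddings here are random placeholders standing in for learned projections, not the paper's trained representations:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical learned embeddings: 4 text queries and 5 images,
# already projected into a shared 8-dimensional space.
text_emb = rng.normal(size=(4, 8))
image_emb = rng.normal(size=(5, 8))

def l2_normalize(x):
    # Scale each row to unit length.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# After L2 normalization, cosine similarity is a plain dot product.
sims = l2_normalize(text_emb) @ l2_normalize(image_emb).T  # shape (4, 5)

# For each text query, rank images from most to least similar.
rankings = np.argsort(-sims, axis=1)
```

The same dot-product ranking works in either direction (text-to-image or image-to-text), which is what makes the common space convenient for cross-modal retrieval.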
39

Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao, and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings". Journal of Advanced Computational Intelligence and Intelligent Informatics 26, no. 6 (November 20, 2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.

Full text
Abstract
In this work, we study the application of multimodal analogical reasoning to image retrieval. Multimodal analogy questions are given as tuples of words and images, e.g., "cat":"dog"::[an image of a cat sitting on a bench]:?, to search for an image of a dog sitting on a bench. Retrieving the desired images given such tuples can be seen as the task of finding images whose relation to the query image is close to the relation between the query words. One way to achieve this is to build a common vector space that exhibits analogical regularities. To learn such an embedding, we propose a quadruple neural network called the multimodal siamese network. The network consists of recurrent neural networks and convolutional neural networks based on the siamese architecture. We also introduce an effective procedure for generating analogy examples from an image-caption dataset to train our network. In our experiments, we test our model on analogy-based image retrieval tasks. The results show that our method outperforms previous work in qualitative evaluation.
APA, Harvard, Vancouver, ISO, and other styles
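The analogy-based retrieval task in the abstract above can be illustrated with simple vector arithmetic in a shared embedding space, in the spirit of word-analogy offsets. All embeddings below are random placeholders, not the paper's trained multimodal siamese network:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16
# Hypothetical embeddings in a shared word/image space.
word_emb = {"cat": rng.normal(size=dim), "dog": rng.normal(size=dim)}
query_image = rng.normal(size=dim)      # e.g. a cat sitting on a bench
gallery = rng.normal(size=(100, dim))   # candidate images

# "cat":"dog"::query_image:?  ->  apply the word offset to the image.
target = query_image - word_emb["cat"] + word_emb["dog"]

def cos(a, b):
    # Cosine similarity between two vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Retrieve the gallery image nearest to the analogy target.
scores = np.array([cos(target, g) for g in gallery])
best = int(np.argmax(scores))
```

In a space with genuine analogical regularities, `best` would index an image of a dog sitting on a bench; here it merely demonstrates the offset-and-nearest-neighbour mechanics.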
40

Li, Ruxuan, Jingyi Wang, and Xuedong Tian. "A Multi-Modal Retrieval Model for Mathematical Expressions Based on ConvNeXt and Hesitant Fuzzy Set". Electronics 12, no. 20 (October 20, 2023): 4363. http://dx.doi.org/10.3390/electronics12204363.

Full text
Abstract
Mathematical expression retrieval is an essential component of mathematical information retrieval. Current research on mathematical expression retrieval primarily targets single modalities, particularly text, which can lead to the loss of structural information. Multimodal research, on the other hand, has demonstrated promising outcomes across different domains, and mathematical expressions in image format are adept at preserving their structural characteristics. We therefore propose a multimodal retrieval model for mathematical expressions based on ConvNeXt and HFS to address the limitations of single-modal retrieval. For the image modality, mathematical expression retrieval is based on the similarity of image features and symbol-level features of the expression, where image features of the expression image are extracted by ConvNeXt, while symbol-level features are obtained by the Symbol Level Features Extraction (SLFE) module. For the text modality, the Formula Description Structure (FDS) is employed to analyze expressions and extract their attributes. Additionally, Hesitant Fuzzy Set (HFS) theory facilitates the computation of hesitant fuzzy similarity between mathematical queries and candidate expressions. Finally, Reciprocal Rank Fusion (RRF) is employed to integrate the rankings from image-modal and text-modal retrieval, yielding the final retrieval list. Experiments were conducted on the publicly accessible ArXiv dataset (containing 592,345 mathematical expressions) and the NTCIR-mair-wikipedia-corpus (NTCIR) dataset. The MAP@10 value for the multimodal RRF fusion approach is 0.774. These results substantiate the efficacy of the multimodal mathematical expression retrieval approach based on ConvNeXt and HFS.
APA, Harvard, Vancouver, ISO, and other styles
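The Reciprocal Rank Fusion step mentioned in the abstract above is a standard, model-agnostic way to merge ranked lists from different retrieval channels. A minimal sketch, using illustrative item names and the commonly used constant k=60, could be:

```python
# Reciprocal Rank Fusion (RRF): each item's fused score is the sum of
# 1 / (k + rank) over every ranking in which it appears.
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Sort items by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

image_ranking = ["expr_A", "expr_B", "expr_C"]  # image-modal result
text_ranking = ["expr_A", "expr_D", "expr_B"]   # text-modal result
fused = rrf_fuse([image_ranking, text_ranking])
```

Because RRF only consumes rank positions, the image-modal and text-modal retrievers need not produce comparable similarity scores, which is what makes it a convenient late-fusion choice here.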
41

李, 劼博. "Video Speech Retrieval Model Based on Multimodal Feature Memory". Computer Science and Application 12, no. 07 (2022): 1747–55. http://dx.doi.org/10.12677/csa.2022.127176.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

LIU, Zhi, Fangyuan ZHAO, and Mengmeng ZHANG. "An Efficient Multimodal Aggregation Network for Video-Text Retrieval". IEICE Transactions on Information and Systems E105.D, no. 10 (October 1, 2022): 1825–28. http://dx.doi.org/10.1587/transinf.2022edl8018.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Bu, Shuhui, Shaoguang Cheng, Zhenbao Liu, and Junwei Han. "Multimodal Feature Fusion for 3D Shape Recognition and Retrieval". IEEE MultiMedia 21, no. 4 (October 2014): 38–46. http://dx.doi.org/10.1109/mmul.2014.52.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Tang, Jinhui, and Zechao Li. "Weakly Supervised Multimodal Hashing for Scalable Social Image Retrieval". IEEE Transactions on Circuits and Systems for Video Technology 28, no. 10 (October 2018): 2730–41. http://dx.doi.org/10.1109/tcsvt.2017.2715227.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Figueroa, Cristhian, Hugo Ordoñez, Juan-Carlos Corrales, Carlos Cobos, Leandro Krug Wives, and Enrique Herrera-Viedma. "Improving business process retrieval using categorization and multimodal search". Knowledge-Based Systems 110 (October 2016): 49–59. http://dx.doi.org/10.1016/j.knosys.2016.07.014.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Datta, Deepanwita, Shubham Varma, Ravindranath Chowdary C., and Sanjay K. Singh. "Multimodal Retrieval using Mutual Information based Textual Query Reformulation". Expert Systems with Applications 68 (February 2017): 81–92. http://dx.doi.org/10.1016/j.eswa.2016.09.039.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Escalante, Hugo Jair, Manuel Montes, and Enrique Sucar. "Multimodal indexing based on semantic cohesion for image retrieval". Information Retrieval 15, no. 1 (June 5, 2011): 1–32. http://dx.doi.org/10.1007/s10791-011-9170-z.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Markonis, Dimitrios, Roger Schaer, and Henning Müller. "Evaluating multimodal relevance feedback techniques for medical image retrieval". Information Retrieval Journal 19, no. 1-2 (August 1, 2015): 100–112. http://dx.doi.org/10.1007/s10791-015-9260-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Imhof, Melanie, and Martin Braschler. "A study of untrained models for multimodal information retrieval". Information Retrieval Journal 21, no. 1 (November 3, 2017): 81–106. http://dx.doi.org/10.1007/s10791-017-9322-x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Soni, Ankita, and Richa Chouhan. "Multimodal Information Retrieval by using Visual and Textual Query". International Journal of Computer Applications 137, no. 1 (March 17, 2016): 6–10. http://dx.doi.org/10.5120/ijca2016908637.

Full text
APA, Harvard, Vancouver, ISO, and other styles
