Journal articles on the topic "Scalability in Cross-Modal Retrieval"

Listed below are the top 50 journal articles for research on the topic "Scalability in Cross-Modal Retrieval".

1

Hu, Peng, Hongyuan Zhu, Xi Peng, and Jie Lin. "Semi-Supervised Multi-Modal Learning with Balanced Spectral Decomposition". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 99–106. http://dx.doi.org/10.1609/aaai.v34i01.5339.

Full text
Abstract
Cross-modal retrieval aims to retrieve the relevant samples across different modalities, of which the key problem is how to model the correlations among different modalities while narrowing the large heterogeneous gap. In this paper, we propose a Semi-supervised Multimodal Learning Network method (SMLN) which correlates different modalities by capturing the intrinsic structure and discriminative correlation of the multimedia data. To be specific, the labeled and unlabeled data are used to construct a similarity matrix which integrates the cross-modal correlation, discrimination, and intra-modal graph information existing in the multimedia data. What is more important is that we propose a novel optimization approach to optimize our loss within a neural network which involves a spectral decomposition problem derived from a ratio trace criterion. Our optimization enjoys two advantages. On the one hand, the proposed approach is not limited to our loss and could be applied to any neural network trained with the ratio trace criterion. On the other hand, the proposed optimization differs from existing ones which alternately maximize the minor eigenvalues, thus overemphasizing them and ignoring the dominant ones. In contrast, our method exactly balances all eigenvalues, thus being more competitive than existing methods. Thanks to our loss and optimization strategy, our method can well preserve the discriminative and intrinsic information in the common space and embrace scalability in handling large-scale multimedia data. To verify the effectiveness of the proposed method, extensive experiments are carried out on three widely used multimodal datasets in comparison with 13 state-of-the-art approaches.
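For orientation, the ratio trace criterion referenced above is commonly written as follows; this is a generic textbook form, and the specific scatter matrices are an assumption here rather than SMLN's exact formulation:

    % Generic ratio trace criterion: S_w and S_b denote within- and between-class
    % scatter in the learned common space, and W is the projection being learned.
    \[
      \max_{W}\;\operatorname{tr}\!\left[(W^{\top} S_w W)^{-1}\,(W^{\top} S_b W)\right]
    \]
    % Stationary points satisfy the generalized eigenproblem S_b w = \lambda S_w w,
    % i.e., the spectral decomposition whose eigenvalues the proposed optimization
    % balances rather than overemphasizing the minor ones.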
2

Rasheed, Ali Salim, Davood Zabihzadeh, and Sumia Abdulhussien Razooqi Al-Obaidi. "Large-Scale Multi-modal Distance Metric Learning with Application to Content-Based Information Retrieval and Image Classification". International Journal of Pattern Recognition and Artificial Intelligence 34, no. 13 (May 26, 2020): 2050034. http://dx.doi.org/10.1142/s0218001420500342.

Full text
Abstract
Metric learning algorithms aim to bring conceptually related data items closer together and keep dissimilar ones at a distance. The most common approach for metric learning is based on the Mahalanobis method. Despite its success, this method is limited to finding a linear projection and also suffers from scalability issues with respect to both the dimensionality and the size of the input data. To address these problems, this paper presents a new scalable metric learning algorithm for multi-modal data. Our method learns an optimal metric for any feature set of the multi-modal data in an online fashion. We also combine the learned metrics with a novel Passive/Aggressive (PA)-based algorithm which results in a higher convergence rate compared to the state-of-the-art methods. To address scalability with respect to dimensionality, Dual Random Projection (DRP) is adopted in this paper. The present method is evaluated on several challenging machine vision datasets for image classification and Content-Based Information Retrieval (CBIR) tasks. The experimental results confirm that the proposed method significantly surpasses other state-of-the-art metric learning methods on most of these datasets in terms of both accuracy and efficiency.
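To make the Mahalanobis setup concrete, here is a minimal sketch of the distance computation; it is an illustration only and does not reproduce the paper's online Passive/Aggressive updates or Dual Random Projection:

    import numpy as np

    def mahalanobis_sq(x, y, M):
        """Squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y).
        M must be symmetric positive semi-definite; metric learning fits M from data."""
        diff = x - y
        return float(diff @ M @ diff)

    # Toy usage: the identity matrix recovers the squared Euclidean distance.
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 0.0, 3.0])
    print(mahalanobis_sq(x, y, np.eye(3)))  # 5.0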
3

Zalkow, Frank, and Meinard Müller. "Learning Low-Dimensional Embeddings of Audio Shingles for Cross-Version Retrieval of Classical Music". Applied Sciences 10, no. 1 (December 18, 2019): 19. http://dx.doi.org/10.3390/app10010019.

Full text
Abstract
Cross-version music retrieval aims at identifying all versions of a given piece of music using a short query audio fragment. One previous approach, which is particularly suited for Western classical music, is based on a nearest neighbor search using short sequences of chroma features, also referred to as audio shingles. From the viewpoint of efficiency, indexing and dimensionality reduction are important aspects. In this paper, we extend previous work by adapting two embedding techniques: one is based on classical principal component analysis, and the other is based on neural networks with triplet loss. Furthermore, we report on systematically conducted experiments with Western classical music recordings and discuss the trade-off between retrieval quality and embedding dimensionality. As one main result, we show that, using neural networks, one can reduce the audio shingles from 240 to fewer than 8 dimensions with only a moderate loss in retrieval accuracy. In addition, we present extended experiments with databases of different sizes and different query lengths to test the scalability and generalizability of the dimensionality reduction methods. We also provide a more detailed view into the retrieval problem by analyzing the distances that appear in the nearest neighbor search.
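As an illustration of the PCA-based variant of this dimensionality reduction, here is a short sketch with random vectors standing in for real chroma-based shingles (the triplet-loss network is not shown):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    shingles = rng.normal(size=(1000, 240))      # placeholder 240-dim audio shingles

    pca = PCA(n_components=8).fit(shingles)      # reduce to 8 dimensions
    embedded = pca.transform(shingles)

    # Nearest neighbor search in the reduced space, as used for retrieval.
    index = NearestNeighbors(n_neighbors=5).fit(embedded)
    distances, neighbors = index.kneighbors(pca.transform(shingles[:1]))
    print(neighbors)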
4

Huang, Xiaobing, Tian Zhao, and Yu Cao. "PIR". International Journal of Multimedia Data Engineering and Management 5, no. 3 (July 2014): 1–27. http://dx.doi.org/10.4018/ijmdem.2014070101.

Full text
Abstract
Multimedia Information Retrieval (MIR) is a problem domain that includes programming tasks such as salient feature extraction, machine learning, indexing, and retrieval. There are a variety of implementations and algorithms for these tasks in different languages and frameworks, which are difficult to compose and reuse due to the interface and language incompatibility. Due to this low reusability, researchers often have to implement their experiments from scratch and the resulting programs cannot be easily adapted to parallel and distributed executions, which is important for handling large data sets. In this paper, we present Pipeline Information Retrieval (PIR), a Domain Specific Language (DSL) for multi-modal feature manipulation. The goal of PIR is to unify the MIR programming tasks by hiding the programming details under a flexible layer of domain specific interface. PIR optimizes the MIR tasks by compiling the DSL programs into pipeline graphs, which can be executed using a variety of strategies (e.g. sequential, parallel, or distributed execution). The authors evaluated the performance of PIR applications on single machine with multiple cores, local cluster, and Amazon Elastic Compute Cloud (EC2) platform. The result shows that the PIR programs can greatly help MIR researchers and developers perform fast prototyping on single machine environment and achieve nice scalability on distributed platforms.
5

Zhang, Zhen, Xu Wu, and Shuang Wei. "Cross-Domain Access Control Model in Industrial IoT Environment". Applied Sciences 13, no. 8 (April 17, 2023): 5042. http://dx.doi.org/10.3390/app13085042.

Full text
Abstract
The Industrial Internet of Things (IIoT) accelerates smart manufacturing and boosts production efficiency through heterogeneous industrial equipment, intelligent sensors, and actuators. The Industrial Internet of Things is transforming from a traditional factory model to a new manufacturing mode, which allows cross-domain data-sharing among multiple system departments to enable smart manufacturing. A complete industrial product comes from the combined efforts of many different departments. Therefore, secure and reliable cross-domain access control has become the key to ensuring the security of cross-domain communication and resource-sharing. Traditional centralized access control schemes are prone to single-point failure problems. Recently, many researchers have integrated blockchain technology into access control models. However, most blockchain-based approaches use a single-chain structure, which, while ensuring system security, has weak data management capability, limited scalability, and low access control efficiency, making it difficult to meet the needs of multi-domain cooperation in IIoT scenarios. Therefore, this paper proposes a decentralized cross-domain access model based on a master–slave chain with high scalability. Moreover, the model ensures the security and reliability of the master chain through a reputation-based node selection mechanism. Access control efficiency is improved by a grouping strategy retrieval method in the access control process. The experimental benchmarks of the proposed scheme use various performance metrics to highlight its applicability in the IIoT environment. The results show an 82% improvement in the throughput for the master–slave chain structure over the single-chain structure. There is also an improvement in the throughput and latency compared to the results of other studies.
6

An, Duo, Alan Chiu, James A. Flanders, Wei Song, Dahua Shou, Yen-Chun Lu, Lars G. Grunnet et al. "Designing a retrievable and scalable cell encapsulation device for potential treatment of type 1 diabetes". Proceedings of the National Academy of Sciences 115, no. 2 (December 26, 2017): E263–E272. http://dx.doi.org/10.1073/pnas.1708806115.

Full text
Abstract
Cell encapsulation has been shown to hold promise for effective, long-term treatment of type 1 diabetes (T1D). However, challenges remain for its clinical applications. For example, there is an unmet need for an encapsulation system that is capable of delivering sufficient cell mass while still allowing convenient retrieval or replacement. Here, we report a simple cell encapsulation design that is readily scalable and conveniently retrievable. The key to this design was to engineer a highly wettable, Ca2+-releasing nanoporous polymer thread that promoted uniform in situ cross-linking and strong adhesion of a thin layer of alginate hydrogel around the thread. The device provided immunoprotection of rat islets in immunocompetent C57BL/6 mice in a short-term (1-mo) study, similar to neat alginate fibers. However, the mechanical property of the device, critical for handling and retrieval, was much more robust than the neat alginate fibers due to the reinforcement of the central thread. It also had facile mass transfer due to the short diffusion distance. We demonstrated the therapeutic potential of the device through the correction of chemically induced diabetes in C57BL/6 mice using rat islets for 3 mo as well as in immunodeficient SCID-Beige mice using human islets for 4 mo. We further showed, as a proof of concept, the scalability and retrievability in dogs. After 1 mo of implantation in dogs, the device could be rapidly retrieved through a minimally invasive laparoscopic procedure. This encapsulation device may contribute to a cellular therapy for T1D because of its retrievability and scale-up potential.
7

Tamchyna, Aleš, Ondřej Dušek, Rudolf Rosa, and Pavel Pecina. "MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service". Prague Bulletin of Mathematical Linguistics 100, no. 1 (October 1, 2013): 31–40. http://dx.doi.org/10.2478/pralin-2013-0009.

Full text
Abstract
We present a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre- and post-processing. It is currently used to provide MT between several languages for cross-lingual information retrieval in the EU FP7 Khresmoi project. The software consists of an application server and remote workers which handle text processing and communicate translation requests to MT systems. The communication between the application server and the workers is based on the XML-RPC protocol. We present the overall design of the software and test results which document speed and scalability of our solution. Our software is licensed under the Apache 2.0 licence and is available for download from the Lindat-Clarin repository and Github.
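Since the abstract only states that the application server and workers communicate over XML-RPC, the following sketch is purely illustrative: the endpoint URL and the method name process_task are hypothetical placeholders, not MTMonkey's documented interface:

    import xmlrpc.client

    # Connect to a (hypothetical) translation worker exposing an XML-RPC endpoint.
    worker = xmlrpc.client.ServerProxy("http://localhost:8080/")
    request = {"sourceLang": "en", "targetLang": "cs", "text": "Hello world."}
    response = worker.process_task(request)   # hypothetical method name
    print(response)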
8

Zhang, Chengyuan, Jiayu Song, Xiaofeng Zhu, Lei Zhu, and Shichao Zhang. "HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval". ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 1s (April 20, 2021): 1–22. http://dx.doi.org/10.1145/3412847.

Full text
Abstract
The purpose of cross-modal retrieval is to find the relationship between different modal samples and to retrieve other modal samples with similar semantics by using a certain modal sample. As the data of different modalities presents heterogeneous low-level features and semantically related high-level features, the main problem of cross-modal retrieval is how to measure the similarity between different modalities. In this article, we present a novel cross-modal retrieval method, named Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and intra-modal pairs with the same classification label. Specifically, coupled deep fully connected networks are used to map cross-modal feature representations into a common subspace. A weight-sharing strategy is utilized between the two branches of the networks to diminish cross-modal heterogeneity. Furthermore, two Siamese CNN models are employed to learn intra-modal similarity from samples of the same modality. Comprehensive experiments on real datasets clearly demonstrate that our proposed technique achieves substantial improvements over the state-of-the-art cross-modal retrieval techniques.
9

Wu, Yiling, Shuhui Wang, and Qingming Huang. "Multi-modal semantic autoencoder for cross-modal retrieval". Neurocomputing 331 (February 2019): 165–75. http://dx.doi.org/10.1016/j.neucom.2018.11.042.

Full text
10

Devezas, José. "Graph-based entity-oriented search". ACM SIGIR Forum 55, no. 1 (June 2021): 1–2. http://dx.doi.org/10.1145/3476415.3476430.

Full text
Abstract
Entity-oriented search has revolutionized search engines. In the era of Google Knowledge Graph and Microsoft Satori, users demand an effortless process of search. Whether they express an information need through a keyword query, expecting documents and entities, or through a clicked entity, expecting related entities, there is an inherent need for the combination of corpora and knowledge bases to obtain an answer. Such integration frequently relies on independent signals extracted from inverted indexes, and from quad indexes indirectly accessed through queries to a triplestore. However, relying on two separate representation models inhibits the effective cross-referencing of information, discarding otherwise available relations that could lead to a better ranking. Moreover, different retrieval tasks often demand separate implementations, although the problem is, at its core, the same. With the goal of harnessing all available information to optimize retrieval, we explore joint representation models of documents and entities, while taking a step towards the definition of a more general retrieval approach. Specifically, we propose that graphs should be used to incorporate explicit and implicit information derived from the relations between text found in corpora and entities found in knowledge bases. We also take advantage of this framework to elaborate a general model for entity-oriented search, proposing a universal ranking function for the tasks of ad hoc document retrieval (leveraging entities), ad hoc entity retrieval, and entity list completion. At a conceptual stage, we begin by proposing the graph-of-entity, based on the relations between combinations of term and entity nodes. We introduce the entity weight as the corresponding ranking function, relying on the idea of seed nodes for representing the query, either directly through term nodes, or based on the expansion to adjacent entity nodes. The score is computed based on a series of geodesic distances to the remaining nodes, providing a ranking for the documents (or entities) in the graph. In order to improve on the low scalability of the graph-of-entity, we then redesigned this model in a way that reduced the number of edges in relation to the number of nodes, by relying on the hypergraph data structure. The resulting model, which we called hypergraph-of-entity, is the main contribution of this thesis. The obtained reduction was achieved by replacing binary edges with n -ary relations based on sets of nodes and entities (undirected document hyperedges), sets of entities (undirected hyperedges, either based on cooccurrence or a grouping by semantic subject), and pairs of a set of terms and a set of one entity (directed hyperedges, mapping text to an object). We introduce the random walk score as the corresponding ranking function, relying on the same idea of seed nodes, similar to the entity weight in the graph-of-entity. Scoring based on this function is highly reliant on the structure of the hypergraph, which we call representation-driven retrieval. As such, we explore several extensions of the hypergraph-of-entity, including relations of synonymy, or contextual similarity, as well as different weighting functions per node and hyperedge type. We also propose TF-bins as a discretization for representing term frequency in the hypergraph-of-entity. 
For the random walk score, we propose and explore several parameters, including length and repeats, with or without seed node expansion, direction, or weights, and with or without a certain degree of node and/or hyperedge fatigue, a concept that we also propose. For evaluation, we took advantage of TREC 2017 OpenSearch track, which relied on an online evaluation process based on the Living Labs API, and we also participated in TREC 2018 Common Core track, which was based on the newly introduced TREC Washington Post Corpus. Our main experiments were supported on the INEX 2009 Wikipedia collection, which proved to be a fundamental test collection for assessing retrieval effectiveness across multiple tasks. At first, our experiments solely focused on ad hoc document retrieval, ensuring that the model performed adequately for a classical task. We then expanded the work to cover all three entity-oriented search tasks. Results supported the viability of a general retrieval model, opening novel challenges in information retrieval, and proposing a new path towards generality in this area.
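To ground the random walk score idea, here is a small sketch of seed-based random walks over a plain adjacency-list graph; it is a toy stand-in, since the thesis operates on a hypergraph with typed nodes and n-ary hyperedges:

    import random
    from collections import Counter

    def random_walk_score(adjacency, seeds, walk_length=3, n_walks=1000, rng_seed=0):
        """Approximate node relevance by counting visits of short random walks
        started from the seed nodes representing the query."""
        rng = random.Random(rng_seed)
        visits = Counter()
        for _ in range(n_walks):
            node = rng.choice(seeds)
            for _ in range(walk_length):
                neighbors = adjacency.get(node, [])
                if not neighbors:
                    break
                node = rng.choice(neighbors)
                visits[node] += 1
        total = sum(visits.values()) or 1
        return {n: c / total for n, c in visits.items()}

    # Toy graph mixing query term, document, and entity nodes.
    graph = {
        "q:retrieval": ["doc1", "doc2"],
        "doc1": ["q:retrieval", "entity:IR"],
        "doc2": ["q:retrieval", "entity:IR", "entity:NLP"],
        "entity:IR": ["doc1", "doc2"],
        "entity:NLP": ["doc2"],
    }
    print(random_walk_score(graph, seeds=["q:retrieval"]))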
11

Yang, Xianben, and Wei Zhang. "Graph Convolutional Networks for Cross-Modal Information Retrieval". Wireless Communications and Mobile Computing 2022 (January 6, 2022): 1–8. http://dx.doi.org/10.1155/2022/6133142.

Full text
Abstract
In recent years, due to the wide application of deep learning and the growth of multimodal research, image retrieval has gradually extended from traditional text-based retrieval to visual retrieval that combines images and text, becoming an important cross-disciplinary research hotspot spanning computer vision and natural language understanding. This paper focuses on graph convolutional networks for cross-modal information retrieval, building on a general review of cross-modal information retrieval and the related theory of convolutional networks from the literature. The designed model combines high-level semantics with low-level visual features in cross-modal information retrieval to improve retrieval accuracy, and experiments are used to verify the designed network model; the results show that the model designed in this paper is more accurate than the traditional retrieval model, reaching up to 90%.
12

Zhong, Fangming, Guangze Wang, Zhikui Chen, Feng Xia, and Geyong Min. "Cross-Modal Retrieval for CPSS Data". IEEE Access 8 (2020): 16689–701. http://dx.doi.org/10.1109/access.2020.2967594.

Full text
13

Dutta, Titir, and Soma Biswas. "Generalized Zero-Shot Cross-Modal Retrieval". IEEE Transactions on Image Processing 28, no. 12 (December 2019): 5953–62. http://dx.doi.org/10.1109/tip.2019.2923287.

Full text
14

Feng, Fangxiang, Xiaojie Wang, Ruifan Li, and Ibrar Ahmad. "Correspondence Autoencoders for Cross-Modal Retrieval". ACM Transactions on Multimedia Computing, Communications, and Applications 12, no. 1s (October 21, 2015): 1–22. http://dx.doi.org/10.1145/2808205.

Full text
15

Liu, Zhuokun, Huaping Liu, Wenmei Huang, Bowen Wang, and Fuchun Sun. "Audiovisual cross-modal material surface retrieval". Neural Computing and Applications 32, no. 18 (September 27, 2019): 14301–9. http://dx.doi.org/10.1007/s00521-019-04476-3.

Full text
16

Yu, Zheng, and Wenmin Wang. "Learning DALTS for cross‐modal retrieval". CAAI Transactions on Intelligence Technology 4, no. 1 (February 18, 2019): 9–16. http://dx.doi.org/10.1049/trit.2018.1051.

Full text
17

Liu, Huan, Jiang Xiong, Nian Zhang, Fuming Liu, and Xitao Zou. "Quadruplet-Based Deep Cross-Modal Hashing". Computational Intelligence and Neuroscience 2021 (July 2, 2021): 1–10. http://dx.doi.org/10.1155/2021/9968716.

Full text
Abstract
Recently, benefitting from the storage and retrieval efficiency of hashing and the powerful discriminative feature extraction capability of deep neural networks, deep cross-modal hashing retrieval has drawn more and more attention. To preserve the semantic similarities of cross-modal instances during the hash mapping procedure, most existing deep cross-modal hashing methods usually learn deep hashing networks with a pairwise loss or a triplet loss. However, these methods may not fully explore the similarity relation across modalities. To solve this problem, in this paper, we introduce a quadruplet loss into deep cross-modal hashing and propose a quadruplet-based deep cross-modal hashing (termed QDCMH) method. Extensive experiments on two benchmark cross-modal retrieval datasets show that our proposed method achieves state-of-the-art performance and demonstrate the efficiency of the quadruplet loss in cross-modal hashing.
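For orientation, a commonly used quadruplet loss takes the following form; this is a generic formulation, and QDCMH's exact loss and margins are defined in the paper rather than here:

    % Quadruplet loss over an anchor a, a positive p, and two negatives n1, n2,
    % with margins \alpha_1 > \alpha_2 and a distance d (Hamming or real-valued):
    \[
      \mathcal{L} = \left[\, d(a,p) - d(a,n_1) + \alpha_1 \right]_{+}
                  + \left[\, d(a,p) - d(n_1,n_2) + \alpha_2 \right]_{+}
    \]
    % Here [x]_+ = max(x, 0); the second term, absent from a triplet loss, also
    % pushes apart negative pairs that do not share the anchor.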
18

Zou, Fuhao, Xingqiang Bai, Chaoyang Luan, Kai Li, Yunfei Wang, and Hefei Ling. "Semi-supervised cross-modal learning for cross modal retrieval and image annotation". World Wide Web 22, no. 2 (July 13, 2018): 825–41. http://dx.doi.org/10.1007/s11280-018-0581-2.

Full text
19

Yang, Xiaohan, Zhen Wang, Nannan Wu, Guokun Li, Chuang Feng, and Pingping Liu. "Unsupervised Deep Relative Neighbor Relationship Preserving Cross-Modal Hashing". Mathematics 10, no. 15 (July 28, 2022): 2644. http://dx.doi.org/10.3390/math10152644.

Full text
Abstract
The image-text cross-modal retrieval task, which aims to retrieve the relevant image from text and vice versa, is now attracting widespread attention. To quickly respond to the large-scale task, we propose an Unsupervised Deep Relative Neighbor Relationship Preserving Cross-Modal Hashing (DRNPH) to achieve cross-modal retrieval in the common Hamming space, which offers advantages in storage and efficiency. To fulfill the nearest neighbor search in the Hamming space, we need to reconstruct both the original intra- and inter-modal neighbor matrix according to the binary feature vectors. Thus, we can compute the neighbor relationship among different modal samples directly based on the Hamming distances. Furthermore, the cross-modal pair-wise similarity preserving constraint requires that similar sample pairs have an identical Hamming distance to the anchor. Therefore, the similar sample pairs own the same binary code, and they have minimal Hamming distances. Unfortunately, the pair-wise similarity preserving constraint may lead to an imbalanced code problem. Therefore, we propose the cross-modal triplet relative similarity preserving constraint, which demands that the Hamming distances of similar pairs be less than those of dissimilar pairs to distinguish the samples’ ranking orders in the retrieval results. Moreover, a large similarity margin can boost the algorithm’s noise robustness. We conduct cross-modal retrieval comparative experiments and an ablation study on two public datasets, MIRFlickr and NUS-WIDE, respectively. The experimental results show that DRNPH outperforms the state-of-the-art approaches in various image-text retrieval scenarios, and all three proposed constraints are necessary and effective for boosting cross-modal retrieval performance.
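Because the ranking above relies on Hamming distances between binary codes, here is a minimal sketch of that computation (illustrative only; DRNPH's neighbor-matrix reconstruction and constraints are not shown):

    import numpy as np

    def hamming_distances(codes_a, codes_b):
        """Pairwise Hamming distances between two sets of {0,1} binary codes.
        codes_a: (n, k), codes_b: (m, k) -> (n, m) distance matrix."""
        codes_a = np.asarray(codes_a, dtype=np.int8)
        codes_b = np.asarray(codes_b, dtype=np.int8)
        return (codes_a[:, None, :] != codes_b[None, :, :]).sum(axis=2)

    image_codes = np.array([[0, 1, 1, 0], [1, 1, 0, 0]])
    text_codes = np.array([[0, 1, 1, 0], [1, 0, 0, 1]])
    print(hamming_distances(image_codes, text_codes))
    # [[0 4]
    #  [2 2]]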
20

Huang, Xin, Yuxin Peng, and Mingkuan Yuan. "MHTN: Modal-Adversarial Hybrid Transfer Network for Cross-Modal Retrieval". IEEE Transactions on Cybernetics 50, no. 3 (March 2020): 1047–59. http://dx.doi.org/10.1109/tcyb.2018.2879846.

Full text
21

Wang, Suping, Ligu Zhu, Lei Shi, Hao Mo, and Songfu Tan. "A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective". Applied Sciences 13, no. 7 (April 4, 2023): 4571. http://dx.doi.org/10.3390/app13074571.

Full text
Abstract
Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and advance the field. Although previous reviews have primarily focused on binary and real-value coding methods, there is a scarcity of techniques grounded in deep representation learning. In this paper, we concentrated on harmonizing cross-modal representation learning and the full-cycle modeling of high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorized and summarized the challenges and open issues in implementing current technologies and investigated the pipeline of cross-modal retrieval, including pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and a unified architecture. Furthermore, we propose benchmark datasets and evaluation metrics to assist researchers in keeping pace with cross-modal retrieval advancements. By incorporating recent innovative works, we offer a perspective on potential advancements in cross-modal retrieval.
22

Zhong, Fangming, Zhikui Chen, and Geyong Min. "Deep Discrete Cross-Modal Hashing for Cross-Media Retrieval". Pattern Recognition 83 (November 2018): 64–77. http://dx.doi.org/10.1016/j.patcog.2018.05.018.

Full text
23

刘, 志虎. "Label Consistency Hashing for Cross-Modal Retrieval". Computer Science and Application 11, no. 04 (2021): 1104–12. http://dx.doi.org/10.12677/csa.2021.114114.

Full text
24

Gou, Tingting, Libo Liu, Qian Liu, and Zhen Deng. "A New Approach to Cross-Modal Retrieval". Journal of Physics: Conference Series 1288 (August 2019): 012044. http://dx.doi.org/10.1088/1742-6596/1288/1/012044.

Full text
25

Hu, Peng, Dezhong Peng, Xu Wang, and Yong Xiang. "Multimodal adversarial network for cross-modal retrieval". Knowledge-Based Systems 180 (September 2019): 38–50. http://dx.doi.org/10.1016/j.knosys.2019.05.017.

Full text
26

Cao, Wenming, Qiubin Lin, Zhihai He, and Zhiquan He. "Hybrid representation learning for cross-modal retrieval". Neurocomputing 345 (June 2019): 45–57. http://dx.doi.org/10.1016/j.neucom.2018.10.082.

Full text
27

Yao, Tao, Xiangwei Kong, Haiyan Fu, and Qi Tian. "Semantic consistency hashing for cross-modal retrieval". Neurocomputing 193 (June 2016): 250–59. http://dx.doi.org/10.1016/j.neucom.2016.02.016.

Full text
28

Xie, Liang, Peng Pan, and Yansheng Lu. "Analyzing semantic correlation for cross-modal retrieval". Multimedia Systems 21, no. 6 (June 25, 2014): 525–39. http://dx.doi.org/10.1007/s00530-014-0397-6.

Full text
29

Song, Ge, Dong Wang, and Xiaoyang Tan. "Deep Memory Network for Cross-Modal Retrieval". IEEE Transactions on Multimedia 21, no. 5 (May 2019): 1261–75. http://dx.doi.org/10.1109/tmm.2018.2877122.

Full text
30

Yu, Mengyang, Li Liu, and Ling Shao. "Binary Set Embedding for Cross-Modal Retrieval". IEEE Transactions on Neural Networks and Learning Systems 28, no. 12 (December 2017): 2899–910. http://dx.doi.org/10.1109/tnnls.2016.2609463.

Full text
31

Li, Zhuoyi, Huibin Lu, Hao Fu, Zhongrui Wang, and Guanghua Gu. "Adaptive Adversarial Learning based cross-modal retrieval". Engineering Applications of Artificial Intelligence 123 (August 2023): 106439. http://dx.doi.org/10.1016/j.engappai.2023.106439.

Full text
32

Li, Guokun, Zhen Wang, Shibo Xu, Chuang Feng, Xiaohan Yang, Nannan Wu, and Fuzhen Sun. "Deep Adversarial Learning Triplet Similarity Preserving Cross-Modal Retrieval Algorithm". Mathematics 10, no. 15 (July 25, 2022): 2585. http://dx.doi.org/10.3390/math10152585.

Full text
Abstract
The cross-modal retrieval task can return different modal nearest neighbors, such as image or text. However, inconsistent distribution and diverse representation make it hard to directly measure the similarity relationship between different modal samples, which causes a heterogeneity gap. To bridge the above-mentioned gap, we propose the deep adversarial learning triplet similarity preserving cross-modal retrieval algorithm to map different modal samples into the common space, allowing their feature representation to preserve both the original inter- and intra-modal semantic similarity relationship. During the training process, we employ GANs, which has advantages in modeling data distribution and learning discriminative representation, in order to learn different modal features. As a result, it can align different modal feature distributions. Generally, many cross-modal retrieval algorithms only preserve the inter-modal similarity relationship, which makes the nearest neighbor retrieval results vulnerable to noise. In contrast, we establish the triplet similarity preserving function to simultaneously preserve the inter- and intra-modal similarity relationship in the common space and in each modal space, respectively. Thus, the proposed algorithm has a strong robustness to noise. In each modal space, to ensure that the generated features have the same semantic information as the sample labels, we establish a linear classifier and require that the generated features’ classification results be consistent with the sample labels. We conducted cross-modal retrieval comparative experiments on two widely used benchmark datasets—Pascal Sentence and Wikipedia. For the image to text task, our proposed method improved the mAP values by 1% and 0.7% on the Pascal sentence and Wikipedia datasets, respectively. Correspondingly, the proposed method separately improved the mAP values of the text to image performance by 0.6% and 0.8% on the Pascal sentence and Wikipedia datasets, respectively. The experimental results show that the proposed algorithm is better than the other state-of-the-art methods.
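For reference, the triplet similarity-preserving idea is usually expressed as a margin constraint of the following generic form; the algorithm's exact inter- and intra-modal terms are defined in the paper, not here:

    % For an anchor a, a semantically similar sample p, and a dissimilar sample n
    % (drawn within one modality or across modalities), the common-space distance d
    % should satisfy
    \[
      d(a, p) + m \le d(a, n), \qquad m > 0,
    \]
    % which is usually relaxed into the hinge loss
    \[
      \mathcal{L}_{\mathrm{tri}} = \left[\, d(a,p) - d(a,n) + m \right]_{+}.
    \]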
33

Bhatt, Nikita, and Amit Ganatra. "Improvement of deep cross-modal retrieval by generating real-valued representation". PeerJ Computer Science 7 (April 27, 2021): e491. http://dx.doi.org/10.7717/peerj-cs.491.

Full text
Abstract
Cross-modal retrieval (CMR) has attracted much attention in the research community due to its flexible and comprehensive retrieval. The core challenge in CMR is the heterogeneity gap, which arises from the different statistical properties of multi-modal data. The most common solution to bridge the heterogeneity gap is representation learning, which generates a common sub-space. In this work, we propose a framework called “Improvement of Deep Cross-Modal Retrieval (IDCMR)”, which generates real-valued representations. IDCMR preserves both intra-modal and inter-modal similarity. The intra-modal similarity is preserved by selecting an appropriate training model for the text and image modalities. The inter-modal similarity is preserved by reducing a modality-invariance loss. The mean average precision (mAP) is used as the performance measure in the CMR system. Extensive experiments are performed, and the results show that IDCMR outperforms state-of-the-art methods by margins of 4% and 2% in mAP on the text-to-image and image-to-text retrieval tasks on the MSCOCO and Xmedia datasets, respectively.
34

Liu, Li, Xiao Dong, and Tianshi Wang. "Semi-Supervised Cross-Modal Retrieval Based on Discriminative Comapping". Complexity 2020 (July 18, 2020): 1–13. http://dx.doi.org/10.1155/2020/1462429.

Full text
Abstract
Most cross-modal retrieval methods based on subspace learning just focus on learning the projection matrices that map different modalities to a common subspace and pay less attention to the retrieval task specificity and class information. To address the two limitations and make full use of unlabelled data, we propose a novel semi-supervised method for cross-modal retrieval named modal-related retrieval based on discriminative comapping (MRRDC). The projection matrices are obtained to map multimodal data into a common subspace for different tasks. In the process of projection matrix learning, a linear discriminant constraint is introduced to preserve the original class information in different modal spaces. An iterative optimization algorithm based on label propagation is presented to solve the proposed joint learning formulations. The experimental results on several datasets demonstrate the superiority of our method compared with state-of-the-art subspace methods.
35

Zhang, Guihao, and Jiangzhong Cao. "Feature Fusion Based on Transformer for Cross-modal Retrieval". Journal of Physics: Conference Series 2558, no. 1 (August 1, 2023): 012012. http://dx.doi.org/10.1088/1742-6596/2558/1/012012.

Full text
Abstract
With the popularity of the Internet and the rapid growth of multimodal data, multimodal retrieval has gradually become a hot area of research. As one of the important branches of multimodal retrieval, image-text retrieval aims to design a model to learn and align two modal data, image and text, in order to build a bridge of semantic association between the two heterogeneous data, so as to achieve unified alignment and retrieval. The current mainstream image-text cross-modal retrieval approaches have made good progress by designing a deep learning-based model to find potential associations between different modal data. In this paper, we design a transformer-based feature fusion network to fuse the information of two modalities in the feature extraction process, which can enrich the semantic connection between the modalities. Meanwhile, we conduct experiments on the benchmark dataset Flickr30k and get competitive results, where recall at 10 achieves 96.2% accuracy in image-to-text retrieval.
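As a reference for the recall-at-10 figure reported above, here is a small generic sketch of recall@k evaluation (not the paper's implementation):

    import numpy as np

    def recall_at_k(similarity, ground_truth, k=10):
        """similarity: (n_queries, n_candidates) score matrix; ground_truth[i] is
        the index of the correct candidate for query i. Returns the fraction of
        queries whose correct item appears in the top-k results."""
        top_k = np.argsort(-similarity, axis=1)[:, :k]
        hits = [gt in row for gt, row in zip(ground_truth, top_k)]
        return float(np.mean(hits))

    sim = np.array([[0.9, 0.1, 0.3], [0.2, 0.8, 0.5]])
    print(recall_at_k(sim, ground_truth=[0, 2], k=2))  # 1.0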
36

Xie, Yicai, Xianhua Zeng, Tinghua Wang, and Yun Yi. "Online deep hashing for both uni-modal and cross-modal retrieval". Information Sciences 608 (August 2022): 1480–502. http://dx.doi.org/10.1016/j.ins.2022.07.039.

Full text
37

Yang, Fan, Zheng Wang, Jing Xiao, and Shin'ichi Satoh. "Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12589–96. http://dx.doi.org/10.1609/aaai.v34i07.6949.

Full text
Abstract
Most recent approaches for zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance by using a pre-trained model. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted during the training of a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mine more reliable relationships between images by traversing heterogeneous manifolds in the feature space of each modality. Our proposed method benefits from intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieved a great improvement in the performance of the thermal vs. visible image retrieval task. The code of this paper is available at: https://github.com/fyang93/cross-modal-retrieval
38

Guo, Jiaen, Haibin Wang, Bo Dan, and Yu Lu. "Deep Supervised Cross-modal Hashing for Ship Image Retrieval". Journal of Physics: Conference Series 2320, no. 1 (August 1, 2022): 012023. http://dx.doi.org/10.1088/1742-6596/2320/1/012023.

Full text
Abstract
The retrieval of multimodal ship images obtained by remote sensing satellites is an important part of remote sensing data analysis and is of great significance for improving marine monitoring capability. In this paper, we propose a novel cross-modal ship image retrieval method, called Deep Supervised Cross-modal Hashing (DSCMH). It consists of a feature learning part and a hash learning part used for feature extraction and hash code generation, respectively; both parts have modality-invariant constraints to preserve cross-modal invariability, and label information is also used to supervise the above process. Furthermore, we design a class attention module based on the cross-modal class center to strengthen class discrimination. The experimental results show that the proposed method can effectively improve the cross-modal retrieval accuracy of ship images and is better than several state-of-the-art methods.
39

Zheng, Fuzhong, Weipeng Li, Xu Wang, Luyao Wang, Xiong Zhang, and Haisu Zhang. "A Cross-Attention Mechanism Based on Regional-Level Semantic Features of Images for Cross-Modal Text-Image Retrieval in Remote Sensing". Applied Sciences 12, no. 23 (November 29, 2022): 12221. http://dx.doi.org/10.3390/app122312221.

Full text
Abstract
With the rapid development of remote sensing (RS) observation technology over recent years, the high-level semantic association-based cross-modal retrieval of RS images has drawn some attention. However, few existing studies on cross-modal retrieval of RS images have addressed the issue of mutual interference between semantic features of images caused by “multi-scene semantics”. Therefore, we proposed a novel cross-attention (CA) model, called CABIR, based on regional-level semantic features of RS images for cross-modal text-image retrieval. This technique utilizes the CA mechanism to implement cross-modal information interaction and guides the network with textual semantics to allocate weights and filter redundant features for image regions, reducing the effect of irrelevant scene semantics on retrieval. Furthermore, we proposed BERT plus Bi-GRU, a new approach to generating statement-level textual features, and designed an effective temperature control function to steer the CA network toward smooth running. Our experiment suggested that CABIR not only outperforms other state-of-the-art cross-modal image retrieval methods but also demonstrates high generalization ability and stability, with an average recall rate of up to 18.12%, 48.30%, and 55.53% over the datasets RSICD, UCM, and Sydney, respectively. The model proposed in this paper will be able to provide a possible solution to the problem of mutual interference of RS images with “multi-scene semantics” due to complex terrain objects.
40

Zheng, Qibin, Xiaoguang Ren, Yi Liu, and Wei Qin. "Abstraction and Association: Cross-Modal Retrieval Based on Consistency between Semantic Structures". Mathematical Problems in Engineering 2020 (May 7, 2020): 1–17. http://dx.doi.org/10.1155/2020/2503137.

Full text
Abstract
Cross-modal retrieval aims to find relevant data of different modalities, such as images and text. In order to bridge the modality gap, most existing methods require a lot of coupled sample pairs as training data. To reduce the demands for training data, we propose a cross-modal retrieval framework that utilizes both coupled and uncoupled samples. The framework consists of two parts: Abstraction that aims to provide high-level single-modal representations with uncoupled samples; then, Association links different modalities through a few coupled training samples. Moreover, under this framework, we implement a cross-modal retrieval method based on the consistency between the semantic structure of multiple modalities. First, both images and text are represented with the semantic structure-based representation, which represents each sample as its similarity from the reference points that are generated from single-modal clustering. Then, the reference points of different modalities are aligned through an active learning strategy. Finally, the cross-modal similarity can be measured with the consistency between the semantic structures. The experiment results demonstrate that given proper abstraction of single-modal data, the relationship between different modalities can be simplified, and even limited coupled cross-modal training data are sufficient for satisfactory retrieval accuracy.
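A minimal sketch of the semantic-structure representation described above: each sample is re-encoded as its similarity to reference points obtained by single-modal clustering (illustrative only; the active-learning alignment of reference points across modalities is not shown):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics.pairwise import cosine_similarity

    def structure_representation(features, n_reference_points=8, seed=0):
        """Represent each sample by its cosine similarity to cluster centers
        ("reference points") found within its own modality."""
        kmeans = KMeans(n_clusters=n_reference_points, random_state=seed, n_init=10)
        kmeans.fit(features)
        return cosine_similarity(features, kmeans.cluster_centers_)

    rng = np.random.default_rng(0)
    image_feats = rng.normal(size=(200, 64))    # placeholder single-modal features
    print(structure_representation(image_feats).shape)  # (200, 8)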
41

Geigle, Gregor, Jonas Pfeiffer, Nils Reimers, Ivan Vulić, and Iryna Gurevych. "Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval". Transactions of the Association for Computational Linguistics 10 (2022): 503–21. http://dx.doi.org/10.1162/tacl_a_00473.

Full text
Abstract
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image. While offering unmatched retrieval performance, such models: 1) are typically pretrained from scratch and thus less scalable, 2) suffer from huge retrieval latency and inefficiency issues, which makes them impractical in realistic applications. To address these crucial gaps towards both improved and efficient cross-modal retrieval, we propose a novel fine-tuning framework that turns any pretrained text-image multi-modal model into an efficient retrieval model. The framework is based on a cooperative retrieve-and-rerank approach that combines: 1) twin networks (i.e., a bi-encoder) to separately encode all items of a corpus, enabling efficient initial retrieval, and 2) a cross-encoder component for a more nuanced (i.e., smarter) ranking of the retrieved small set of items. We also propose to jointly fine-tune the two components with shared weights, yielding a more parameter-efficient model. Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
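A schematic of the cooperative retrieve-and-rerank flow described above; the encoder vectors and the cross_score function are placeholders standing in for a real bi-encoder and cross-encoder, not the paper's code:

    import numpy as np

    def retrieve_and_rerank(query_vec, corpus_vecs, corpus_items, cross_score, k=100):
        """Stage 1: cheap bi-encoder retrieval by dot product over the whole corpus.
        Stage 2: rerank only the top-k shortlist with the expensive cross_score
        function (standing in for a cross-encoder scoring the query-item pair)."""
        scores = corpus_vecs @ query_vec              # bi-encoder similarity
        shortlist = np.argsort(-scores)[:k]           # initial candidates
        return sorted(shortlist, key=lambda i: cross_score(corpus_items[i]), reverse=True)

    # Toy usage with random vectors and a dummy cross-encoder score.
    rng = np.random.default_rng(0)
    corpus_vecs = rng.normal(size=(1000, 32))
    corpus_items = [f"item_{i}" for i in range(1000)]
    query_vec = rng.normal(size=32)
    ranked = retrieve_and_rerank(query_vec, corpus_vecs, corpus_items,
                                 cross_score=lambda item: len(item), k=10)
    print(ranked[:3])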
42

Li, Chao, Cheng Deng, Lei Wang, De Xie, and Xianglong Liu. "Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 176–83. http://dx.doi.org/10.1609/aaai.v33i01.3301176.

Full text
Abstract
In recent years, hashing has attracted more and more attention owing to its superior capacity of low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, increasingly compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or have no ability to learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where the outer-cycle network is used to learn a powerful common representation, and the inner-cycle network is used to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, which can be optimized simultaneously to learn representations and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms the state-of-the-art unsupervised cross-modal hashing methods.
43

Cheng, Shuli, Liejun Wang, and Anyu Du. "Deep Semantic-Preserving Reconstruction Hashing for Unsupervised Cross-Modal Retrieval". Entropy 22, no. 11 (November 7, 2020): 1266. http://dx.doi.org/10.3390/e22111266.

Full text
Abstract
Deep hashing is the mainstream algorithm for large-scale cross-modal retrieval due to its high retrieval speed and low storage capacity, but the problem of reconstruction of modal semantic information is still very challenging. In order to further solve the problem of unsupervised cross-modal retrieval semantic reconstruction, we propose a novel deep semantic-preserving reconstruction hashing (DSPRH). The algorithm combines spatial and channel semantic information, and mines modal semantic information based on adaptive self-encoding and joint semantic reconstruction loss. The main contributions are as follows: (1) We introduce a new spatial pooling network module based on tensor regular-polymorphic decomposition theory to generate rank-1 tensor to capture high-order context semantics, which can assist the backbone network to capture important contextual modal semantic information. (2) Based on optimization perspective, we use global covariance pooling to capture channel semantic information and accelerate network convergence. In feature reconstruction layer, we use two bottlenecks auto-encoding to achieve visual-text modal interaction. (3) In metric learning, we design a new loss function to optimize model parameters, which can preserve the correlation between image modalities and text modalities. The DSPRH algorithm is tested on MIRFlickr-25K and NUS-WIDE. The experimental results show that DSPRH has achieved better performance on retrieval tasks.
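A small sketch of global covariance pooling over a convolutional feature map, matching the channel-semantics step described above (generic numpy illustration; DSPRH's tensor-decomposition pooling module is not reproduced):

    import numpy as np

    def global_covariance_pooling(feature_map):
        """feature_map: (C, H, W) activations. Returns the C x C covariance of
        channel descriptors across spatial positions, a second-order alternative
        to global average pooling."""
        c, h, w = feature_map.shape
        x = feature_map.reshape(c, h * w)         # C channel descriptors over H*W positions
        x = x - x.mean(axis=1, keepdims=True)     # center each channel
        return (x @ x.T) / (h * w - 1)            # C x C covariance matrix

    rng = np.random.default_rng(0)
    fmap = rng.normal(size=(256, 7, 7))           # placeholder backbone output
    print(global_covariance_pooling(fmap).shape)  # (256, 256)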
44

He, Chao, Dalin Wang, Zefu Tan, Liming Xu, and Nina Dai. "Cross-Modal Discrimination Hashing Retrieval Using Variable Length". Security and Communication Networks 2022 (September 9, 2022): 1–12. http://dx.doi.org/10.1155/2022/9638683.

Full text
Abstract
Fast cross-modal retrieval technology based on hash coding has become a hot topic for the rich multimodal data (text, image, audio, etc.), especially security and privacy challenges in the Internet of Things and mobile edge computing. However, most methods based on hash coding are only mapped to the common hash coding space, and it relaxes the two value constraints of hash coding. Therefore, the learning of the multimodal hash coding may not be sufficient and effective to express the original multimodal data and cause the hash encoding category to be less discriminatory. For the sake of solving these problems, this paper proposes a method of mapping each modal data to the optimal length of hash coding space, respectively, and then the hash encoding of each modal data is solved by the discrete cross-modal hash algorithm of two value constraints. Finally, the similarity of multimodal data is compared in the potential space. The experimental results of the cross-model retrieval based on variable hash coding are better than that of the relative comparison methods in the WIKI data set, NUS-WIDE data set, as well as MIRFlickr data set, and the method we proposed is proved to be feasible and effective.
45

Alikhani, Malihe, Fangda Han, Hareesh Ravi, Mubbasir Kapadia, Vladimir Pavlovic, and Matthew Stone. "Cross-Modal Coherence for Text-to-Image Retrieval". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10427–35. http://dx.doi.org/10.1609/aaai.v36i10.21285.

Full text
Abstract
Common image-text joint understanding techniques presume that images and the associated text can universally be characterized by a single implicit model. However, co-occurring images and text can be related in qualitatively different ways, and explicitly modeling it could improve the performance of current joint understanding models. In this paper, we train a Cross-Modal Coherence Model for text-to-image retrieval task. Our analysis shows that models trained with image–text coherence relations can retrieve images originally paired with target text more often than coherence-agnostic models. We also show via human evaluation that images retrieved by the proposed coherence-aware model are preferred over a coherence-agnostic baseline by a huge margin. Our findings provide insights into the ways that different modalities communicate and the role of coherence relations in capturing commonsense inferences in text and imagery.
46

Fang, Xiaozhao, Zhihu Liu, Na Han, Lin Jiang, and Shaohua Teng. "Discrete matrix factorization hashing for cross-modal retrieval". International Journal of Machine Learning and Cybernetics 12, no. 10 (August 2, 2021): 3023–36. http://dx.doi.org/10.1007/s13042-021-01395-5.

Full text
47

Tang, Jun, Ke Wang, and Ling Shao. "Supervised Matrix Factorization Hashing for Cross-Modal Retrieval". IEEE Transactions on Image Processing 25, no. 7 (July 2016): 3157–66. http://dx.doi.org/10.1109/tip.2016.2564638.

Full text
48

Mandal, Devraj, Kunal N. Chaudhury, and Soma Biswas. "Generalized Semantic Preserving Hashing for Cross-Modal Retrieval". IEEE Transactions on Image Processing 28, no. 1 (January 2019): 102–12. http://dx.doi.org/10.1109/tip.2018.2863040.

Full text
49

Yao, Tao, Zhiwang Zhang, Lianshan Yan, Jun Yue, and Qi Tian. "Discrete Robust Supervised Hashing for Cross-Modal Retrieval". IEEE Access 7 (2019): 39806–14. http://dx.doi.org/10.1109/access.2019.2897249.

Full text
50

Li, Kai, Guo-Jun Qi, Jun Ye, and Kien A. Hua. "Linear Subspace Ranking Hashing for Cross-Modal Retrieval". IEEE Transactions on Pattern Analysis and Machine Intelligence 39, no. 9 (September 1, 2017): 1825–38. http://dx.doi.org/10.1109/tpami.2016.2610969.

Full text