Journal articles on the topic 'Scalability in Cross-Modal Retrieval'


Consult the top 50 journal articles for your research on the topic 'Scalability in Cross-Modal Retrieval.'


1

Hu, Peng, Hongyuan Zhu, Xi Peng, and Jie Lin. "Semi-Supervised Multi-Modal Learning with Balanced Spectral Decomposition." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 99–106. http://dx.doi.org/10.1609/aaai.v34i01.5339.

Abstract:
Cross-modal retrieval aims to retrieve relevant samples across different modalities; the key problem is how to model the correlations among modalities while narrowing the large heterogeneity gap. In this paper, we propose a Semi-supervised Multimodal Learning Network method (SMLN) which correlates different modalities by capturing the intrinsic structure and discriminative correlation of the multimedia data. To be specific, the labeled and unlabeled data are used to construct a similarity matrix which integrates the cross-modal correlation, discrimination, and intra-modal graph information existing in the multimedia data. More importantly, we propose a novel approach to optimize our loss within a neural network, which involves a spectral decomposition problem derived from a ratio trace criterion. Our optimization enjoys two advantages. On the one hand, the proposed approach is not limited to our loss and can be applied to any neural network trained with a ratio trace criterion. On the other hand, the proposed optimization differs from existing ones, which alternately maximize the minor eigenvalues, thus overemphasizing the minor eigenvalues and ignoring the dominant ones. In contrast, our method exactly balances all eigenvalues and is thus more competitive than existing methods. Thanks to our loss and optimization strategy, our method preserves the discriminative and intrinsic information in the common space and offers scalability in handling large-scale multimedia data. To verify the effectiveness of the proposed method, extensive experiments are carried out on three widely used multimodal datasets, comparing against 13 state-of-the-art approaches.
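The ratio trace criterion mentioned in this abstract is classically handled as a generalized eigenvalue problem. The snippet below is a minimal, illustrative sketch of that textbook baseline, not the balanced optimization proposed in the paper; the toy scatter matrices, dimensions, and number of retained directions are assumptions made only for the example.

```python
import numpy as np
from scipy.linalg import eigh

# Illustrative only: a ratio-trace objective max_W tr((W^T B W)^-1 (W^T A W))
# is classically solved via the generalized eigenproblem A w = lambda B w.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))             # toy features standing in for multimodal data
A = np.cov(X.T) + 0.5 * np.eye(16)         # "signal"-style scatter matrix (stand-in)
B = np.eye(16)                             # constraint scatter matrix (stand-in)

eigvals, eigvecs = eigh(A, B)              # generalized symmetric eigendecomposition
W = eigvecs[:, np.argsort(eigvals)[::-1][:4]]   # keep the top-4 directions
print(W.shape)                             # (16, 4): projection into a 4-d common space
```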
2

Rasheed, Ali Salim, Davood Zabihzadeh, and Sumia Abdulhussien Razooqi Al-Obaidi. "Large-Scale Multi-modal Distance Metric Learning with Application to Content-Based Information Retrieval and Image Classification." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 13 (May 26, 2020): 2050034. http://dx.doi.org/10.1142/s0218001420500342.

Abstract:
Metric learning algorithms aim to bring conceptually related data items closer together and keep dissimilar ones at a distance. The most common approach to metric learning is the Mahalanobis method. Despite its success, this method is limited to finding a linear projection and also suffers from poor scalability with respect to both the dimensionality and the size of the input data. To address these problems, this paper presents a new scalable metric learning algorithm for multi-modal data. Our method learns an optimal metric for any feature set of the multi-modal data in an online fashion. We also combine the learned metrics with a novel Passive/Aggressive (PA)-based algorithm which results in a higher convergence rate compared to state-of-the-art methods. To address scalability with respect to dimensionality, Dual Random Projection (DRP) is adopted in this paper. The proposed method is evaluated on several challenging machine vision datasets for image classification and Content-Based Information Retrieval (CBIR) tasks. The experimental results confirm that the proposed method significantly surpasses other state-of-the-art metric learning methods on most of these datasets in terms of both accuracy and efficiency.
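For context, the Mahalanobis metric referred to above is d_M(x, y) = sqrt((x - y)^T M (x - y)) with M positive semi-definite; factoring M = L^T L shows it is a Euclidean distance after the linear projection L, which is why a single learned Mahalanobis metric amounts to one linear projection. A minimal sketch with toy values (not the paper's algorithm):

```python
import numpy as np

def mahalanobis(x, y, M):
    """Mahalanobis distance d_M(x, y) = sqrt((x - y)^T M (x - y)) for PSD M."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

rng = np.random.default_rng(0)
L = rng.normal(size=(3, 5))          # toy linear projection; learned in practice
M = L.T @ L                          # any PSD metric matrix factors as L^T L
x, y = rng.normal(size=5), rng.normal(size=5)

# The metric form and the Euclidean distance after projection agree.
print(mahalanobis(x, y, M))
print(float(np.linalg.norm(L @ x - L @ y)))
```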
3

Zalkow, Frank, and Meinard Müller. "Learning Low-Dimensional Embeddings of Audio Shingles for Cross-Version Retrieval of Classical Music." Applied Sciences 10, no. 1 (December 18, 2019): 19. http://dx.doi.org/10.3390/app10010019.

Abstract:
Cross-version music retrieval aims at identifying all versions of a given piece of music using a short query audio fragment. One previous approach, which is particularly suited for Western classical music, is based on a nearest neighbor search using short sequences of chroma features, also referred to as audio shingles. From the viewpoint of efficiency, indexing and dimensionality reduction are important aspects. In this paper, we extend previous work by adapting two embedding techniques; one is based on classical principle component analysis, and the other is based on neural networks with triplet loss. Furthermore, we report on systematically conducted experiments with Western classical music recordings and discuss the trade-off between retrieval quality and embedding dimensionality. As one main result, we show that, using neural networks, one can reduce the audio shingles from 240 to fewer than 8 dimensions with only a moderate loss in retrieval accuracy. In addition, we present extended experiments with databases of different sizes and different query lengths to test the scalability and generalizability of the dimensionality reduction methods. We also provide a more detailed view into the retrieval problem by analyzing the distances that appear in the nearest neighbor search.
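A triplet loss of the kind used for the low-dimensional shingle embeddings pushes an anchor closer to a positive (a shingle from another version of the same piece) than to a negative, by a margin. Below is a minimal numpy sketch of such a loss with toy 8-dimensional vectors standing in for projected 240-dimensional shingles; it illustrates the general technique, not the authors' network or training setup.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: encourage d(a, p) + margin <= d(a, n)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
a = rng.normal(size=8)                 # embedded query shingle
p = a + 0.05 * rng.normal(size=8)      # shingle from another version of the same piece
n = rng.normal(size=8)                 # shingle from an unrelated piece
print(triplet_loss(a, p, n))
```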
4

Huang, Xiaobing, Tian Zhao, and Yu Cao. "PIR." International Journal of Multimedia Data Engineering and Management 5, no. 3 (July 2014): 1–27. http://dx.doi.org/10.4018/ijmdem.2014070101.

Abstract:
Multimedia Information Retrieval (MIR) is a problem domain that includes programming tasks such as salient feature extraction, machine learning, indexing, and retrieval. There are a variety of implementations and algorithms for these tasks in different languages and frameworks, which are difficult to compose and reuse due to interface and language incompatibilities. Because of this low reusability, researchers often have to implement their experiments from scratch, and the resulting programs cannot be easily adapted to parallel and distributed execution, which is important for handling large data sets. In this paper, we present Pipeline Information Retrieval (PIR), a Domain Specific Language (DSL) for multi-modal feature manipulation. The goal of PIR is to unify the MIR programming tasks by hiding the programming details under a flexible layer of domain-specific interface. PIR optimizes the MIR tasks by compiling the DSL programs into pipeline graphs, which can be executed using a variety of strategies (e.g. sequential, parallel, or distributed execution). The authors evaluated the performance of PIR applications on a single machine with multiple cores, on a local cluster, and on the Amazon Elastic Compute Cloud (EC2) platform. The results show that PIR programs can greatly help MIR researchers and developers prototype quickly in a single-machine environment and achieve good scalability on distributed platforms.
5

Zhang, Zhen, Xu Wu, and Shuang Wei. "Cross-Domain Access Control Model in Industrial IoT Environment." Applied Sciences 13, no. 8 (April 17, 2023): 5042. http://dx.doi.org/10.3390/app13085042.

Abstract:
The Industrial Internet of Things (IIoT) accelerates smart manufacturing and boosts production efficiency through heterogeneous industrial equipment, intelligent sensors, and actuators. The Industrial Internet of Things is transforming from a traditional factory model to a new manufacturing mode, which allows cross-domain data-sharing among multiple system departments to enable smart manufacturing. A complete industrial product comes from the combined efforts of many different departments. Therefore, secure and reliable cross-domain access control has become the key to ensuring the security of cross-domain communication and resource-sharing. Traditional centralized access control schemes are prone to single-point failure problems. Recently, many researchers have integrated blockchain technology into access control models. However, most blockchain-based approaches use a single-chain structure, which has weak data management capability and scalability, while ensuring system security, and low access control efficiency, making it difficult to meet the needs of multi-domain cooperation in IIoT scenarios. Therefore, this paper proposes a decentralized cross-domain access model based on a master–slave chain with high scalability. Moreover, the model ensures the security and reliability of the master chain through a reputation-based node selection mechanism. Access control efficiency is improved by a grouping strategy retrieval method in the access control process. The experimental benchmarks of the proposed scheme use various performance metrics to highlight its applicability in the IIoT environment. The results show an 82% improvement in the throughput for the master–slave chain structure over the single-chain structure. There is also an improvement in the throughput and latency compared to the results of other studies.
6

An, Duo, Alan Chiu, James A. Flanders, Wei Song, Dahua Shou, Yen-Chun Lu, Lars G. Grunnet, et al. "Designing a retrievable and scalable cell encapsulation device for potential treatment of type 1 diabetes." Proceedings of the National Academy of Sciences 115, no. 2 (December 26, 2017): E263—E272. http://dx.doi.org/10.1073/pnas.1708806115.

Abstract:
Cell encapsulation has been shown to hold promise for effective, long-term treatment of type 1 diabetes (T1D). However, challenges remain for its clinical applications. For example, there is an unmet need for an encapsulation system that is capable of delivering sufficient cell mass while still allowing convenient retrieval or replacement. Here, we report a simple cell encapsulation design that is readily scalable and conveniently retrievable. The key to this design was to engineer a highly wettable, Ca2+-releasing nanoporous polymer thread that promoted uniform in situ cross-linking and strong adhesion of a thin layer of alginate hydrogel around the thread. The device provided immunoprotection of rat islets in immunocompetent C57BL/6 mice in a short-term (1-mo) study, similar to neat alginate fibers. However, the mechanical property of the device, critical for handling and retrieval, was much more robust than the neat alginate fibers due to the reinforcement of the central thread. It also had facile mass transfer due to the short diffusion distance. We demonstrated the therapeutic potential of the device through the correction of chemically induced diabetes in C57BL/6 mice using rat islets for 3 mo as well as in immunodeficient SCID-Beige mice using human islets for 4 mo. We further showed, as a proof of concept, the scalability and retrievability in dogs. After 1 mo of implantation in dogs, the device could be rapidly retrieved through a minimally invasive laparoscopic procedure. This encapsulation device may contribute to a cellular therapy for T1D because of its retrievability and scale-up potential.
7

Tamchyna, Aleš, Ondřej Dušek, Rudolf Rosa, and Pavel Pecina. "MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service." Prague Bulletin of Mathematical Linguistics 100, no. 1 (October 1, 2013): 31–40. http://dx.doi.org/10.2478/pralin-2013-0009.

Abstract:
We present a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre- and post-processing. It is currently used to provide MT between several languages for cross-lingual information retrieval in the EU FP7 Khresmoi project. The software consists of an application server and remote workers which handle text processing and communicate translation requests to MT systems. The communication between the application server and the workers is based on the XML-RPC protocol. We present the overall design of the software and test results which document speed and scalability of our solution. Our software is licensed under the Apache 2.0 licence and is available for download from the Lindat-Clarin repository and Github.
8

Zhang, Chengyuan, Jiayu Song, Xiaofeng Zhu, Lei Zhu, and Shichao Zhang. "HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval." ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 1s (April 20, 2021): 1–22. http://dx.doi.org/10.1145/3412847.

Abstract:
The purpose of cross-modal retrieval is to find the relationship between samples of different modalities and to retrieve samples of other modalities with similar semantics, given a sample from one modality. As data from different modalities present heterogeneous low-level features and semantically related high-level features, the main problem of cross-modal retrieval is how to measure the similarity between different modalities. In this article, we present a novel cross-modal retrieval method, named Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and from intra-modal pairs with the same classification label. Specifically, coupled deep fully connected networks are used to map cross-modal feature representations into a common subspace. A weight-sharing strategy is utilized between the two network branches to diminish cross-modal heterogeneity. Furthermore, two Siamese CNN models are employed to learn intra-modal similarity from samples of the same modality. Comprehensive experiments on real datasets clearly demonstrate that our proposed technique achieves substantial improvements over state-of-the-art cross-modal retrieval techniques.
9

Wu, Yiling, Shuhui Wang, and Qingming Huang. "Multi-modal semantic autoencoder for cross-modal retrieval." Neurocomputing 331 (February 2019): 165–75. http://dx.doi.org/10.1016/j.neucom.2018.11.042.

10

Devezas, José. "Graph-based entity-oriented search." ACM SIGIR Forum 55, no. 1 (June 2021): 1–2. http://dx.doi.org/10.1145/3476415.3476430.

Abstract:
Entity-oriented search has revolutionized search engines. In the era of Google Knowledge Graph and Microsoft Satori, users demand an effortless process of search. Whether they express an information need through a keyword query, expecting documents and entities, or through a clicked entity, expecting related entities, there is an inherent need for the combination of corpora and knowledge bases to obtain an answer. Such integration frequently relies on independent signals extracted from inverted indexes, and from quad indexes indirectly accessed through queries to a triplestore. However, relying on two separate representation models inhibits the effective cross-referencing of information, discarding otherwise available relations that could lead to a better ranking. Moreover, different retrieval tasks often demand separate implementations, although the problem is, at its core, the same. With the goal of harnessing all available information to optimize retrieval, we explore joint representation models of documents and entities, while taking a step towards the definition of a more general retrieval approach. Specifically, we propose that graphs should be used to incorporate explicit and implicit information derived from the relations between text found in corpora and entities found in knowledge bases. We also take advantage of this framework to elaborate a general model for entity-oriented search, proposing a universal ranking function for the tasks of ad hoc document retrieval (leveraging entities), ad hoc entity retrieval, and entity list completion. At a conceptual stage, we begin by proposing the graph-of-entity, based on the relations between combinations of term and entity nodes. We introduce the entity weight as the corresponding ranking function, relying on the idea of seed nodes for representing the query, either directly through term nodes, or based on the expansion to adjacent entity nodes. The score is computed based on a series of geodesic distances to the remaining nodes, providing a ranking for the documents (or entities) in the graph. In order to improve on the low scalability of the graph-of-entity, we then redesigned this model in a way that reduced the number of edges in relation to the number of nodes, by relying on the hypergraph data structure. The resulting model, which we called hypergraph-of-entity, is the main contribution of this thesis. The obtained reduction was achieved by replacing binary edges with n -ary relations based on sets of nodes and entities (undirected document hyperedges), sets of entities (undirected hyperedges, either based on cooccurrence or a grouping by semantic subject), and pairs of a set of terms and a set of one entity (directed hyperedges, mapping text to an object). We introduce the random walk score as the corresponding ranking function, relying on the same idea of seed nodes, similar to the entity weight in the graph-of-entity. Scoring based on this function is highly reliant on the structure of the hypergraph, which we call representation-driven retrieval. As such, we explore several extensions of the hypergraph-of-entity, including relations of synonymy, or contextual similarity, as well as different weighting functions per node and hyperedge type. We also propose TF-bins as a discretization for representing term frequency in the hypergraph-of-entity. 
For the random walk score, we propose and explore several parameters, including length and repeats, with or without seed node expansion, direction, or weights, and with or without a certain degree of node and/or hyperedge fatigue, a concept that we also propose. For evaluation, we took advantage of TREC 2017 OpenSearch track, which relied on an online evaluation process based on the Living Labs API, and we also participated in TREC 2018 Common Core track, which was based on the newly introduced TREC Washington Post Corpus. Our main experiments were supported on the INEX 2009 Wikipedia collection, which proved to be a fundamental test collection for assessing retrieval effectiveness across multiple tasks. At first, our experiments solely focused on ad hoc document retrieval, ensuring that the model performed adequately for a classical task. We then expanded the work to cover all three entity-oriented search tasks. Results supported the viability of a general retrieval model, opening novel challenges in information retrieval, and proposing a new path towards generality in this area.
11

Yang, Xianben, and Wei Zhang. "Graph Convolutional Networks for Cross-Modal Information Retrieval." Wireless Communications and Mobile Computing 2022 (January 6, 2022): 1–8. http://dx.doi.org/10.1155/2022/6133142.

Abstract:
In recent years, with the wide application of deep learning and the growth of multimodal research, image retrieval systems have gradually extended from traditional text-based retrieval to visual retrieval that combines images and text, and the area has become an important cross-disciplinary research hotspot spanning computer vision and natural language understanding. This paper focuses on graph convolutional networks for cross-modal information retrieval, building on a review of the literature on cross-modal retrieval and the related theory of convolutional networks. The designed model combines high-level semantics with low-level visual features in cross-modal information retrieval to improve retrieval accuracy, and experiments are used to verify the designed network; the results show that the model designed in this paper is more accurate than traditional retrieval models, reaching up to 90%.
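For readers unfamiliar with the building block named in the title, a graph convolutional layer propagates node features over a normalized adjacency matrix, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The sketch below shows one such propagation step on a toy graph; it is a generic illustration, not the network designed in the paper.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                     # adjacency with self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)             # ReLU activation

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # toy 3-node graph
H = np.random.default_rng(0).normal(size=(3, 4))               # node features
W = np.random.default_rng(1).normal(size=(4, 2))               # layer weights
print(gcn_layer(A, H, W).shape)                                # (3, 2)
```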
12

Zhong, Fangming, Guangze Wang, Zhikui Chen, Feng Xia, and Geyong Min. "Cross-Modal Retrieval for CPSS Data." IEEE Access 8 (2020): 16689–701. http://dx.doi.org/10.1109/access.2020.2967594.

13

Dutta, Titir, and Soma Biswas. "Generalized Zero-Shot Cross-Modal Retrieval." IEEE Transactions on Image Processing 28, no. 12 (December 2019): 5953–62. http://dx.doi.org/10.1109/tip.2019.2923287.

14

Feng, Fangxiang, Xiaojie Wang, Ruifan Li, and Ibrar Ahmad. "Correspondence Autoencoders for Cross-Modal Retrieval." ACM Transactions on Multimedia Computing, Communications, and Applications 12, no. 1s (October 21, 2015): 1–22. http://dx.doi.org/10.1145/2808205.

15

Liu, Zhuokun, Huaping Liu, Wenmei Huang, Bowen Wang, and Fuchun Sun. "Audiovisual cross-modal material surface retrieval." Neural Computing and Applications 32, no. 18 (September 27, 2019): 14301–9. http://dx.doi.org/10.1007/s00521-019-04476-3.

16

Yu, Zheng, and Wenmin Wang. "Learning DALTS for cross‐modal retrieval." CAAI Transactions on Intelligence Technology 4, no. 1 (February 18, 2019): 9–16. http://dx.doi.org/10.1049/trit.2018.1051.

17

Liu, Huan, Jiang Xiong, Nian Zhang, Fuming Liu, and Xitao Zou. "Quadruplet-Based Deep Cross-Modal Hashing." Computational Intelligence and Neuroscience 2021 (July 2, 2021): 1–10. http://dx.doi.org/10.1155/2021/9968716.

Abstract:
Recently, benefitting from the storage and retrieval efficiency of hashing and the powerful discriminative feature extraction capability of deep neural networks, deep cross-modal hashing retrieval has drawn more and more attention. To preserve the semantic similarities of cross-modal instances during the hash mapping procedure, most existing deep cross-modal hashing methods usually learn deep hashing networks with a pairwise loss or a triplet loss. However, these methods may not fully explore the similarity relation across modalities. To solve this problem, in this paper, we introduce a quadruplet loss into deep cross-modal hashing and propose a quadruplet-based deep cross-modal hashing (termed QDCMH) method. Extensive experiments on two benchmark cross-modal retrieval datasets show that our proposed method achieves state-of-the-art performance and demonstrate the efficiency of the quadruplet loss in cross-modal hashing.
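A quadruplet loss generalizes the triplet loss with a fourth sample: besides pushing the anchor-positive distance below the anchor-negative distance, it also pushes it below the distance between two unrelated negatives. The sketch below shows this generic form; the margins and toy vectors are assumptions for illustration, and the exact QDCMH formulation may differ.

```python
import numpy as np

def quadruplet_loss(a, p, n1, n2, margin1=0.4, margin2=0.2):
    """Generic quadruplet loss: a triplet term plus a term that keeps the
    positive pair closer than an unrelated negative pair (n1, n2)."""
    d_ap = np.linalg.norm(a - p)
    d_an1 = np.linalg.norm(a - n1)
    d_n1n2 = np.linalg.norm(n1 - n2)
    return max(0.0, d_ap - d_an1 + margin1) + max(0.0, d_ap - d_n1n2 + margin2)

rng = np.random.default_rng(0)
img, txt = rng.normal(size=16), rng.normal(size=16)          # matched image-text pair
neg_txt, neg_img = rng.normal(size=16), rng.normal(size=16)  # unrelated samples
print(quadruplet_loss(img, txt, neg_txt, neg_img))
```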
18

Zou, Fuhao, Xingqiang Bai, Chaoyang Luan, Kai Li, Yunfei Wang, and Hefei Ling. "Semi-supervised cross-modal learning for cross modal retrieval and image annotation." World Wide Web 22, no. 2 (July 13, 2018): 825–41. http://dx.doi.org/10.1007/s11280-018-0581-2.

19

Yang, Xiaohan, Zhen Wang, Nannan Wu, Guokun Li, Chuang Feng, and Pingping Liu. "Unsupervised Deep Relative Neighbor Relationship Preserving Cross-Modal Hashing." Mathematics 10, no. 15 (July 28, 2022): 2644. http://dx.doi.org/10.3390/math10152644.

Abstract:
The image-text cross-modal retrieval task, which aims to retrieve the relevant image from text and vice versa, is now attracting widespread attention. To respond quickly to large-scale tasks, we propose an Unsupervised Deep Relative Neighbor Relationship Preserving Cross-Modal Hashing (DRNPH) to achieve cross-modal retrieval in the common Hamming space, which has advantages in storage and efficiency. To perform nearest neighbor search in the Hamming space, we need to reconstruct both the original intra- and inter-modal neighbor matrices from the binary feature vectors. Thus, we can compute the neighbor relationship among samples of different modalities directly based on the Hamming distances. Furthermore, the cross-modal pair-wise similarity preserving constraint requires that similar sample pairs have an identical Hamming distance to the anchor. Therefore, similar sample pairs share the same binary code, and they have minimal Hamming distances. Unfortunately, the pair-wise similarity preserving constraint may lead to an imbalanced code problem. Therefore, we propose the cross-modal triplet relative similarity preserving constraint, which requires that the Hamming distances of similar pairs be less than those of dissimilar pairs, to distinguish the samples’ ranking orders in the retrieval results. Moreover, a large similarity margin can boost the algorithm’s noise robustness. We conduct comparative cross-modal retrieval experiments and an ablation study on two public datasets, MIRFlickr and NUS-WIDE. The experimental results show that DRNPH outperforms the state-of-the-art approaches in various image-text retrieval scenarios, and all three proposed constraints are necessary and effective for boosting cross-modal retrieval performance.
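Retrieval in the common Hamming space described above reduces to ranking database codes by the number of bits in which they differ from the query code. A minimal sketch, using random 64-bit codes as stand-ins for learned image and text hash codes:

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to a binary query code.

    Codes are arrays of 0/1 bits; the distance is the number of differing bits."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists[order]

rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(1000, 64))     # 1000 items, 64-bit codes (e.g. images)
q = rng.integers(0, 2, size=64)              # query code from the other modality
order, dists = hamming_rank(q, db)
print(order[:5], dists[:5])                  # top-5 nearest neighbours in Hamming space
```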
20

Huang, Xin, Yuxin Peng, and Mingkuan Yuan. "MHTN: Modal-Adversarial Hybrid Transfer Network for Cross-Modal Retrieval." IEEE Transactions on Cybernetics 50, no. 3 (March 2020): 1047–59. http://dx.doi.org/10.1109/tcyb.2018.2879846.

21

Wang, Suping, Ligu Zhu, Lei Shi, Hao Mo, and Songfu Tan. "A Survey of Full-Cycle Cross-Modal Retrieval: From a Representation Learning Perspective." Applied Sciences 13, no. 7 (April 4, 2023): 4571. http://dx.doi.org/10.3390/app13074571.

Abstract:
Cross-modal retrieval aims to elucidate information fusion, imitate human learning, and advance the field. Although previous reviews have primarily focused on binary and real-value coding methods, there is a scarcity of techniques grounded in deep representation learning. In this paper, we concentrated on harmonizing cross-modal representation learning and the full-cycle modeling of high-level semantic associations between vision and language, diverging from traditional statistical methods. We systematically categorized and summarized the challenges and open issues in implementing current technologies and investigated the pipeline of cross-modal retrieval, including pre-processing, feature engineering, pre-training tasks, encoding, cross-modal interaction, decoding, model optimization, and a unified architecture. Furthermore, we propose benchmark datasets and evaluation metrics to assist researchers in keeping pace with cross-modal retrieval advancements. By incorporating recent innovative works, we offer a perspective on potential advancements in cross-modal retrieval.
22

Zhong, Fangming, Zhikui Chen, and Geyong Min. "Deep Discrete Cross-Modal Hashing for Cross-Media Retrieval." Pattern Recognition 83 (November 2018): 64–77. http://dx.doi.org/10.1016/j.patcog.2018.05.018.

23

刘, 志虎. "Label Consistency Hashing for Cross-Modal Retrieval." Computer Science and Application 11, no. 04 (2021): 1104–12. http://dx.doi.org/10.12677/csa.2021.114114.

24

Gou, Tingting, Libo Liu, Qian Liu, and Zhen Deng. "A New Approach to Cross-Modal Retrieval." Journal of Physics: Conference Series 1288 (August 2019): 012044. http://dx.doi.org/10.1088/1742-6596/1288/1/012044.

25

Hu, Peng, Dezhong Peng, Xu Wang, and Yong Xiang. "Multimodal adversarial network for cross-modal retrieval." Knowledge-Based Systems 180 (September 2019): 38–50. http://dx.doi.org/10.1016/j.knosys.2019.05.017.

26

Cao, Wenming, Qiubin Lin, Zhihai He, and Zhiquan He. "Hybrid representation learning for cross-modal retrieval." Neurocomputing 345 (June 2019): 45–57. http://dx.doi.org/10.1016/j.neucom.2018.10.082.

27

Yao, Tao, Xiangwei Kong, Haiyan Fu, and Qi Tian. "Semantic consistency hashing for cross-modal retrieval." Neurocomputing 193 (June 2016): 250–59. http://dx.doi.org/10.1016/j.neucom.2016.02.016.

28

Xie, Liang, Peng Pan, and Yansheng Lu. "Analyzing semantic correlation for cross-modal retrieval." Multimedia Systems 21, no. 6 (June 25, 2014): 525–39. http://dx.doi.org/10.1007/s00530-014-0397-6.

29

Song, Ge, Dong Wang, and Xiaoyang Tan. "Deep Memory Network for Cross-Modal Retrieval." IEEE Transactions on Multimedia 21, no. 5 (May 2019): 1261–75. http://dx.doi.org/10.1109/tmm.2018.2877122.

30

Yu, Mengyang, Li Liu, and Ling Shao. "Binary Set Embedding for Cross-Modal Retrieval." IEEE Transactions on Neural Networks and Learning Systems 28, no. 12 (December 2017): 2899–910. http://dx.doi.org/10.1109/tnnls.2016.2609463.

31

Li, Zhuoyi, Huibin Lu, Hao Fu, Zhongrui Wang, and Guanghua Gu. "Adaptive Adversarial Learning based cross-modal retrieval." Engineering Applications of Artificial Intelligence 123 (August 2023): 106439. http://dx.doi.org/10.1016/j.engappai.2023.106439.

32

Li, Guokun, Zhen Wang, Shibo Xu, Chuang Feng, Xiaohan Yang, Nannan Wu, and Fuzhen Sun. "Deep Adversarial Learning Triplet Similarity Preserving Cross-Modal Retrieval Algorithm." Mathematics 10, no. 15 (July 25, 2022): 2585. http://dx.doi.org/10.3390/math10152585.

Abstract:
The cross-modal retrieval task can return different modal nearest neighbors, such as image or text. However, inconsistent distribution and diverse representation make it hard to directly measure the similarity relationship between different modal samples, which causes a heterogeneity gap. To bridge the above-mentioned gap, we propose the deep adversarial learning triplet similarity preserving cross-modal retrieval algorithm to map different modal samples into the common space, allowing their feature representation to preserve both the original inter- and intra-modal semantic similarity relationship. During the training process, we employ GANs, which has advantages in modeling data distribution and learning discriminative representation, in order to learn different modal features. As a result, it can align different modal feature distributions. Generally, many cross-modal retrieval algorithms only preserve the inter-modal similarity relationship, which makes the nearest neighbor retrieval results vulnerable to noise. In contrast, we establish the triplet similarity preserving function to simultaneously preserve the inter- and intra-modal similarity relationship in the common space and in each modal space, respectively. Thus, the proposed algorithm has a strong robustness to noise. In each modal space, to ensure that the generated features have the same semantic information as the sample labels, we establish a linear classifier and require that the generated features’ classification results be consistent with the sample labels. We conducted cross-modal retrieval comparative experiments on two widely used benchmark datasets—Pascal Sentence and Wikipedia. For the image to text task, our proposed method improved the mAP values by 1% and 0.7% on the Pascal sentence and Wikipedia datasets, respectively. Correspondingly, the proposed method separately improved the mAP values of the text to image performance by 0.6% and 0.8% on the Pascal sentence and Wikipedia datasets, respectively. The experimental results show that the proposed algorithm is better than the other state-of-the-art methods.
33

Bhatt, Nikita, and Amit Ganatra. "Improvement of deep cross-modal retrieval by generating real-valued representation." PeerJ Computer Science 7 (April 27, 2021): e491. http://dx.doi.org/10.7717/peerj-cs.491.

Abstract:
The cross-modal retrieval (CMR) task has attracted much attention in the research community due to its flexible and comprehensive retrieval. The core challenge in CMR is the heterogeneity gap, which arises from the different statistical properties of multi-modal data. The most common solution to bridge the heterogeneity gap is representation learning, which generates a common sub-space. In this work, we propose a framework called “Improvement of Deep Cross-Modal Retrieval (IDCMR)”, which generates real-valued representations. The IDCMR preserves both intra-modal and inter-modal similarity. The intra-modal similarity is preserved by selecting an appropriate training model for the text and image modalities. The inter-modal similarity is preserved by reducing modality-invariance loss. The mean average precision (mAP) is used as a performance measure in the CMR system. Extensive experiments are performed, and the results show that IDCMR outperforms state-of-the-art methods by margins of 4% and 2% in mAP on the text-to-image and image-to-text retrieval tasks on the MSCOCO and Xmedia datasets, respectively.
34

Liu, Li, Xiao Dong, and Tianshi Wang. "Semi-Supervised Cross-Modal Retrieval Based on Discriminative Comapping." Complexity 2020 (July 18, 2020): 1–13. http://dx.doi.org/10.1155/2020/1462429.

Abstract:
Most cross-modal retrieval methods based on subspace learning just focus on learning the projection matrices that map different modalities to a common subspace and pay less attention to the retrieval task specificity and class information. To address the two limitations and make full use of unlabelled data, we propose a novel semi-supervised method for cross-modal retrieval named modal-related retrieval based on discriminative comapping (MRRDC). The projection matrices are obtained to map multimodal data into a common subspace for different tasks. In the process of projection matrix learning, a linear discriminant constraint is introduced to preserve the original class information in different modal spaces. An iterative optimization algorithm based on label propagation is presented to solve the proposed joint learning formulations. The experimental results on several datasets demonstrate the superiority of our method compared with state-of-the-art subspace methods.
35

Zhang, Guihao, and Jiangzhong Cao. "Feature Fusion Based on Transformer for Cross-modal Retrieval." Journal of Physics: Conference Series 2558, no. 1 (August 1, 2023): 012012. http://dx.doi.org/10.1088/1742-6596/2558/1/012012.

Abstract:
With the popularity of the Internet and the rapid growth of multimodal data, multimodal retrieval has gradually become a hot area of research. As one of the important branches of multimodal retrieval, image-text retrieval aims to design a model that learns and aligns data from the two modalities, image and text, in order to build a bridge of semantic association between the two kinds of heterogeneous data and thereby achieve unified alignment and retrieval. Current mainstream image-text cross-modal retrieval approaches have made good progress by designing deep learning-based models to find potential associations between data of different modalities. In this paper, we design a transformer-based feature fusion network that fuses the information of the two modalities during feature extraction, which can enrich the semantic connection between the modalities. We conduct experiments on the benchmark dataset Flickr30k and obtain competitive results, with recall at 10 reaching 96.2% in image-to-text retrieval.
36

Xie, Yicai, Xianhua Zeng, Tinghua Wang, and Yun Yi. "Online deep hashing for both uni-modal and cross-modal retrieval." Information Sciences 608 (August 2022): 1480–502. http://dx.doi.org/10.1016/j.ins.2022.07.039.

37

Yang, Fan, Zheng Wang, Jing Xiao, and Shin'ichi Satoh. "Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12589–96. http://dx.doi.org/10.1609/aaai.v34i07.6949.

Abstract:
Most recent approaches for zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance by using a pre-trained model. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted during the training of a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mine more reliable relationships between images by traversing heterogeneous manifolds in the feature space of each modality. Our proposed method benefits from intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieve a great improvement in the performance of the thermal vs. visible image retrieval task. The code for this paper is available at: https://github.com/fyang93/cross-modal-retrieval
38

Guo, Jiaen, Haibin Wang, Bo Dan, and Yu Lu. "Deep Supervised Cross-modal Hashing for Ship Image Retrieval." Journal of Physics: Conference Series 2320, no. 1 (August 1, 2022): 012023. http://dx.doi.org/10.1088/1742-6596/2320/1/012023.

Abstract:
The retrieval of multimodal ship images obtained by remote sensing satellites is an important part of remote sensing data analysis and is of great significance for improving marine monitoring capability. In this paper, we propose a novel cross-modal ship image retrieval method, called Deep Supervised Cross-modal Hashing (DSCMH). It consists of a feature learning part and a hash learning part, used for feature extraction and hash code generation, respectively; both parts have modality-invariant constraints to preserve cross-modal invariance, and label information is also used to supervise the above process. Furthermore, we design a class attention module based on the cross-modal class center to strengthen class discrimination. The experimental results show that the proposed method can effectively improve the cross-modal retrieval accuracy of ship images and outperforms several state-of-the-art methods.
39

Zheng, Fuzhong, Weipeng Li, Xu Wang, Luyao Wang, Xiong Zhang, and Haisu Zhang. "A Cross-Attention Mechanism Based on Regional-Level Semantic Features of Images for Cross-Modal Text-Image Retrieval in Remote Sensing." Applied Sciences 12, no. 23 (November 29, 2022): 12221. http://dx.doi.org/10.3390/app122312221.

Abstract:
With the rapid development of remote sensing (RS) observation technology over recent years, the high-level semantic association-based cross-modal retrieval of RS images has drawn some attention. However, few existing studies on cross-modal retrieval of RS images have addressed the issue of mutual interference between semantic features of images caused by “multi-scene semantics”. Therefore, we proposed a novel cross-attention (CA) model, called CABIR, based on regional-level semantic features of RS images for cross-modal text-image retrieval. This technique utilizes the CA mechanism to implement cross-modal information interaction and guides the network with textual semantics to allocate weights and filter redundant features for image regions, reducing the effect of irrelevant scene semantics on retrieval. Furthermore, we proposed BERT plus Bi-GRU, a new approach to generating statement-level textual features, and designed an effective temperature control function to steer the CA network toward smooth running. Our experiment suggested that CABIR not only outperforms other state-of-the-art cross-modal image retrieval methods but also demonstrates high generalization ability and stability, with an average recall rate of up to 18.12%, 48.30%, and 55.53% over the datasets RSICD, UCM, and Sydney, respectively. The model proposed in this paper will be able to provide a possible solution to the problem of mutual interference of RS images with “multi-scene semantics” due to complex terrain objects.
40

Zheng, Qibin, Xiaoguang Ren, Yi Liu, and Wei Qin. "Abstraction and Association: Cross-Modal Retrieval Based on Consistency between Semantic Structures." Mathematical Problems in Engineering 2020 (May 7, 2020): 1–17. http://dx.doi.org/10.1155/2020/2503137.

Abstract:
Cross-modal retrieval aims to find relevant data of different modalities, such as images and text. In order to bridge the modality gap, most existing methods require a lot of coupled sample pairs as training data. To reduce the demands for training data, we propose a cross-modal retrieval framework that utilizes both coupled and uncoupled samples. The framework consists of two parts: Abstraction that aims to provide high-level single-modal representations with uncoupled samples; then, Association links different modalities through a few coupled training samples. Moreover, under this framework, we implement a cross-modal retrieval method based on the consistency between the semantic structure of multiple modalities. First, both images and text are represented with the semantic structure-based representation, which represents each sample as its similarity from the reference points that are generated from single-modal clustering. Then, the reference points of different modalities are aligned through an active learning strategy. Finally, the cross-modal similarity can be measured with the consistency between the semantic structures. The experiment results demonstrate that given proper abstraction of single-modal data, the relationship between different modalities can be simplified, and even limited coupled cross-modal training data are sufficient for satisfactory retrieval accuracy.
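The semantic structure-based representation described above can be illustrated by describing each sample through its similarity to reference points obtained from single-modal clustering. The sketch below uses k-means centers as reference points and cosine similarity; the feature dimensionality and cluster count are arbitrary assumptions for the example, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def structure_representation(X, centers):
    """Represent each sample by its cosine similarity to cluster reference points."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Cn = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return Xn @ Cn.T          # shape: (n_samples, n_reference_points)

rng = np.random.default_rng(0)
image_feats = rng.normal(size=(200, 64))   # uncoupled single-modal samples (toy data)
refs = KMeans(n_clusters=8, n_init=10, random_state=0).fit(image_feats).cluster_centers_
S = structure_representation(image_feats, refs)
print(S.shape)   # (200, 8): each image described by its relation to 8 reference points
```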
41

Geigle, Gregor, Jonas Pfeiffer, Nils Reimers, Ivan Vulić, and Iryna Gurevych. "Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval." Transactions of the Association for Computational Linguistics 10 (2022): 503–21. http://dx.doi.org/10.1162/tacl_a_00473.

Abstract:
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image. While offering unmatched retrieval performance, such models: 1) are typically pretrained from scratch and thus less scalable, 2) suffer from huge retrieval latency and inefficiency issues, which makes them impractical in realistic applications. To address these crucial gaps towards both improved and efficient cross-modal retrieval, we propose a novel fine-tuning framework that turns any pretrained text-image multi-modal model into an efficient retrieval model. The framework is based on a cooperative retrieve-and-rerank approach that combines: 1) twin networks (i.e., a bi-encoder) to separately encode all items of a corpus, enabling efficient initial retrieval, and 2) a cross-encoder component for a more nuanced (i.e., smarter) ranking of the retrieved small set of items. We also propose to jointly fine-tune the two components with shared weights, yielding a more parameter-efficient model. Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
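The cooperative scheme described above can be sketched as two stages: a bi-encoder produces independent embeddings for fast similarity search, and a cross-encoder rescores only the short list that the first stage returns. The outline below uses random vectors and a placeholder scoring function (fake_cross_encoder is hypothetical), not the authors' models or training procedure.

```python
import numpy as np

def bi_encoder_shortlist(query_vec, item_vecs, k=100):
    """Stage 1: cosine-similarity search over independently encoded items."""
    q = query_vec / np.linalg.norm(query_vec)
    items = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    scores = items @ q
    return np.argsort(scores)[::-1][:k]

def rerank(query, candidates, cross_encoder_score):
    """Stage 2: rescore the small shortlist with a (slower) joint cross-encoder."""
    scored = [(idx, cross_encoder_score(query, idx)) for idx in candidates]
    return [idx for idx, _ in sorted(scored, key=lambda t: t[1], reverse=True)]

rng = np.random.default_rng(0)
corpus_vecs = rng.normal(size=(10_000, 128))        # pre-encoded corpus items (toy)
query_vec = rng.normal(size=128)                    # encoded query (toy)
# Placeholder standing in for a real text-image cross-encoder; it ignores the text query.
fake_cross_encoder = lambda query, idx: float(corpus_vecs[idx] @ query_vec)

shortlist = bi_encoder_shortlist(query_vec, corpus_vecs, k=50)
ranking = rerank("a photo of a dog", shortlist, fake_cross_encoder)
print(ranking[:5])
```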
42

Li, Chao, Cheng Deng, Lei Wang, De Xie, and Xianglong Liu. "Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 176–83. http://dx.doi.org/10.1609/aaai.v33i01.3301176.

Abstract:
In recent years, hashing has attracted more and more attention owing to its low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, continuously compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or are unable to learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where an outer-cycle network is used to learn powerful common representations, and an inner-cycle network is used to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, which can be optimized simultaneously to learn representations and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms the state-of-the-art unsupervised cross-modal hashing methods.
43

Cheng, Shuli, Liejun Wang, and Anyu Du. "Deep Semantic-Preserving Reconstruction Hashing for Unsupervised Cross-Modal Retrieval." Entropy 22, no. 11 (November 7, 2020): 1266. http://dx.doi.org/10.3390/e22111266.

Abstract:
Deep hashing is the mainstream algorithm for large-scale cross-modal retrieval due to its high retrieval speed and low storage capacity, but the problem of reconstruction of modal semantic information is still very challenging. In order to further solve the problem of unsupervised cross-modal retrieval semantic reconstruction, we propose a novel deep semantic-preserving reconstruction hashing (DSPRH). The algorithm combines spatial and channel semantic information, and mines modal semantic information based on adaptive self-encoding and joint semantic reconstruction loss. The main contributions are as follows: (1) We introduce a new spatial pooling network module based on tensor regular-polymorphic decomposition theory to generate rank-1 tensor to capture high-order context semantics, which can assist the backbone network to capture important contextual modal semantic information. (2) Based on optimization perspective, we use global covariance pooling to capture channel semantic information and accelerate network convergence. In feature reconstruction layer, we use two bottlenecks auto-encoding to achieve visual-text modal interaction. (3) In metric learning, we design a new loss function to optimize model parameters, which can preserve the correlation between image modalities and text modalities. The DSPRH algorithm is tested on MIRFlickr-25K and NUS-WIDE. The experimental results show that DSPRH has achieved better performance on retrieval tasks.
44

He, Chao, Dalin Wang, Zefu Tan, Liming Xu, and Nina Dai. "Cross-Modal Discrimination Hashing Retrieval Using Variable Length." Security and Communication Networks 2022 (September 9, 2022): 1–12. http://dx.doi.org/10.1155/2022/9638683.

Abstract:
Fast cross-modal retrieval based on hash coding has become a hot topic for rich multimodal data (text, image, audio, etc.), especially given the security and privacy challenges of the Internet of Things and mobile edge computing. However, most hash-coding methods only map data to a common hash coding space and relax the binary constraints of hash coding. As a result, the learned multimodal hash codes may not be sufficient or effective to express the original multimodal data, and the hash codes become less discriminative across categories. To solve these problems, this paper proposes a method that maps the data of each modality to a hash coding space of optimal length, and then solves for the hash codes of each modality with a discrete cross-modal hashing algorithm under binary constraints. Finally, the similarity of the multimodal data is compared in the latent space. The experimental results of cross-modal retrieval based on variable-length hash coding are better than those of the comparison methods on the WIKI, NUS-WIDE, and MIRFlickr datasets, which shows that the proposed method is feasible and effective.
45

Alikhani, Malihe, Fangda Han, Hareesh Ravi, Mubbasir Kapadia, Vladimir Pavlovic, and Matthew Stone. "Cross-Modal Coherence for Text-to-Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10427–35. http://dx.doi.org/10.1609/aaai.v36i10.21285.

Abstract:
Common image-text joint understanding techniques presume that images and the associated text can universally be characterized by a single implicit model. However, co-occurring images and text can be related in qualitatively different ways, and explicitly modeling it could improve the performance of current joint understanding models. In this paper, we train a Cross-Modal Coherence Model for text-to-image retrieval task. Our analysis shows that models trained with image–text coherence relations can retrieve images originally paired with target text more often than coherence-agnostic models. We also show via human evaluation that images retrieved by the proposed coherence-aware model are preferred over a coherence-agnostic baseline by a huge margin. Our findings provide insights into the ways that different modalities communicate and the role of coherence relations in capturing commonsense inferences in text and imagery.
46

Fang, Xiaozhao, Zhihu Liu, Na Han, Lin Jiang, and Shaohua Teng. "Discrete matrix factorization hashing for cross-modal retrieval." International Journal of Machine Learning and Cybernetics 12, no. 10 (August 2, 2021): 3023–36. http://dx.doi.org/10.1007/s13042-021-01395-5.

47

Tang, Jun, Ke Wang, and Ling Shao. "Supervised Matrix Factorization Hashing for Cross-Modal Retrieval." IEEE Transactions on Image Processing 25, no. 7 (July 2016): 3157–66. http://dx.doi.org/10.1109/tip.2016.2564638.

48

Mandal, Devraj, Kunal N. Chaudhury, and Soma Biswas. "Generalized Semantic Preserving Hashing for Cross-Modal Retrieval." IEEE Transactions on Image Processing 28, no. 1 (January 2019): 102–12. http://dx.doi.org/10.1109/tip.2018.2863040.

49

Yao, Tao, Zhiwang Zhang, Lianshan Yan, Jun Yue, and Qi Tian. "Discrete Robust Supervised Hashing for Cross-Modal Retrieval." IEEE Access 7 (2019): 39806–14. http://dx.doi.org/10.1109/access.2019.2897249.

50

Li, Kai, Guo-Jun Qi, Jun Ye, and Kien A. Hua. "Linear Subspace Ranking Hashing for Cross-Modal Retrieval." IEEE Transactions on Pattern Analysis and Machine Intelligence 39, no. 9 (September 1, 2017): 1825–38. http://dx.doi.org/10.1109/tpami.2016.2610969.
