Academic literature on the topic 'Cross-Modal Retrieval and Hashing'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Cross-Modal Retrieval and Hashing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Cross-Modal Retrieval and Hashing"

1

Liu, Huan, Jiang Xiong, Nian Zhang, Fuming Liu, and Xitao Zou. "Quadruplet-Based Deep Cross-Modal Hashing." Computational Intelligence and Neuroscience 2021 (July 2, 2021): 1–10. http://dx.doi.org/10.1155/2021/9968716.

Full text
Abstract:
Recently, benefitting from the storage and retrieval efficiency of hashing and the powerful discriminative feature extraction capability of deep neural networks, deep cross-modal hashing retrieval has drawn more and more attention. To preserve the semantic similarities of cross-modal instances during the hash mapping procedure, most existing deep cross-modal hashing methods usually learn deep hashing networks with a pairwise loss or a triplet loss. However, these methods may not fully explore the similarity relation across modalities. To solve this problem, in this paper, we introduce a quadruplet loss into deep cross-modal hashing and propose a quadruplet-based deep cross-modal hashing (termed QDCMH) method. Extensive experiments on two benchmark cross-modal retrieval datasets show that our proposed method achieves state-of-the-art performance and demonstrate the efficiency of the quadruplet loss in cross-modal hashing.
APA, Harvard, Vancouver, ISO, and other styles
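To make the quadruplet idea described in the abstract above concrete, here is a minimal NumPy sketch of a generic quadruplet loss on continuous hash-layer outputs. It is not the authors' implementation; the squared-Euclidean distance and the margin values are illustrative assumptions.

```python
import numpy as np

def quadruplet_loss(anchor, positive, negative1, negative2, m1=0.5, m2=0.25):
    """Generic quadruplet loss on continuous hash-layer outputs.

    anchor/positive form a matching cross-modal pair; negative1 and negative2
    come from two different non-matching instances. m1 and m2 are illustrative
    margins, not values taken from the paper.
    """
    d = lambda x, y: float(np.sum((x - y) ** 2))   # squared Euclidean distance
    term1 = max(0.0, d(anchor, positive) - d(anchor, negative1) + m1)
    term2 = max(0.0, d(anchor, positive) - d(negative1, negative2) + m2)
    return term1 + term2

# toy 16-dimensional hash-layer outputs for one image anchor and three texts
rng = np.random.default_rng(0)
img_anchor, txt_pos, txt_neg1, txt_neg2 = rng.normal(size=(4, 16))
print(quadruplet_loss(img_anchor, txt_pos, txt_neg1, txt_neg2))
```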
2

Liu, Xuanwu, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Yazhou Ren, and Maozu Guo. "Ranking-Based Deep Cross-Modal Hashing." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4400–4407. http://dx.doi.org/10.1609/aaai.v33i01.33014400.

Full text
Abstract:
Cross-modal hashing has been receiving increasing interest for its low storage cost and fast query speed in multi-modal data retrieval. However, most existing hashing methods are based on hand-crafted or raw-level features of objects, which may not be optimally compatible with the coding process. Besides, these hashing methods are mainly designed to handle simple pairwise similarity. The complex multilevel ranking semantic structure of instances associated with multiple labels has not been well explored yet. In this paper, we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH first uses the feature and label information of data to derive a semi-supervised semantic ranking list. Next, to expand the semantic representation power of hand-crafted features, RDCMH integrates the semantic ranking information into deep cross-modal hashing and jointly optimizes the compatible parameters of deep feature representations and of hashing functions. Experiments on real multi-modal datasets show that RDCMH outperforms other competitive baselines and achieves state-of-the-art performance in cross-modal retrieval applications.
APA, Harvard, Vancouver, ISO, and other styles
3

Yang, Xiaohan, Zhen Wang, Nannan Wu, Guokun Li, Chuang Feng, and Pingping Liu. "Unsupervised Deep Relative Neighbor Relationship Preserving Cross-Modal Hashing." Mathematics 10, no. 15 (July 28, 2022): 2644. http://dx.doi.org/10.3390/math10152644.

Full text
Abstract:
The image-text cross-modal retrieval task, which aims to retrieve the relevant image from text and vice versa, is now attracting widespread attention. To respond quickly to this large-scale task, we propose an Unsupervised Deep Relative Neighbor Relationship Preserving Cross-Modal Hashing (DRNPH) method to achieve cross-modal retrieval in a common Hamming space, which has advantages in storage and efficiency. To fulfill nearest-neighbor search in the Hamming space, we need to reconstruct both the original intra- and inter-modal neighbor matrices from the binary feature vectors, so that the neighbor relationship among samples of different modalities can be computed directly from Hamming distances. Furthermore, the cross-modal pair-wise similarity preserving constraint requires that similar sample pairs have identical Hamming distances to the anchor; similar sample pairs therefore share the same binary code and have minimal Hamming distances. Unfortunately, the pair-wise similarity preserving constraint may lead to an imbalanced code problem. Therefore, we propose the cross-modal triplet relative similarity preserving constraint, which demands that the Hamming distances of similar pairs be less than those of dissimilar pairs, so as to distinguish the samples' ranking orders in the retrieval results. Moreover, a large similarity margin can boost the algorithm's noise robustness. We conduct cross-modal retrieval comparative experiments and an ablation study on two public datasets, MIRFlickr and NUS-WIDE. The experimental results show that DRNPH outperforms state-of-the-art approaches in various image-text retrieval scenarios, and all three proposed constraints are necessary and effective for boosting cross-modal retrieval performance.
APA, Harvard, Vancouver, ISO, and other styles
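The constraints described in the abstract above all operate on Hamming distances between binary codes. As a hedged illustration (toy data, not the DRNPH objective), the snippet below computes Hamming distances for an image-text-text triplet and checks a relative-similarity constraint with a margin; the code length and margin value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
code_len = 32                                   # assumed code length
img_code = rng.integers(0, 2, code_len)         # binary code of an image anchor
txt_pos = rng.integers(0, 2, code_len)          # code of a semantically similar text
txt_neg = rng.integers(0, 2, code_len)          # code of a dissimilar text

def hamming(a, b):
    """Hamming distance between two {0, 1} code vectors."""
    return int(np.count_nonzero(a != b))

margin = 4                                      # illustrative similarity margin
d_pos, d_neg = hamming(img_code, txt_pos), hamming(img_code, txt_neg)
print(d_pos, d_neg, d_pos + margin <= d_neg)    # True when the triplet constraint holds
```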
4

Li, Chao, Cheng Deng, Lei Wang, De Xie, and Xianglong Liu. "Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 176–83. http://dx.doi.org/10.1609/aaai.v33i01.3301176.

Full text
Abstract:
In recent years, hashing has attracted more and more attention owing to its low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, continuously compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or are unable to learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where an outer-cycle network is used to learn powerful common representations and an inner-cycle network is exploited to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, so that representations and hash codes can be optimized simultaneously. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms state-of-the-art unsupervised cross-modal hashing methods.
APA, Harvard, Vancouver, ISO, and other styles
5

刘, 志虎. "Label Consistency Hashing for Cross-Modal Retrieval." Computer Science and Application 11, no. 04 (2021): 1104–12. http://dx.doi.org/10.12677/csa.2021.114114.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Yao, Tao, Xiangwei Kong, Haiyan Fu, and Qi Tian. "Semantic consistency hashing for cross-modal retrieval." Neurocomputing 193 (June 2016): 250–59. http://dx.doi.org/10.1016/j.neucom.2016.02.016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chen, Shubai, Song Wu, and Li Wang. "Hierarchical semantic interaction-based deep hashing network for cross-modal retrieval." PeerJ Computer Science 7 (May 25, 2021): e552. http://dx.doi.org/10.7717/peerj-cs.552.

Full text
Abstract:
Due to the high efficiency of hashing technology and the high abstraction of deep networks, deep hashing has achieved appealing effectiveness and efficiency for large-scale cross-modal retrieval. However, how to efficiently measure the similarity of fine-grained multi-labels for multi-modal data and thoroughly explore the intermediate layers specific information of networks are still two challenges for high-performance cross-modal hashing retrieval. Thus, in this paper, we propose a novel Hierarchical Semantic Interaction-based Deep Hashing Network (HSIDHN) for large-scale cross-modal retrieval. In the proposed HSIDHN, the multi-scale and fusion operations are first applied to each layer of the network. A Bidirectional Bi-linear Interaction (BBI) policy is then designed to achieve the hierarchical semantic interaction among different layers, such that the capability of hash representations can be enhanced. Moreover, a dual-similarity measurement (“hard” similarity and “soft” similarity) is designed to calculate the semantic similarity of different modality data, aiming to better preserve the semantic correlation of multi-labels. Extensive experiment results on two large-scale public datasets have shown that the performance of our HSIDHN is competitive to state-of-the-art deep cross-modal hashing methods.
APA, Harvard, Vancouver, ISO, and other styles
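The "hard" versus "soft" similarity distinction in the abstract above can be illustrated with a small sketch on binary multi-label vectors. The exact definitions used in the paper may differ, so treat this only as an assumed reading: hard similarity asks whether two items share any label, soft similarity measures how much their label sets overlap.

```python
import numpy as np

labels_img = np.array([1, 0, 1, 0, 0])   # multi-label annotation of an image
labels_txt = np.array([1, 1, 1, 0, 0])   # multi-label annotation of a text

# "hard" similarity: do the two items share at least one label?
hard_sim = float(np.any(labels_img & labels_txt))

# "soft" similarity: cosine similarity between the label vectors
soft_sim = float(labels_img @ labels_txt /
                 (np.linalg.norm(labels_img) * np.linalg.norm(labels_txt)))

print(hard_sim, soft_sim)   # e.g. 1.0 and ~0.82 for the toy labels above
```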
8

Li, Mingyong, Qiqi Li, Lirong Tang, Shuang Peng, Yan Ma, and Degang Yang. "Deep Unsupervised Hashing for Large-Scale Cross-Modal Retrieval Using Knowledge Distillation Model." Computational Intelligence and Neuroscience 2021 (July 17, 2021): 1–11. http://dx.doi.org/10.1155/2021/5107034.

Full text
Abstract:
Cross-modal hashing encodes heterogeneous multimedia data into compact binary code to achieve fast and flexible retrieval across different modalities. Due to its low storage cost and high retrieval efficiency, it has received widespread attention. Supervised deep hashing significantly improves search performance and usually yields more accurate results, but requires a lot of manual annotation of the data. In contrast, unsupervised deep hashing is difficult to achieve satisfactory performance due to the lack of reliable supervisory information. To solve this problem, inspired by knowledge distillation, we propose a novel unsupervised knowledge distillation cross-modal hashing method based on semantic alignment (SAKDH), which can reconstruct the similarity matrix using the hidden correlation information of the pretrained unsupervised teacher model, and the reconstructed similarity matrix can be used to guide the supervised student model. Specifically, firstly, the teacher model adopted an unsupervised semantic alignment hashing method, which can construct a modal fusion similarity matrix. Secondly, under the supervision of teacher model distillation information, the student model can generate more discriminative hash codes. Experimental results on two extensive benchmark datasets (MIRFLICKR-25K and NUS-WIDE) show that compared to several representative unsupervised cross-modal hashing methods, the mean average precision (MAP) of our proposed method has achieved a significant improvement. It fully reflects its effectiveness in large-scale cross-modal data retrieval.
APA, Harvard, Vancouver, ISO, and other styles
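Most of the papers in this list, including the one above, report mean average precision (MAP). Below is a minimal sketch of the standard MAP computation for hashing-based retrieval (items ranked by Hamming distance, an item counted as relevant if it shares a label with the query); this is the standard definition, not code from the paper.

```python
import numpy as np

def average_precision(relevance):
    """AP for one query; `relevance` is 0/1 over results sorted by the ranking."""
    relevance = np.asarray(relevance, dtype=float)
    if relevance.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(relevance) / (np.arange(len(relevance)) + 1)
    return float((precision_at_k * relevance).sum() / relevance.sum())

def mean_average_precision(relevance_lists):
    """MAP over a set of queries."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))

# two toy queries: 1 marks a retrieved item sharing at least one label with the query
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 0]]))
```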
9

Zhong, Fangming, Zhikui Chen, and Geyong Min. "Deep Discrete Cross-Modal Hashing for Cross-Media Retrieval." Pattern Recognition 83 (November 2018): 64–77. http://dx.doi.org/10.1016/j.patcog.2018.05.018.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Qi, Xiaojun, Xianhua Zeng, Shumin Wang, Yicai Xie, and Liming Xu. "Cross-modal variable-length hashing based on hierarchy." Intelligent Data Analysis 25, no. 3 (April 20, 2021): 669–85. http://dx.doi.org/10.3233/ida-205162.

Full text
Abstract:
Due to the emergence of the era of big data, cross-modal learning have been applied to many research fields. As an efficient retrieval method, hash learning is widely used frequently in many cross-modal retrieval scenarios. However, most of existing hashing methods use fixed-length hash codes, which increase the computational costs for large-size datasets. Furthermore, learning hash functions is an NP hard problem. To address these problems, we initially propose a novel method named Cross-modal Variable-length Hashing Based on Hierarchy (CVHH), which can learn the hash functions more accurately to improve retrieval performance, and also reduce the computational costs and training time. The main contributions of CVHH are: (1) We propose a variable-length hashing algorithm to improve the algorithm performance; (2) We apply the hierarchical architecture to effectively reduce the computational costs and training time. To validate the effectiveness of CVHH, our extensive experimental results show the superior performance compared with recent state-of-the-art cross-modal methods on three benchmark datasets, WIKI, NUS-WIDE and MIRFlickr.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Cross-Modal Retrieval and Hashing"

1

Shen, Yuming. "Deep binary representation learning for single/cross-modal data retrieval." Thesis, University of East Anglia, 2018. https://ueaeprints.uea.ac.uk/67635/.

Full text
Abstract:
Data similarity search is widely regarded as a classic topic in the realms of computer vision, machine learning and data mining. Given a query, the retrieval model sorts out the related candidates in the database according to their similarities, where representation learning methods and nearest-neighbour search apply. As matching data features in Hamming space is computationally cheaper than in Euclidean space, learning to hash and binary representations are generally appreciated in modern retrieval models. Recent research seeks solutions in deep learning to formulate the hash functions, showing great potential in retrieval performance. In this thesis, we gradually extend our research topics and contributions from unsupervised single-modal deep hashing to supervised cross-modal hashing and finally zero-shot hashing problems, addressing the following challenges in deep hashing. First of all, existing unsupervised deep hashing works still do not attain leading retrieval performance compared with the shallow ones. To improve this, a novel unsupervised single-modal hashing model is proposed in this thesis, named Deep Variational Binaries (DVB). We introduce the popular conditional variational auto-encoders to formulate the encoding function. By minimizing the reconstruction error of the latent variables, the proposed model produces compact binary codes without training supervision. Experiments on benchmark datasets show that our model outperforms existing unsupervised hashing methods. The second problem is that current cross-modal hashing methods only consider holistic image representations and fail to model descriptive sentences, which is inappropriate for handling the rich semantics of informative cross-modal data in quality textual-visual search tasks. To handle this problem, we propose a supervised deep cross-modal hashing model called Textual-Visual Deep Binaries (TVDB). Region-based neural networks and recurrent neural networks are involved in the image encoding network in order to make effective use of visual information, while the text encoder is built using a convolutional neural network. We additionally introduce an efficient in-batch optimization routine to train the network parameters. The proposed model successfully outperforms state-of-the-art methods on large-scale datasets. Finally, existing hashing models fail when the categories of query data have never been seen during training. This scenario is further extended into a novel zero-shot cross-modal hashing task in this thesis, and a Zero-shot Sketch-Image Hashing (ZSIH) scheme is then proposed with graph convolution and stochastic neurons. Experiments show that the proposed ZSIH model significantly outperforms existing hashing algorithms in the zero-shot retrieval task. Overall, the experiments suggest that our proposed hashing methods outperform state-of-the-art approaches in single-modal and cross-modal data retrieval.
APA, Harvard, Vancouver, ISO, and other styles
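The abstract above notes that matching in Hamming space is computationally cheaper than in Euclidean space. A small illustration of why, assuming 64-bit binary codes packed into bytes (toy data, not the thesis code): distances reduce to an XOR followed by bit counting.

```python
import numpy as np

rng = np.random.default_rng(0)
n_db, n_bits = 10_000, 64
db_codes = rng.integers(0, 2, size=(n_db, n_bits), dtype=np.uint8)  # database codes
query = rng.integers(0, 2, size=n_bits, dtype=np.uint8)             # query code

db_packed = np.packbits(db_codes, axis=1)   # 64 bits -> 8 bytes per database item
q_packed = np.packbits(query)

# XOR the packed words, then count the set bits per item to get Hamming distances
distances = np.unpackbits(db_packed ^ q_packed, axis=1).sum(axis=1)
print(distances[:5], int(distances.argmin()))   # nearest neighbour in Hamming space
```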
2

Zhu, Meng. "Cross-modal semantic-associative labelling, indexing and retrieval of multimodal data." Thesis, University of Reading, 2010. http://centaur.reading.ac.uk/24828/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Saragiotis, Panagiotis. "Cross-modal classification and retrieval of multimodal data using combinations of neural networks." Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/843338/.

Full text
Abstract:
Current neurobiological thinking supported, in part, by experimentation stresses the importance of cross-modality. Uni-modal cognitive tasks, language and vision, for example, are performed with the help of many networks working simultaneously or sequentially; and for cross-modal tasks, like picture / object naming and word illustration, the output of these networks is combined to produce higher cognitive behaviour. The notion of multi-net processing is used typically in the pattern recognition literature, where ensemble networks of weak classifiers - typically supervised - appear to outperform strong classifiers. We have built a system, based on combinations of neural networks, that demonstrates how cross-modal classification can be used to retrieve multi-modal data using one of the available modalities of information. Two multi-net systems were used in this work: one comprising Kohonen SOMs that interact with each other via a Hebbian network and a fuzzy ARTMAP network where the interaction is through the embedded map field. The multi-nets were used for the cross-modal retrieval of images given keywords and for finding the appropriate keywords for an image. The systems were trained on two publicly available image databases that had collateral annotations on the images. The Hemera collection, comprising images of pre-segmented single objects, and the Corel collection with images of multiple objects were used for automatically generating various sets of input vectors. We have attempted to develop a method for evaluating the performance of multi-net systems using a monolithic network trained on modally-undifferentiated vectors as an intuitive bench-mark. To this extent single SOM and fuzzy ART networks were trained using a concatenated visual / linguistic vector to test the performance of multi-net systems with typical monolithic systems. Both multi-nets outperform the respective monolithic systems in terms of information retrieval measures of precision and recall on test images drawn from both datasets; the SOM multi-net outperforms the fuzzy ARTMAP both in terms of convergence and precision-recall. The performance of the SOM-based multi-net in retrieval, classification and auto-annotation is on a par with that of state of the art systems like "ALIP" and "Blobworld". Much of the neural network based simulations reported in the literature use supervised learning algorithms. Such algorithms are suited when classes of objects are predefined and objects in themselves are quite unique in terms of their attributes. We have compared the performance of our multi-net systems with that of a multi-layer perceptron (MLP). The MLP does show substantially greater precision and recall on a (fixed) class of objects when compared with our unsupervised systems. However when 'lesioned' -the network connectivity 'damaged' deliberately- the multi-net systems show a greater degree of robustness. Cross-modal systems appear to hold considerable intellectual and commercial potential and the multi-net approach facilitates the simulation of such systems.
APA, Harvard, Vancouver, ISO, and other styles
4

Surian, Didi. "Novel Applications Using Latent Variable Models." Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/14014.

Full text
Abstract:
Latent variable models have achieved a great success in many research communities, including machine learning, information retrieval, data mining, natural language processing, etc. Latent variable models use an assumption that the data, which is observable, has an affinity to some hidden/latent variables. In this thesis, we present a suite of novel applications using latent variable models. In particular, we (i) extend topic models using directional distributions, (ii) propose novel solutions using latent variable models to detect outliers (anomalies) and (iii) to answer cross-modal retrieval problem. We present a study of directional distributions in modeling data. Specifically, we implement the von Mises-Fisher (vMF) distribution and develop latent variable models which are based on directed graphical models. The directed graphical models are commonly used to represent the conditional dependency among the variables. Under Bayesian treatment, we propose approximate posterior inference algorithms using variational methods for the models. We show that by incorporating the vMF distribution, the quality of clustering is improved rather than by using word count-based topic models. Furthermore, with the properties of directional distributions in hand, we extend the applications to detect outliers in various data sets and settings. Finally, we present latent variable models that are based on supervised learning to answer the cross-modal retrieval problem. In the cross-modal retrieval problem, the objective is to find matching content across different modalities such as text and image. We explore various approaches such as by using one-class learning methods, generating negative instances and using ranking methods. We show that our models outperform generic approaches such as Canonical Correlation Analysis (CCA) and its variants.
APA, Harvard, Vancouver, ISO, and other styles
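The abstract above compares against Canonical Correlation Analysis (CCA), the classic baseline for cross-modal retrieval. Below is a hedged sketch of that baseline using scikit-learn on synthetic features (the dimensions and number of components are arbitrary assumptions): both modalities are projected into a shared space and matched there by cosine similarity.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(200, 64))                                   # toy image features
txt_feats = img_feats @ rng.normal(size=(64, 32)) + 0.1 * rng.normal(size=(200, 32))

cca = CCA(n_components=10)
img_c, txt_c = cca.fit_transform(img_feats, txt_feats)                   # shared latent space

query = txt_c[0] / np.linalg.norm(txt_c[0])                              # a text query
gallery = img_c / np.linalg.norm(img_c, axis=1, keepdims=True)           # image gallery
print(int((gallery @ query).argmax()))                                   # best-matching image index
```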
5

Tran, Thi Quynh Nhi. "Robust and comprehensive joint image-text representations." Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1096/document.

Full text
Abstract:
This thesis investigates the joint modeling of the visual and textual content of multimedia documents to address cross-modal problems. Such tasks require the ability to match information across modalities. A common representation space, obtained for example by Kernel Canonical Correlation Analysis, on which images and text can both be represented and directly compared, is a generally adopted solution. Nevertheless, such a joint space still suffers from several deficiencies that may hinder the performance of cross-modal tasks. An important contribution of this thesis is therefore to identify two major limitations of such a space. The first limitation concerns information that is poorly represented on the common space yet very significant for a retrieval task. The second limitation consists in a separation between modalities on the common space, which leads to coarse cross-modal matching. To deal with the first limitation concerning poorly-represented data, we put forward a model which first identifies such information and then finds ways to combine it with data that is relatively well-represented on the joint space. Evaluations on text illustration tasks show that by appropriately identifying and taking such information into account, the results of cross-modal retrieval can be strongly improved. The major work in this thesis aims to cope with the separation between modalities on the joint space to enhance the performance of cross-modal tasks. We propose two representation methods for bi-modal or uni-modal documents that aggregate information from both the visual and textual modalities projected on the joint space. Specifically, for uni-modal documents we suggest a completion process relying on an auxiliary dataset to find the corresponding information in the absent modality and then use such information to build a final bi-modal representation for a uni-modal document. Evaluations show that our approaches achieve state-of-the-art results on several standard and challenging datasets for cross-modal retrieval as well as bi-modal and cross-modal classification.
APA, Harvard, Vancouver, ISO, and other styles
6

Mandal, Devraj. "Cross-Modal Retrieval and Hashing." Thesis, 2020. https://etd.iisc.ac.in/handle/2005/4685.

Full text
Abstract:
The objective of cross-modal retrieval is to retrieve relevant items from one modality (say, image), given a query from another modality (say, a textual document). Cross-modal retrieval has various applications like matching image-sketch, audio-visual, near infrared-RGB, etc. Different feature representations of the two modalities, absence of paired correspondences, etc., make this a very challenging problem. In this thesis, we have extensively looked at the cross-modal retrieval problem from different aspects and proposed methodologies to address them.
• In the first work, we propose a novel framework which can work with unpaired data of the two modalities. The method has two steps, consisting of a hash code learning stage followed by a hash function learning stage. The method can also generate unified hash representations in a post-processing stage for even better performance. Finally, we investigate, formulate and address the cross-modal hashing problem in the presence of missing similarity information between the data items.
• In the second work, we investigate how to make cross-modal hashing algorithms scalable so that they can handle large amounts of training data, and propose two solutions. The first approach builds on a mini-batch realization of the previously formulated objective, and the second is based on matrix factorization. We also investigate whether it is possible to build a hashing-based approach without the need to learn a hash function, as is typically done in the literature. Finally, we propose a strategy so that an already trained cross-modal approach can be adapted and updated to take into account the real-life scenario of an increasing label space, without retraining the entire model from scratch.
• In the third work, we explore semi-supervised approaches for cross-modal retrieval. We first propose a novel framework which can predict the labels of the unlabeled data using complementary information from the different modalities. The framework can be used as an add-on with any baseline cross-modal algorithm. The second approach estimates the labels of the unlabeled data using a nearest-neighbor strategy, and then trains a network with skip connections to predict the true labels.
• In the fourth work, we investigate the cross-modal problem in an incremental multiclass scenario, where new data may contain previously unseen categories. We propose a novel incremental cross-modal hashing algorithm which can adapt itself to handle incoming data of new categories. At every stage, a small amount of old-category data, termed exemplars, is used so as not to forget the old data while trying to learn from the new incoming data.
• Finally, we investigate the effect of label corruption on cross-modal algorithms. We first study recently proposed training paradigms which focus on small-loss samples to build noise-resistant image classification models, and improve upon that model using techniques like self-supervision and relabeling of large-loss samples. Next, we extend this work to cross-modal retrieval under noisy data.
APA, Harvard, Vancouver, ISO, and other styles
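The first work above follows the common two-stage recipe of learning hash codes first and hash functions second. Below is a generic sketch of that recipe (not the thesis algorithm): codes come from the sign of a random projection of one modality's features, and a linear hash function per modality is then fitted to those codes by least squares; all sizes are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_txt, n_bits = 500, 64, 32, 16
img = rng.normal(size=(n, d_img))                      # toy image features
txt = img @ rng.normal(size=(d_img, d_txt))            # loosely paired text features

# stage 1: hash code learning -- signs of a random projection of centred image features
proj = rng.normal(size=(d_img, n_bits))
codes = np.sign((img - img.mean(axis=0)) @ proj)       # entries in {-1, +1}

# stage 2: hash function learning -- a linear map per modality fitted to the codes
W_img, *_ = np.linalg.lstsq(img, codes, rcond=None)
W_txt, *_ = np.linalg.lstsq(txt, codes, rcond=None)

# at query time each modality is hashed with its own function and compared in Hamming space
print(np.sign(txt[:3] @ W_txt))
```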
7

Li, Yan-Fu, and 李彥甫. "The Cross-Modal Method of Tag Labeling in Music Information Retrieval." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/45038305568580924323.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Department of Computer Science and Information Engineering
Academic year 96 (ROC calendar)
A music object contains multi-faceted features, such as the average frequency, speed, timbre, melody, rhythm, genre and so on. These features are extracted from various feature domains, which can be separated into two types, the quantified and the unquantifiable. Within a quantified feature domain, the features are expressed as numerical values; for example, if there are three important average frequencies in a music object, we quantify and denote them as three numerical values, 20 Hz, 80 Hz and 100 Hz, in the feature domain "average frequency". On the other hand, the features in an unquantifiable feature domain are described by non-numerical values (e.g. letters) and are difficult to define by mathematical methods. For example, the genre of a music object is difficult to extract with a filter. However, among the features of a music object, the unquantifiable features are important for the human auditory system. Therefore, we introduce a cross-modal association method to associate the quantified and the unquantifiable features. We represent the music objects, including quantified and unquantifiable features, as a multimedia graph (MMG) [1], which converts the association problem into a graph problem, and apply a link analysis algorithm to rank the nodes in the graph. Thus we label the music object by the rank of its nodes.
APA, Harvard, Vancouver, ISO, and other styles
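The abstract above ranks the nodes of a multimedia graph with a link-analysis algorithm. As a hedged illustration of that general idea, here is a PageRank-style power iteration on a toy adjacency matrix; this is not the thesis' MMG construction, just the generic ranking step.

```python
import numpy as np

# toy graph over four nodes (e.g. two music objects and two tags); edges are symmetric
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
M = A / A.sum(axis=0)                        # column-stochastic transition matrix

damping, rank = 0.85, np.full(4, 0.25)       # uniform initial rank
for _ in range(50):                          # power iteration
    rank = (1 - damping) / len(rank) + damping * (M @ rank)

print(rank)                                  # higher rank = stronger association
```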
8

Yang, Bo. "Semantic-aware data processing towards cross-modal multimedia analysis and content-based retrieval in distributed and mobile environments." 2007. http://etda.libraries.psu.edu/theses/approved/WorldWideIndex/ETD-1850/index.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Ramanishka, Vasili. "Describing and retrieving visual content using natural language." Thesis, 2020. https://hdl.handle.net/2144/42026.

Full text
Abstract:
Modern deep learning methods have boosted research progress in visual recognition and text understanding but it is a non-trivial task to unite these advances from both disciplines. In this thesis, we develop models and techniques that allow us to connect natural language and visual content enabling automatic video subtitling, visual grounding, and text-based image search. Such models could be useful in a wide range of applications in robotics and human-computer interaction bridging the gap in vision and language understanding. First, we develop a model that generates natural language descriptions of the main activities and scenes depicted in short videos. While previous methods were constrained to a predefined list of objects, actions, or attributes, our model learns to generate descriptions directly from raw pixels. The model exploits available audio information and the video’s category (e.g., cooking, movie, education) to generate more relevant and coherent sentences. Then, we introduce a technique for visual grounding of generated sentences using the same video description model. Our approach allows for explaining the model’s prediction by localizing salient video regions for corresponding words in the generated sentence. Lastly, we address the problem of image retrieval. Existing cross-modal retrieval methods work by learning a common embedding space for different modalities using parallel data such as images and their accompanying descriptions. Instead, we focus on the case when images are connected by relative annotations: given the context set as an image and its metadata, the user can specify desired semantic changes using natural language instructions. The model needs to capture distinctive visual differences between image pairs as described by the user. Our approach enables interactive image search such that the natural language feedback significantly improves the efficacy of image retrieval. We show that the proposed methods advance the state-of-the-art for video captioning and image retrieval tasks in terms of both accuracy and interpretability.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Cross-Modal Retrieval and Hashing"

1

Peters, C., ed. Evaluation of multilingual and multi-modal information retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, September 20–22, 2006; revised selected papers. Berlin: Springer, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Gey, Fredric C., Paul Clough, Bernardo Magnini, Douglas W. Oard, and Jussi Karlgren. Evaluation of Multilingual and Multi-Modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, September 20-22, 2006, Revised Selected Papers. Springer London, Limited, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Cross-Modal Retrieval and Hashing"

1

Zhu, Lei, Jingjing Li, and Weili Guan. "Cross-Modal Hashing." In Synthesis Lectures on Information Concepts, Retrieval, and Services, 45–89. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-37291-9_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Mandal, Devraj, Yashas Annadani, and Soma Biswas. "GrowBit: Incremental Hashing for Cross-Modal Retrieval." In Computer Vision – ACCV 2018, 305–21. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-20870-7_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhang, Peng-Fei, Zi Huang, and Zheng Zhang. "Semantics-Reconstructing Hashing for Cross-Modal Retrieval." In Advances in Knowledge Discovery and Data Mining, 315–27. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-47436-2_24.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Wang, Zening, Yungong Sun, Liang Liu, and Ao Li. "Critical Separation Hashing for Cross-Modal Retrieval." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 171–79. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-36011-4_15.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Li, Mingyang, Xiangwei Kong, Tao Yao, and Yujia Zhang. "Discrete Similarity Preserving Hashing for Cross-modal Retrieval." In Lecture Notes in Computer Science, 202–13. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-24265-7_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Xu, Jingnan, Tieying Li, Chong Xi, and Xiaochun Yang. "Self-auxiliary Hashing for Unsupervised Cross Modal Retrieval." In Computer Supported Cooperative Work and Social Computing, 431–43. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-4549-6_33.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Zhu, Lei, Jingjing Li, and Weili Guan. "Composite Multi-modal Hashing." In Synthesis Lectures on Information Concepts, Retrieval, and Services, 91–144. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-37291-9_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Weng, Weiwei, Jiagao Wu, Lu Yang, Linfeng Liu, and Bin Hu. "Label-Based Deep Semantic Hashing for Cross-Modal Retrieval." In Neural Information Processing, 24–36. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-36718-3_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Zhang, Xi, Hanjiang Lai, and Jiashi Feng. "Attention-Aware Deep Adversarial Hashing for Cross-Modal Retrieval." In Computer Vision – ECCV 2018, 614–29. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01267-0_36.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Tang, Dianjuan, Hui Cui, Dan Shi, and Hua Ji. "Hypergraph-Based Discrete Hashing Learning for Cross-Modal Retrieval." In Advances in Multimedia Information Processing – PCM 2018, 776–86. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00776-8_71.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Cross-Modal Retrieval and Hashing"

1

Xu, Xing, Fumin Shen, Yang Yang, and Heng Tao Shen. "Discriminant Cross-modal Hashing." In ICMR'16: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2911996.2912056.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Luo, Xin, Peng-Fei Zhang, Ye Wu, Zhen-Duo Chen, Hua-Junjie Huang, and Xin-Shun Xu. "Asymmetric Discrete Cross-Modal Hashing." In ICMR '18: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3206025.3206034.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Moran, Sean, and Victor Lavrenko. "Regularised Cross-Modal Hashing." In SIGIR '15: The 38th International ACM SIGIR conference on research and development in Information Retrieval. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2766462.2767816.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Liu, Yao, Yanhong Yuan, Qialli Huang, and Zhixing Huang. "Hashing for Cross-Modal Similarity Retrieval." In 2015 11th International Conference on Semantics, Knowledge and Grids (SKG). IEEE, 2015. http://dx.doi.org/10.1109/skg.2015.9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Chen, Tian-yi, Lan Zhang, Shi-cong Zhang, Zi-long Li, and Bai-chuan Huang. "Extensible Cross-Modal Hashing." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/292.

Full text
Abstract:
Cross-modal hashing (CMH) models are introduced to significantly reduce the cost of large-scale cross-modal data retrieval systems. In many real-world applications, however, data of new categories arrive continuously, which requires the model to have good extensibility. That is, the model should be updated to accommodate data of new categories but still retain good performance for the old categories with minimum computation cost. Unfortunately, existing CMH methods fail to satisfy the extensibility requirements. In this work, we propose a novel extensible cross-modal hashing (ECMH) method to enable highly efficient and low-cost model extension. Our proposed ECMH has several desired features: 1) it has good forward compatibility, so there is no need to update old hash codes; 2) the ECMH model is extended to support new data categories using only new data by a well-designed "weak constraint incremental learning" algorithm, which saves up to 91% of the time cost compared with retraining the model with both new and old data; 3) the extended model achieves high precision and recall on both old and new tasks. Our extensive experiments show the effectiveness of our design.
APA, Harvard, Vancouver, ISO, and other styles
6

Sun, Changchang, Xuemeng Song, Fuli Feng, Wayne Xin Zhao, Hao Zhang, and Liqiang Nie. "Supervised Hierarchical Cross-Modal Hashing." In SIGIR '19: The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3331184.3331229.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Yao, Hong-Lei, Yu-Wei Zhan, Zhen-Duo Chen, Xin Luo, and Xin-Shun Xu. "TEACH: Attention-Aware Deep Cross-Modal Hashing." In ICMR '21: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3460426.3463625.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Wang, Hongya, Shunxin Dai, Ming Du, Bo Xu, and Mingyong Li. "Revisiting Performance Measures for Cross-Modal Hashing." In ICMR '22: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3512527.3531363.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Shi, Yufeng, Xinge You, Feng Zheng, Shuo Wang, and Qinmu Peng. "Equally-Guided Discriminative Hashing for Cross-modal Retrieval." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/662.

Full text
Abstract:
Cross-modal hashing intends to project data from two modalities into a common Hamming space to perform cross-modal retrieval efficiently. Despite the satisfactory performance achieved in real applications, existing methods are incapable of effectively preserving semantic structure to maintain inter-class relationships while simultaneously improving discriminability to make intra-class samples aggregated, which limits higher retrieval performance. To handle this problem, we propose Equally-Guided Discriminative Hashing (EGDH), which jointly takes into consideration semantic structure and discriminability. Specifically, we discover the connection between semantic structure preserving and discriminative methods. Based on it, we directly encode multi-label annotations, which act as high-level semantic features, to build a common semantic-structure-preserving classifier. With the common classifier guiding the learning of the different modal hash functions equally, hash codes of samples become intra-class aggregated and inter-class relationship preserving. Experimental results on two benchmark datasets demonstrate the superiority of EGDH compared with the state of the art.
APA, Harvard, Vancouver, ISO, and other styles
10

Tan, Shoubiao, Lingyu Hu, Anqi Wang-Xu, Jun Tang, and Zhaohong Jia. "Kernelized cross-modal hashing for multimedia retrieval." In 2016 12th World Congress on Intelligent Control and Automation (WCICA). IEEE, 2016. http://dx.doi.org/10.1109/wcica.2016.7578693.

Full text
APA, Harvard, Vancouver, ISO, and other styles