Academic literature on the topic 'Scalability in Cross-Modal Retrieval'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Scalability in Cross-Modal Retrieval.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Scalability in Cross-Modal Retrieval"

1

Hu, Peng, Hongyuan Zhu, Xi Peng, and Jie Lin. "Semi-Supervised Multi-Modal Learning with Balanced Spectral Decomposition." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 99–106. http://dx.doi.org/10.1609/aaai.v34i01.5339.

Abstract:
Cross-modal retrieval aims to retrieve relevant samples across different modalities; the key problem is how to model the correlations among different modalities while narrowing the large heterogeneity gap. In this paper, we propose a Semi-supervised Multimodal Learning Network (SMLN) which correlates different modalities by capturing the intrinsic structure and discriminative correlation of the multimedia data. To be specific, the labeled and unlabeled data are used to construct a similarity matrix which integrates the cross-modal correlation, discrimination, and intra-modal graph information present in the multimedia data. More importantly, we propose a novel approach to optimize our loss within a neural network, which involves a spectral decomposition problem derived from a ratio-trace criterion. Our optimization enjoys two advantages. On the one hand, the proposed approach is not limited to our loss and can be applied to any neural network trained with a ratio-trace criterion. On the other hand, it differs from existing optimizations, which alternately maximize the minor eigenvalues, thus overemphasizing them while ignoring the dominant ones. In contrast, our method exactly balances all eigenvalues, making it more competitive than existing methods. Thanks to our loss and optimization strategy, our method preserves discriminative and intrinsic information in the common space and scales to large multimedia data. To verify its effectiveness, extensive experiments are carried out on three widely used multimodal datasets in comparison with 13 state-of-the-art approaches.
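As a loose illustration of the "common space" idea this abstract describes (a generic sketch, not SMLN itself), the snippet below projects two modalities with hypothetical learned matrices and ranks candidates by cosine similarity; all names, dimensions, and data are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned projections mapping each modality into a shared
# 4-dimensional common space (stand-ins for a trained network).
W_image = rng.normal(size=(10, 4))   # image features: 10-d -> 4-d
W_text = rng.normal(size=(6, 4))     # text features:   6-d -> 4-d

def embed(x, W):
    """Project a feature vector into the common space and L2-normalize it."""
    z = x @ W
    return z / np.linalg.norm(z)

# A text query is compared against image candidates directly in the
# common space via cosine similarity (dot product of unit vectors).
images = rng.normal(size=(5, 10))    # 5 candidate image feature vectors
query = rng.normal(size=6)           # 1 text query feature vector
img_emb = np.stack([embed(x, W_image) for x in images])
q_emb = embed(query, W_text)
scores = img_emb @ q_emb             # cosine similarities
ranking = np.argsort(-scores)        # best match first
print(ranking)
```

In a real system the projections are learned (here, by SMLN's loss) so that semantically matching pairs land close together in the shared space.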
2

Rasheed, Ali Salim, Davood Zabihzadeh, and Sumia Abdulhussien Razooqi Al-Obaidi. "Large-Scale Multi-modal Distance Metric Learning with Application to Content-Based Information Retrieval and Image Classification." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 13 (May 26, 2020): 2050034. http://dx.doi.org/10.1142/s0218001420500342.

Abstract:
Metric learning algorithms aim to bring conceptually related data items closer while keeping dissimilar ones at a distance. The most common approach to metric learning is based on the Mahalanobis method. Despite its success, this method is limited to finding a linear projection and also suffers from poor scalability with respect to both the dimensionality and the size of the input data. To address these problems, this paper presents a new scalable metric learning algorithm for multi-modal data. Our method learns an optimal metric for any feature set of the multi-modal data in an online fashion. We also combine the learned metrics with a novel Passive/Aggressive (PA)-based algorithm, which results in a higher convergence rate than state-of-the-art methods. To address scalability with respect to dimensionality, Dual Random Projection (DRP) is adopted. The method is evaluated on several challenging machine vision datasets for image classification and Content-Based Information Retrieval (CBIR) tasks. The experimental results confirm that the proposed method significantly surpasses other state-of-the-art metric learning methods on most of these datasets in terms of both accuracy and efficiency.
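The Passive/Aggressive family this paper builds on has a well-known closed-form update. The sketch below shows a generic PA-I step for a linear scorer on a synthetic stream; it is not the paper's multi-modal metric learner, and all data is invented:

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One Passive-Aggressive (PA-I) step for a linear scorer w.
    y is +1 (similar) or -1 (dissimilar); hinge loss drives the update."""
    loss = max(0.0, 1.0 - y * float(w @ x))
    if loss == 0.0:                       # passive: margin satisfied
        return w
    tau = min(C, loss / float(x @ x))     # aggressive but clipped step size
    return w + tau * y * x

# Toy online stream: learn to separate two Gaussian clusters.
rng = np.random.default_rng(1)
w = np.zeros(2)
for _ in range(200):
    y = rng.choice([-1, 1])
    x = rng.normal(loc=2.0 * y, scale=0.5, size=2)
    w = pa_update(w, x, y)

# After training, the scorer should score the cluster centers correctly.
print(np.sign(w @ np.array([2.0, 2.0])), np.sign(w @ np.array([-2.0, -2.0])))
```

The "passive" branch is what makes the scheme fast online: examples already scored with sufficient margin cost nothing.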
3

Zalkow, Frank, and Meinard Müller. "Learning Low-Dimensional Embeddings of Audio Shingles for Cross-Version Retrieval of Classical Music." Applied Sciences 10, no. 1 (December 18, 2019): 19. http://dx.doi.org/10.3390/app10010019.

Abstract:
Cross-version music retrieval aims at identifying all versions of a given piece of music using a short query audio fragment. One previous approach, particularly suited to Western classical music, is based on a nearest-neighbor search using short sequences of chroma features, also referred to as audio shingles. From the viewpoint of efficiency, indexing and dimensionality reduction are important aspects. In this paper, we extend previous work by adapting two embedding techniques: one based on classical principal component analysis, the other on neural networks with a triplet loss. Furthermore, we report on systematically conducted experiments with Western classical music recordings and discuss the trade-off between retrieval quality and embedding dimensionality. As one main result, we show that, using neural networks, one can reduce the audio shingles from 240 to fewer than 8 dimensions with only a moderate loss in retrieval accuracy. In addition, we present extended experiments with databases of different sizes and different query lengths to test the scalability and generalizability of the dimensionality-reduction methods. We also provide a more detailed view of the retrieval problem by analyzing the distances that appear in the nearest-neighbor search.
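The classical-PCA baseline mentioned in the abstract takes only a few lines. The 240-to-8 reduction below uses random stand-in data rather than real audio shingles:

```python
import numpy as np

def pca_reduce(X, k):
    """Classical PCA via SVD: project rows of X onto the top-k principal axes."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # k-dimensional embeddings

# Toy stand-in for 240-d audio shingles (the paper reduces 240 -> <8 dims).
rng = np.random.default_rng(0)
shingles = rng.normal(size=(100, 240))
emb = pca_reduce(shingles, 8)
print(emb.shape)                                  # → (100, 8)
```

The neural triplet-loss embedding plays the same role as `pca_reduce` here, but is trained so that shingles from versions of the same piece stay close after reduction.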
4

Huang, Xiaobing, Tian Zhao, and Yu Cao. "PIR." International Journal of Multimedia Data Engineering and Management 5, no. 3 (July 2014): 1–27. http://dx.doi.org/10.4018/ijmdem.2014070101.

Abstract:
Multimedia Information Retrieval (MIR) is a problem domain that includes programming tasks such as salient feature extraction, machine learning, indexing, and retrieval. There are a variety of implementations and algorithms for these tasks in different languages and frameworks, which are difficult to compose and reuse due to interface and language incompatibility. Because of this low reusability, researchers often have to implement their experiments from scratch, and the resulting programs cannot be easily adapted to parallel and distributed execution, which is important for handling large data sets. In this paper, we present Pipeline Information Retrieval (PIR), a Domain Specific Language (DSL) for multi-modal feature manipulation. The goal of PIR is to unify MIR programming tasks by hiding the programming details under a flexible layer of domain-specific interface. PIR optimizes MIR tasks by compiling DSL programs into pipeline graphs, which can be executed using a variety of strategies (e.g. sequential, parallel, or distributed execution). The authors evaluated the performance of PIR applications on a single machine with multiple cores, on a local cluster, and on the Amazon Elastic Compute Cloud (EC2) platform. The results show that PIR programs can greatly help MIR researchers and developers perform fast prototyping in a single-machine environment and achieve good scalability on distributed platforms.
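A pipeline DSL of this kind ultimately composes stages into an executable graph. The toy sketch below shows only sequential composition; the stage names are invented, and the real PIR additionally compiles to parallel and distributed execution plans:

```python
from functools import reduce

# Illustrative stages: each takes and returns a plain dict "document".
def extract_features(doc):
    # Stand-in feature extractor: word lengths as "features".
    return {"doc": doc, "features": [len(w) for w in doc.split()]}

def index(item):
    # Stand-in indexing stage: store features in sorted order.
    item["indexed"] = sorted(item["features"])
    return item

def make_pipeline(*stages):
    """Compose stages left-to-right into a single callable."""
    return lambda x: reduce(lambda acc, f: f(acc), stages, x)

pipeline = make_pipeline(extract_features, index)
print(pipeline("cross modal retrieval")["indexed"])   # → [5, 5, 9]
```

Because each stage only sees its predecessor's output, a compiler is free to map independent branches of such a graph onto threads, cluster nodes, or cloud instances.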
5

Zhang, Zhen, Xu Wu, and Shuang Wei. "Cross-Domain Access Control Model in Industrial IoT Environment." Applied Sciences 13, no. 8 (April 17, 2023): 5042. http://dx.doi.org/10.3390/app13085042.

Abstract:
The Industrial Internet of Things (IIoT) accelerates smart manufacturing and boosts production efficiency through heterogeneous industrial equipment, intelligent sensors, and actuators. The IIoT is transforming the traditional factory model into a new manufacturing mode that allows cross-domain data sharing among multiple system departments to enable smart manufacturing. A complete industrial product comes from the combined efforts of many different departments. Therefore, secure and reliable cross-domain access control has become the key to ensuring the security of cross-domain communication and resource sharing. Traditional centralized access control schemes are prone to single-point-of-failure problems. Recently, many researchers have integrated blockchain technology into access control models. However, most blockchain-based approaches use a single-chain structure which, while ensuring system security, has weak data-management capability, poor scalability, and low access-control efficiency, making it difficult to meet the needs of multi-domain cooperation in IIoT scenarios. Therefore, this paper proposes a decentralized cross-domain access model based on a master–slave chain with high scalability. Moreover, the model ensures the security and reliability of the master chain through a reputation-based node selection mechanism. Access control efficiency is improved by a grouping-strategy retrieval method in the access control process. The experimental benchmarks of the proposed scheme use various performance metrics to highlight its applicability in the IIoT environment. The results show an 82% improvement in throughput for the master–slave chain structure over the single-chain structure, as well as improvements in throughput and latency compared to the results of other studies.
6

An, Duo, Alan Chiu, James A. Flanders, Wei Song, Dahua Shou, Yen-Chun Lu, Lars G. Grunnet, et al. "Designing a retrievable and scalable cell encapsulation device for potential treatment of type 1 diabetes." Proceedings of the National Academy of Sciences 115, no. 2 (December 26, 2017): E263–E272. http://dx.doi.org/10.1073/pnas.1708806115.

Abstract:
Cell encapsulation has been shown to hold promise for effective, long-term treatment of type 1 diabetes (T1D). However, challenges remain for its clinical applications. For example, there is an unmet need for an encapsulation system that is capable of delivering sufficient cell mass while still allowing convenient retrieval or replacement. Here, we report a simple cell encapsulation design that is readily scalable and conveniently retrievable. The key to this design was to engineer a highly wettable, Ca2+-releasing nanoporous polymer thread that promoted uniform in situ cross-linking and strong adhesion of a thin layer of alginate hydrogel around the thread. The device provided immunoprotection of rat islets in immunocompetent C57BL/6 mice in a short-term (1-mo) study, similar to neat alginate fibers. However, the mechanical property of the device, critical for handling and retrieval, was much more robust than the neat alginate fibers due to the reinforcement of the central thread. It also had facile mass transfer due to the short diffusion distance. We demonstrated the therapeutic potential of the device through the correction of chemically induced diabetes in C57BL/6 mice using rat islets for 3 mo as well as in immunodeficient SCID-Beige mice using human islets for 4 mo. We further showed, as a proof of concept, the scalability and retrievability in dogs. After 1 mo of implantation in dogs, the device could be rapidly retrieved through a minimally invasive laparoscopic procedure. This encapsulation device may contribute to a cellular therapy for T1D because of its retrievability and scale-up potential.
7

Tamchyna, Aleš, Ondřej Dušek, Rudolf Rosa, and Pavel Pecina. "MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service." Prague Bulletin of Mathematical Linguistics 100, no. 1 (October 1, 2013): 31–40. http://dx.doi.org/10.2478/pralin-2013-0009.

Abstract:
We present a web service which handles and distributes JSON-encoded HTTP requests for machine translation (MT) among multiple machines running an MT system, including text pre- and post-processing. It is currently used to provide MT between several languages for cross-lingual information retrieval in the EU FP7 Khresmoi project. The software consists of an application server and remote workers which handle text processing and communicate translation requests to MT systems. The communication between the application server and the workers is based on the XML-RPC protocol. We present the overall design of the software and test results which document the speed and scalability of our solution. Our software is licensed under the Apache 2.0 licence and is available for download from the Lindat-Clarin repository and GitHub.
8

Zhang, Chengyuan, Jiayu Song, Xiaofeng Zhu, Lei Zhu, and Shichao Zhang. "HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval." ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 1s (April 20, 2021): 1–22. http://dx.doi.org/10.1145/3412847.

Abstract:
The purpose of cross-modal retrieval is to find the relationship between samples of different modalities and to retrieve samples of other modalities with similar semantics given a sample of one modality. As the data of different modalities presents heterogeneous low-level features and semantically related high-level features, the main problem of cross-modal retrieval is how to measure the similarity between different modalities. In this article, we present a novel cross-modal retrieval method, named the Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and from intra-modal pairs with the same classification label. Specifically, coupled deep fully connected networks are used to map cross-modal feature representations into a common subspace. A weight-sharing strategy is utilized between the two branches of the networks to diminish cross-modal heterogeneity. Furthermore, two Siamese CNN models are employed to learn intra-modal similarity from samples of the same modality. Comprehensive experiments on real datasets clearly demonstrate that our proposed technique achieves substantial improvements over state-of-the-art cross-modal retrieval techniques.
9

Wu, Yiling, Shuhui Wang, and Qingming Huang. "Multi-modal semantic autoencoder for cross-modal retrieval." Neurocomputing 331 (February 2019): 165–75. http://dx.doi.org/10.1016/j.neucom.2018.11.042.

10

Devezas, José. "Graph-based entity-oriented search." ACM SIGIR Forum 55, no. 1 (June 2021): 1–2. http://dx.doi.org/10.1145/3476415.3476430.

Abstract:
Entity-oriented search has revolutionized search engines. In the era of Google Knowledge Graph and Microsoft Satori, users demand an effortless process of search. Whether they express an information need through a keyword query, expecting documents and entities, or through a clicked entity, expecting related entities, there is an inherent need for the combination of corpora and knowledge bases to obtain an answer. Such integration frequently relies on independent signals extracted from inverted indexes, and from quad indexes indirectly accessed through queries to a triplestore. However, relying on two separate representation models inhibits the effective cross-referencing of information, discarding otherwise available relations that could lead to a better ranking. Moreover, different retrieval tasks often demand separate implementations, although the problem is, at its core, the same. With the goal of harnessing all available information to optimize retrieval, we explore joint representation models of documents and entities, while taking a step towards the definition of a more general retrieval approach. Specifically, we propose that graphs should be used to incorporate explicit and implicit information derived from the relations between text found in corpora and entities found in knowledge bases. We also take advantage of this framework to elaborate a general model for entity-oriented search, proposing a universal ranking function for the tasks of ad hoc document retrieval (leveraging entities), ad hoc entity retrieval, and entity list completion. At a conceptual stage, we begin by proposing the graph-of-entity, based on the relations between combinations of term and entity nodes. We introduce the entity weight as the corresponding ranking function, relying on the idea of seed nodes for representing the query, either directly through term nodes, or based on the expansion to adjacent entity nodes. 
The score is computed based on a series of geodesic distances to the remaining nodes, providing a ranking for the documents (or entities) in the graph. In order to improve on the low scalability of the graph-of-entity, we then redesigned this model in a way that reduced the number of edges in relation to the number of nodes, by relying on the hypergraph data structure. The resulting model, which we called hypergraph-of-entity, is the main contribution of this thesis. The obtained reduction was achieved by replacing binary edges with n-ary relations based on sets of nodes and entities (undirected document hyperedges), sets of entities (undirected hyperedges, either based on co-occurrence or a grouping by semantic subject), and pairs of a set of terms and a set of one entity (directed hyperedges, mapping text to an object). We introduce the random walk score as the corresponding ranking function, relying on the same idea of seed nodes, similar to the entity weight in the graph-of-entity. Scoring based on this function is highly reliant on the structure of the hypergraph, which we call representation-driven retrieval. As such, we explore several extensions of the hypergraph-of-entity, including relations of synonymy, or contextual similarity, as well as different weighting functions per node and hyperedge type. We also propose TF-bins as a discretization for representing term frequency in the hypergraph-of-entity. For the random walk score, we propose and explore several parameters, including length and repeats, with or without seed node expansion, direction, or weights, and with or without a certain degree of node and/or hyperedge fatigue, a concept that we also propose. For evaluation, we took advantage of the TREC 2017 OpenSearch track, which relied on an online evaluation process based on the Living Labs API, and we also participated in the TREC 2018 Common Core track, which was based on the newly introduced TREC Washington Post Corpus.
Our main experiments were supported on the INEX 2009 Wikipedia collection, which proved to be a fundamental test collection for assessing retrieval effectiveness across multiple tasks. At first, our experiments solely focused on ad hoc document retrieval, ensuring that the model performed adequately for a classical task. We then expanded the work to cover all three entity-oriented search tasks. Results supported the viability of a general retrieval model, opening novel challenges in information retrieval, and proposing a new path towards generality in this area.
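The seed-based random-walk scoring described in this abstract can be illustrated on a toy term/entity graph. The node names below are invented, and a plain adjacency-list graph stands in for the thesis's hypergraph:

```python
import random

# Tiny illustrative graph: term (t:), entity (e:), and document (d:) nodes.
graph = {
    "t:music": ["e:Beethoven", "e:Mozart"],
    "e:Beethoven": ["t:music", "e:Mozart", "d:doc1"],
    "e:Mozart": ["t:music", "e:Beethoven", "d:doc2"],
    "d:doc1": ["e:Beethoven"],
    "d:doc2": ["e:Mozart"],
}

def random_walk_score(graph, seeds, length=3, walks=2000, seed=0):
    """Count how often each node is visited by short walks from the seeds."""
    rng = random.Random(seed)
    visits = {n: 0 for n in graph}
    for _ in range(walks):
        node = rng.choice(seeds)          # start at a query seed node
        for _ in range(length):
            node = rng.choice(graph[node])
            visits[node] += 1
    return visits

scores = random_walk_score(graph, seeds=["t:music"])
# Documents are ranked by visit frequency.
docs = sorted((n for n in scores if n.startswith("d:")),
              key=lambda n: -scores[n])
print(docs)
```

The same ranking function covers document retrieval, entity retrieval, and list completion: only the node type filtered out at the end changes, which is the generality argument the thesis makes.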

Dissertations / Theses on the topic "Scalability in Cross-Modal Retrieval"

1

Shen, Yuming. "Deep binary representation learning for single/cross-modal data retrieval." Thesis, University of East Anglia, 2018. https://ueaeprints.uea.ac.uk/67635/.

Abstract:
Data similarity search is widely regarded as a classic topic in computer vision, machine learning, and data mining. Given a query, the retrieval model sorts the related candidates in the database according to their similarities, using representation learning methods and nearest-neighbour search. As matching data features in Hamming space is computationally cheaper than in Euclidean space, learning-to-hash and binary representations are generally appreciated in modern retrieval models. Recent research seeks solutions in deep learning to formulate the hash functions, showing great potential in retrieval performance. In this thesis, we gradually extend our research topics and contributions from unsupervised single-modal deep hashing to supervised cross-modal hashing and finally zero-shot hashing, addressing the following challenges in deep hashing. First, existing unsupervised deep hashing methods still do not attain the retrieval performance of shallow ones. To improve this, a novel unsupervised single-modal hashing model is proposed in this thesis, named Deep Variational Binaries (DVB). We introduce the popular conditional variational auto-encoders to formulate the encoding function. By minimizing the reconstruction error of the latent variables, the proposed model produces compact binary codes without training supervision. Experiments on benchmark datasets show that our model outperforms existing unsupervised hashing methods. The second problem is that current cross-modal hashing methods consider only holistic image representations and fail to model descriptive sentences, which is inadequate for handling the rich semantics of informative cross-modal data in high-quality textual-visual search tasks. To handle this problem, we propose a supervised deep cross-modal hashing model called Textual-Visual Deep Binaries (TVDB). Region-based neural networks and recurrent neural networks are involved in the image encoding network in order to make effective use of visual information, while the text encoder is built using a convolutional neural network. We additionally introduce an efficient in-batch optimization routine to train the network parameters. The proposed model successfully outperforms state-of-the-art methods on large-scale datasets. Finally, existing hashing models fail when the categories of query data have never been seen during training. This scenario is further extended into a novel zero-shot cross-modal hashing task in this thesis, and a Zero-shot Sketch-Image Hashing (ZSIH) scheme is then proposed with graph convolution and stochastic neurons. Experiments show that the proposed ZSIH model significantly outperforms existing hashing algorithms in the zero-shot retrieval task. Overall, our proposed hashing methods outperform the state of the art in single-modal and cross-modal data retrieval.
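The Hamming-space advantage motivating these hashing models is easy to see in code. The sketch below binarizes real-valued embeddings with the sign function and ranks database items by Hamming distance; it is generic hashing retrieval, not DVB, TVDB, or ZSIH, and all data is synthetic:

```python
import numpy as np

def binarize(Z):
    """Hash real-valued embeddings to {0,1} codes via the sign function."""
    return (Z > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists), dists

rng = np.random.default_rng(0)
db = binarize(rng.normal(size=(8, 32)))   # 8 items, 32-bit codes
q = db[3].copy()
q[:2] ^= 1                                # a query exactly 2 bits from item 3
order, dists = hamming_rank(q, db)
print(order[0], dists[order[0]])
```

On hardware, the elementwise comparison becomes an XOR plus popcount per code, which is why binary codes scale to very large databases.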
2

Zhu, Meng. "Cross-modal semantic-associative labelling, indexing and retrieval of multimodal data." Thesis, University of Reading, 2010. http://centaur.reading.ac.uk/24828/.

3

Saragiotis, Panagiotis. "Cross-modal classification and retrieval of multimodal data using combinations of neural networks." Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/843338/.

Abstract:
Current neurobiological thinking, supported in part by experimentation, stresses the importance of cross-modality. Uni-modal cognitive tasks, language and vision for example, are performed with the help of many networks working simultaneously or sequentially; and for cross-modal tasks, like picture/object naming and word illustration, the output of these networks is combined to produce higher cognitive behaviour. The notion of multi-net processing is typically used in the pattern recognition literature, where ensembles of weak classifiers (typically supervised) appear to outperform strong classifiers. We have built a system, based on combinations of neural networks, that demonstrates how cross-modal classification can be used to retrieve multi-modal data using one of the available modalities of information. Two multi-net systems were used in this work: one comprising Kohonen SOMs that interact with each other via a Hebbian network, and a fuzzy ARTMAP network where the interaction is through the embedded map field. The multi-nets were used for the cross-modal retrieval of images given keywords and for finding appropriate keywords for an image. The systems were trained on two publicly available image databases that had collateral annotations on the images: the Hemera collection, comprising images of pre-segmented single objects, and the Corel collection, with images of multiple objects, which were used to automatically generate various sets of input vectors. We have attempted to develop a method for evaluating the performance of multi-net systems using a monolithic network trained on modally undifferentiated vectors as an intuitive benchmark. To this end, single SOM and fuzzy ART networks were trained using a concatenated visual/linguistic vector to compare the performance of multi-net systems with typical monolithic systems. 
Both multi-nets outperform the respective monolithic systems in terms of the information retrieval measures of precision and recall on test images drawn from both datasets; the SOM multi-net outperforms the fuzzy ARTMAP in terms of both convergence and precision-recall. The performance of the SOM-based multi-net in retrieval, classification, and auto-annotation is on a par with that of state-of-the-art systems like "ALIP" and "Blobworld". Much of the neural-network-based simulation reported in the literature uses supervised learning algorithms. Such algorithms are suited to cases where classes of objects are predefined and objects themselves are quite unique in terms of their attributes. We have compared the performance of our multi-net systems with that of a multi-layer perceptron (MLP). The MLP does show substantially greater precision and recall on a (fixed) class of objects when compared with our unsupervised systems. However, when 'lesioned' (the network connectivity deliberately damaged), the multi-net systems show a greater degree of robustness. Cross-modal systems appear to hold considerable intellectual and commercial potential, and the multi-net approach facilitates the simulation of such systems.
4

Surian, Didi. "Novel Applications Using Latent Variable Models." Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/14014.

Abstract:
Latent variable models have achieved great success in many research communities, including machine learning, information retrieval, data mining, and natural language processing. Latent variable models assume that the observable data has an affinity to some hidden/latent variables. In this thesis, we present a suite of novel applications using latent variable models. In particular, we (i) extend topic models using directional distributions, (ii) propose novel solutions using latent variable models to detect outliers (anomalies), and (iii) address the cross-modal retrieval problem. We present a study of directional distributions in modeling data. Specifically, we implement the von Mises-Fisher (vMF) distribution and develop latent variable models based on directed graphical models, which are commonly used to represent the conditional dependencies among variables. Under a Bayesian treatment, we propose approximate posterior inference algorithms for the models using variational methods. We show that incorporating the vMF distribution improves clustering quality over word-count-based topic models. Furthermore, with the properties of directional distributions in hand, we extend the applications to detect outliers in various data sets and settings. Finally, we present latent variable models based on supervised learning to address the cross-modal retrieval problem, in which the objective is to find matching content across different modalities such as text and image. We explore various approaches, such as one-class learning methods, generating negative instances, and ranking methods. We show that our models outperform generic approaches such as Canonical Correlation Analysis (CCA) and its variants.
5

Tran, Thi Quynh Nhi. "Robust and comprehensive joint image-text representations." Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1096/document.

Abstract:
This thesis investigates the joint modeling of visual and textual content of multimedia documents to address cross-modal problems. Such tasks require the ability to match information across modalities. A common representation space, obtained by eg Kernel Canonical Correlation Analysis, on which images and text can be both represented and directly compared is a generally adopted solution.Nevertheless, such a joint space still suffers from several deficiencies that may hinder the performance of cross-modal tasks. An important contribution of this thesis is therefore to identify two major limitations of such a space. The first limitation concerns information that is poorly represented on the common space yet very significant for a retrieval task. The second limitation consists in a separation between modalities on the common space, which leads to coarse cross-modal matching. To deal with the first limitation concerning poorly-represented data, we put forward a model which first identifies such information and then finds ways to combine it with data that is relatively well-represented on the joint space. Evaluations on emph{text illustration} tasks show that by appropriately identifying and taking such information into account, the results of cross-modal retrieval can be strongly improved. The major work in this thesis aims to cope with the separation between modalities on the joint space to enhance the performance of cross-modal tasks.We propose two representation methods for bi-modal or uni-modal documents that aggregate information from both the visual and textual modalities projected on the joint space. Specifically, for uni-modal documents we suggest a completion process relying on an auxiliary dataset to find the corresponding information in the absent modality and then use such information to build a final bi-modal representation for a uni-modal document. 
Evaluations show that our approaches achieve state-of-the-art results on several standard and challenging datasets for cross-modal retrieval or bi-modal and cross-modal classification
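The KCCA-based common space this abstract refers to can be illustrated with a plain linear CCA sketch (the kernel variant substitutes kernel matrices for the raw features); the data and variable names below are synthetic illustrations, not the thesis's actual pipeline:

```python
import numpy as np

def cca_projections(X, Y, n_components=2, reg=1e-3):
    """Linear CCA: learn projections W_x, W_y so that X @ W_x and
    Y @ W_y are maximally correlated (a shared representation space)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])   # regularized covariances
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):                               # S^(-1/2) via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, _, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U[:, :n_components], Ky @ Vt.T[:, :n_components]

# Toy "image" and "text" features driven by a shared 2-D latent factor.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(200, 8))
Y = Z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
Wx, Wy = cca_projections(X, Y)
Px = (X - X.mean(axis=0)) @ Wx   # both modalities now live in one space
Py = (Y - Y.mean(axis=0)) @ Wy
```

Cross-modal retrieval then reduces to nearest-neighbour search between `Px` and `Py`, e.g. by cosine similarity.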
APA, Harvard, Vancouver, ISO, and other styles
6

Li, Yan-Fu (李彥甫). "The Cross-Modal Method of Tag Labeling in Music Information Retrieval." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/45038305568580924323.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Department of Computer Science and Information Engineering
Academic year 96 (2007–2008)
A music object contains multi-facet features, such as the average frequency, speed, timbre, melody, rhythm, genre and so on. These features are extracted from various feature domains, which fall into two types: the quantified and the unquantifiable. Within a quantified feature domain, the features are expressed as numerical values; for example, if there are three important average frequencies in a music object, we quantify and denote them as three numerical values, 20Hz, 80Hz and 100Hz, in the feature domain of average frequency. On the other hand, the features in an unquantifiable feature domain are described by non-numerical values (e.g. letters) and are difficult to define mathematically. For example, the genre of a music object is difficult to extract with a filter. However, among the features of a music object, the unquantifiable features are important for the human auditory system. Therefore, we introduce a cross-modal association method to associate the quantified and the unquantifiable features. We represent the music objects, including quantified and unquantifiable features, as a multimedia graph (MMG) [1], which converts the association problem into a graph problem, and apply a link analysis algorithm to rank the nodes in the graph. We then label the music object according to the rank of the nodes.
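The MMG-plus-link-analysis idea described above can be sketched as a random walk with restart on a toy graph; the node names, edges, and restart parameter below are invented purely for illustration and are not taken from the thesis:

```python
import numpy as np

# Toy multimedia graph: music objects link to quantified feature nodes
# (frequency bins) and, when known, to tag nodes; a random walk with
# restart from an unlabeled song then ranks the candidate tags.
nodes = ["songA", "songB", "songC", "20Hz", "80Hz", "100Hz", "rock", "jazz"]
edges = [("songA", "20Hz"), ("songA", "80Hz"), ("songA", "rock"),
         ("songB", "80Hz"), ("songB", "100Hz"), ("songB", "jazz"),
         ("songC", "20Hz"), ("songC", "80Hz")]       # songC has no tag yet

idx = {n: i for i, n in enumerate(nodes)}
A = np.zeros((len(nodes), len(nodes)))
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0      # undirected edges
P = A / A.sum(axis=1, keepdims=True)                 # row-stochastic transitions

def rank_tags(seed, tags=("rock", "jazz"), alpha=0.15, iters=100):
    restart = np.zeros(len(nodes))
    restart[idx[seed]] = 1.0
    r = restart.copy()
    for _ in range(iters):                           # random walk with restart
        r = alpha * restart + (1 - alpha) * (P.T @ r)
    return sorted(tags, key=lambda t: -r[idx[t]])

print(rank_tags("songC"))   # songC shares 20Hz and 80Hz with the rock-tagged songA
```

Mass flowing from songC concentrates on songA (two shared feature nodes) rather than songB (one), so "rock" outranks "jazz" for the unlabeled song.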
APA, Harvard, Vancouver, ISO, and other styles
7

Yang, Bo. "Semantic-aware data processing towards cross-modal multimedia analysis and content-based retrieval in distributed and mobile environments /." 2007. http://etda.libraries.psu.edu/theses/approved/WorldWideIndex/ETD-1850/index.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Ramanishka, Vasili. "Describing and retrieving visual content using natural language." Thesis, 2020. https://hdl.handle.net/2144/42026.

Full text
Abstract:
Modern deep learning methods have boosted research progress in visual recognition and text understanding but it is a non-trivial task to unite these advances from both disciplines. In this thesis, we develop models and techniques that allow us to connect natural language and visual content enabling automatic video subtitling, visual grounding, and text-based image search. Such models could be useful in a wide range of applications in robotics and human-computer interaction bridging the gap in vision and language understanding. First, we develop a model that generates natural language descriptions of the main activities and scenes depicted in short videos. While previous methods were constrained to a predefined list of objects, actions, or attributes, our model learns to generate descriptions directly from raw pixels. The model exploits available audio information and the video’s category (e.g., cooking, movie, education) to generate more relevant and coherent sentences. Then, we introduce a technique for visual grounding of generated sentences using the same video description model. Our approach allows for explaining the model’s prediction by localizing salient video regions for corresponding words in the generated sentence. Lastly, we address the problem of image retrieval. Existing cross-modal retrieval methods work by learning a common embedding space for different modalities using parallel data such as images and their accompanying descriptions. Instead, we focus on the case when images are connected by relative annotations: given the context set as an image and its metadata, the user can specify desired semantic changes using natural language instructions. The model needs to capture distinctive visual differences between image pairs as described by the user. Our approach enables interactive image search such that the natural language feedback significantly improves the efficacy of image retrieval. 
We show that the proposed methods advance the state-of-the-art for video captioning and image retrieval tasks in terms of both accuracy and interpretability.
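The relative-feedback retrieval setting in the last abstract can be sketched in a few lines; the additive composition of image and instruction embeddings below is a deliberate simplification of the learned composition the thesis describes, and all embeddings are synthetic:

```python
import numpy as np

# Toy joint space: 500 precomputed "image embeddings". The query composes a
# reference image with an encoded text instruction (here, simple addition --
# real systems learn this composition from paired feedback data).
rng = np.random.default_rng(1)
gallery = rng.normal(size=(500, 16))
ref_img = gallery[42]                     # the user's reference image
edit_vec = 0.1 * rng.normal(size=16)      # stand-in for the encoded instruction

query = ref_img + edit_vec
sims = gallery @ query / (np.linalg.norm(gallery, axis=1) * np.linalg.norm(query))
top5 = np.argsort(-sims)[:5]              # cosine-ranked retrieval results
```

With a small edit vector, the reference image stays at the top of the ranking; a stronger instruction embedding would shift the neighbourhood toward the requested semantic change.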
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Scalability in Cross-Modal Retrieval"

1

Peters, C., ed. Evaluation of multilingual and multi-modal information retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, September 20-22, 2006; revised selected papers. Berlin: Springer, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Gey, Fredric C., Paul Clough, Bernardo Magnini, Douglas W. Oard, and Jussi Karlgren. Evaluation of Multilingual and Multi-Modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, September 20-22, 2006, Revised Selected Papers. Springer London, Limited, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Scalability in Cross-Modal Retrieval"

1

Zhu, Lei, Jingjing Li, and Weili Guan. "Cross-Modal Hashing." In Synthesis Lectures on Information Concepts, Retrieval, and Services, 45–89. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-37291-9_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Li, Qing, and Yu Yang. "Cross-Modal Multimedia Information Retrieval." In Encyclopedia of Database Systems, 1–6. New York, NY: Springer New York, 2016. http://dx.doi.org/10.1007/978-1-4899-7993-3_90-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Li, Qing, and Yu Yang. "Cross-Modal Multimedia Information Retrieval." In Encyclopedia of Database Systems, 528–32. Boston, MA: Springer US, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_90.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Wen, Zhenyu, and Aimin Feng. "Deep Centralized Cross-modal Retrieval." In MultiMedia Modeling, 443–55. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-67832-6_36.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Li, Qing, and Yu Yang. "Cross-Modal Multimedia Information Retrieval." In Encyclopedia of Database Systems, 672–77. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_90.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ning, Xuecheng, Xiaoshan Yang, and Changsheng Xu. "Multi-hop Interactive Cross-Modal Retrieval." In MultiMedia Modeling, 681–93. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-37734-2_55.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Malik, Shaily, Nikhil Bhardwaj, Rahul Bhardwaj, and Saurabh Kumar. "Cross-Modal Retrieval Using Deep Learning." In Proceedings of Third Doctoral Symposium on Computational Intelligence, 725–34. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-3148-2_62.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Xuan, Ruisheng, Weihua Ou, Quan Zhou, Yongfeng Cao, Hua Yang, Xiangguang Xiong, and Fangming Ruan. "Semantics Consistent Adversarial Cross-Modal Retrieval." In Cognitive Internet of Things: Frameworks, Tools and Applications, 463–72. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-04946-1_45.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Mandal, Devraj, Yashas Annadani, and Soma Biswas. "GrowBit: Incremental Hashing for Cross-Modal Retrieval." In Computer Vision – ACCV 2018, 305–21. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-20870-7_19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Owen, Charles B., and Fillia Makedon. "The Xtrieve Cross-Modal Information Retrieval System." In Computed Synchronization for Multimedia Applications, 149–68. Boston, MA: Springer US, 1999. http://dx.doi.org/10.1007/978-1-4757-4830-7_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Scalability in Cross-Modal Retrieval"

1

Wang, Bokun, Yang Yang, Xing Xu, Alan Hanjalic, and Heng Tao Shen. "Adversarial Cross-Modal Retrieval." In MM '17: ACM Multimedia Conference. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3123266.3123326.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Jing, Longlong, Elahe Vahdani, Jiaxing Tan, and Yingli Tian. "Cross-Modal Center Loss for 3D Cross-Modal Retrieval." In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021. http://dx.doi.org/10.1109/cvpr46437.2021.00316.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhen, Liangli, Peng Hu, Xu Wang, and Dezhong Peng. "Deep Supervised Cross-Modal Retrieval." In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019. http://dx.doi.org/10.1109/cvpr.2019.01064.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Ranjan, Viresh, Nikhil Rasiwasia, and C. V. Jawahar. "Multi-label Cross-Modal Retrieval." In 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015. http://dx.doi.org/10.1109/iccv.2015.466.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Zong, Linlin, Qiujie Xie, Jiahui Zhou, Peiran Wu, Xianchao Zhang, and Bo Xu. "FedCMR: Federated Cross-Modal Retrieval." In SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3404835.3462989.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Hou, Danyang, Liang Pang, Yanyan Lan, Huawei Shen, and Xueqi Cheng. "Region-based Cross-modal Retrieval." In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022. http://dx.doi.org/10.1109/ijcnn55064.2022.9892139.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Xu, Xing, Fumin Shen, Yang Yang, and Heng Tao Shen. "Discriminant Cross-modal Hashing." In ICMR'16: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2911996.2912056.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Gur, Shir, Natalia Neverova, Chris Stauffer, Ser-Nam Lim, Douwe Kiela, and Austin Reiter. "Cross-Modal Retrieval Augmentation for Multi-Modal Classification." In Findings of the Association for Computational Linguistics: EMNLP 2021. Stroudsburg, PA, USA: Association for Computational Linguistics, 2021. http://dx.doi.org/10.18653/v1/2021.findings-emnlp.11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Fei, Hongliang, Tan Yu, and Ping Li. "Cross-lingual Cross-modal Pretraining for Multimodal Retrieval." In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA, USA: Association for Computational Linguistics, 2021. http://dx.doi.org/10.18653/v1/2021.naacl-main.285.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Song, Ge, and Xiaoyang Tan. "Cross-modal Retrieval via Memory Network." In British Machine Vision Conference 2017. British Machine Vision Association, 2017. http://dx.doi.org/10.5244/c.31.178.

Full text
APA, Harvard, Vancouver, ISO, and other styles
