Academic literature on the topic 'Zero-shot Retrieval'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference papers, and other scholarly sources on the topic 'Zero-shot Retrieval.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Zero-shot Retrieval"

1. Dutta, Titir, and Soma Biswas. "Generalized Zero-Shot Cross-Modal Retrieval." IEEE Transactions on Image Processing 28, no. 12 (December 2019): 5953–62. http://dx.doi.org/10.1109/tip.2019.2923287.

2. Seo, Sanghyun, and Juntae Kim. "Hierarchical Semantic Loss and Confidence Estimator for Visual-Semantic Embedding-Based Zero-Shot Learning." Applied Sciences 9, no. 15 (August 2, 2019): 3133. http://dx.doi.org/10.3390/app9153133.

Abstract:
Traditional supervised learning depends on the labels of the training data, so a class label that is not included in the training data cannot be recognized properly. Therefore, zero-shot learning, which can recognize unseen classes that are not used in training, is gaining research interest. One approach to zero-shot learning is to embed visual data such as images, together with rich semantic data related to the text labels of the visual data, into a common vector space, and to perform zero-shot cross-modal retrieval on newly input unseen-class data. This paper proposes a hierarchical semantic loss and a confidence estimator to perform zero-shot learning on visual data more efficiently. The hierarchical semantic loss improves learning efficiency by using hierarchical knowledge to select the negative sample of the triplet loss, and the confidence estimator estimates a confidence score to determine whether an input belongs to a seen class or an unseen class. These methodologies improve the performance of zero-shot learning by adjusting the distances from a semantic vector to the visual vectors when performing zero-shot cross-modal retrieval. Experimental results show that the proposed method can improve the performance of zero-shot learning in terms of hit@k accuracy.
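
To make the negative-sampling idea concrete, here is a minimal numpy sketch (not the authors' implementation): a standard triplet loss whose negative example is chosen using distances in a class hierarchy, so that semantically close classes act as hard negatives. The embeddings, class names, and distance table are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: keep the positive closer than the negative by a margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def pick_hierarchical_negative(anchor_class, candidates, tree_dist):
    """One plausible reading of the paper's idea: pick the candidate class
    closest to the anchor in the hierarchy, i.e. a semantically hard negative."""
    return min(candidates, key=lambda c: tree_dist[(anchor_class, c)])

# Toy data: 4-d embeddings and a tiny hierarchy distance table (all assumed).
rng = np.random.default_rng(0)
emb = {c: rng.normal(size=4) for c in ["dog", "cat", "car"]}
tree_dist = {("dog", "cat"): 1, ("dog", "car"): 3}

neg = pick_hierarchical_negative("dog", ["cat", "car"], tree_dist)
print(neg, triplet_loss(emb["dog"], emb["dog"] + 0.1, emb[neg]))  # "cat" is the hard negative
```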
3. Wang, Xiao, Craig Macdonald, and Iadh Ounis. "Improving zero-shot retrieval using dense external expansion." Information Processing & Management 59, no. 5 (September 2022): 103026. http://dx.doi.org/10.1016/j.ipm.2022.103026.

4. Kumar, Sanjeev. "Phase retrieval with physics informed zero-shot network." Optics Letters 46, no. 23 (November 29, 2021): 5942. http://dx.doi.org/10.1364/ol.433625.

5. Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Abstract:
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modalities. It is challenging not only because of the heterogeneous distributions across different modalities, but also because of the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as the common semantic space and using a generative adversarial network (GAN) to capture the underlying multimodal data structures, as well as to strengthen the relations between input data and the semantic space, in order to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN-based methods for ZS-CMR. Instead of using the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space of the input multimodal features and class-embeddings via modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlation associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in the latent space, which benefits retrieval efficiency and enables knowledge transfer to unseen classes. We evaluate our model using four benchmark datasets on image-text retrieval tasks and one large-scale dataset on image-sketch retrieval tasks. The experimental results show that our method establishes the new state-of-the-art performance for both tasks on all datasets.
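
As a rough numerical illustration of the cross-alignment idea (a sketch under assumed shapes and losses, not the LCALE model): each modality-specific VAE encoder outputs a Gaussian, and an alignment penalty pulls the distribution inferred from an image feature toward the one inferred from the class embedding. The closed-form 2-Wasserstein distance between diagonal Gaussians serves here as one possible alignment criterion; the random "encoders" merely stand in for learned networks.

```python
import numpy as np

def gaussian_w2(mu1, logvar1, mu2, logvar2):
    """Squared 2-Wasserstein distance between diagonal Gaussians:
    ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2 (closed form)."""
    s1, s2 = np.exp(0.5 * logvar1), np.exp(0.5 * logvar2)
    return np.sum((mu1 - mu2) ** 2) + np.sum((s1 - s2) ** 2)

def encode(x, W):
    """Stand-in for a VAE encoder head: first half is the mean, second half the log-variance."""
    h = W @ x
    return h[: h.size // 2], h[h.size // 2 :]

rng = np.random.default_rng(1)
W_img = rng.normal(size=(8, 128), scale=0.05)   # image-branch "encoder" (assumed)
W_cls = rng.normal(size=(8, 300), scale=0.05)   # class-embedding branch (assumed)

img_feat = rng.normal(size=128)                 # e.g. a CNN image feature
cls_emb = rng.normal(size=300)                  # e.g. a word2vec class embedding

align = gaussian_w2(*encode(img_feat, W_img), *encode(cls_emb, W_cls))
print(f"alignment penalty to minimize during training: {align:.3f}")
```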
6. Zhang, Haofeng, Yang Long, and Ling Shao. "Zero-shot Hashing with orthogonal projection for image retrieval." Pattern Recognition Letters 117 (January 2019): 201–9. http://dx.doi.org/10.1016/j.patrec.2018.04.011.

7. Zhang, Zhaolong, Yuejie Zhang, Rui Feng, Tao Zhang, and Weiguo Fan. "Zero-Shot Sketch-Based Image Retrieval via Graph Convolution Network." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12943–50. http://dx.doi.org/10.1609/aaai.v34i07.6993.

Abstract:
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) has been proposed recently, putting traditional Sketch-Based Image Retrieval (SBIR) under the setting of zero-shot learning. Dealing with the challenges of both SBIR and zero-shot learning makes it a more difficult task. Previous works mainly focus on utilizing one kind of information, i.e., either the visual information or the semantic information. In this paper, we propose SketchGCN, a model based on a graph convolution network that considers the visual information and the semantic information simultaneously. Our model can thus effectively narrow the domain gap and transfer knowledge. Furthermore, we generate the semantic information from the visual information using a conditional variational autoencoder, rather than only mapping it back from the visual space to the semantic space, which enhances the generalization ability of our model. Besides, a feature loss, a classification loss, and a semantic loss are introduced to optimize the proposed SketchGCN model. Our model achieves good performance on the challenging Sketchy and TU-Berlin datasets.
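
For context, the graph-convolution building block referenced above can be written in a few lines. This is a generic Kipf-and-Welling-style GCN layer, not the SketchGCN architecture itself; the toy graph and feature sizes are assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: symmetrically normalized adjacency (with
    self-loops) times node features times a weight matrix, then ReLU."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

rng = np.random.default_rng(6)
A = (rng.random((5, 5)) > 0.5).astype(float)   # toy relation graph over 5 nodes
A = np.maximum(A, A.T)                          # symmetrize
H = rng.normal(size=(5, 16))                    # node features (e.g. category nodes)
print(gcn_layer(A, H, rng.normal(size=(16, 8))).shape)  # -> (5, 8)
```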
8. Yang, Fan, Zheng Wang, Jing Xiao, and Shin'ichi Satoh. "Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12589–96. http://dx.doi.org/10.1609/aaai.v34i07.6949.

Abstract:
Most recent approaches to zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance, using a pre-trained model. Based on the observation that the manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted during the training of a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mine more reliable relationships between images by traversing the heterogeneous manifolds in the feature space of each modality. Our proposed method benefits from the intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieve a great improvement in the performance of the thermal vs. visible image retrieval task. The code for this paper is available at https://github.com/fyang93/cross-modal-retrieval.
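
To make the manifold-traversal idea concrete, here is a deliberately simplified, one-directional sketch (the paper's scheme is bi-directional and more elaborate): initial cross-modal scores are diffused over the gallery's intra-modal kNN graph so that manifold neighbours reinforce each other. The neighbourhood size and damping factor are illustrative assumptions.

```python
import numpy as np

def diffuse(scores, gallery_feats, k=5, alpha=0.8, iters=20):
    """Re-rank cross-modal scores via a damped random walk on the gallery's
    intra-modal kNN graph."""
    sim = gallery_feats @ gallery_feats.T
    idx = np.argsort(-sim, axis=1)[:, 1 : k + 1]        # k nearest neighbours, skip self
    P = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    P[rows, idx] = np.clip(sim[rows, idx], 1e-6, None)  # keep positive edge weights
    P /= P.sum(axis=1, keepdims=True)                   # row-stochastic transitions
    s = scores.copy()
    for _ in range(iters):
        s = alpha * P @ s + (1 - alpha) * scores        # walk + restart at raw scores
    return s

rng = np.random.default_rng(2)
gallery = rng.normal(size=(50, 16))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
raw = rng.random(50)                                    # initial cross-modal similarities
print(np.argsort(-diffuse(raw, gallery))[:5])           # top-5 gallery items after diffusion
```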
9. Xu, Rui, Zongyan Han, Le Hui, Jianjun Qian, and Jin Xie. "Domain Disentangled Generative Adversarial Network for Zero-Shot Sketch-Based 3D Shape Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2902–10. http://dx.doi.org/10.1609/aaai.v36i3.20195.

Abstract:
Sketch-based 3D shape retrieval is a challenging task due to the large domain discrepancy between sketches and 3D shapes. Since existing methods are trained and evaluated on the same categories, they cannot effectively recognize the categories that have not been used during training. In this paper, we propose a novel domain disentangled generative adversarial network (DD-GAN) for zero-shot sketch-based 3D retrieval, which can retrieve the unseen categories that are not accessed during training. Specifically, we first generate domain-invariant features and domain-specific features by disentangling the learned features of sketches and 3D shapes, where the domain-invariant features are used to align with the corresponding word embeddings. Then, we develop a generative adversarial network that combines the domain-specific features of the seen categories with the aligned domain-invariant features to synthesize samples, where the synthesized samples of the unseen categories are generated by using the corresponding word embeddings. Finally, we use the synthesized samples of the unseen categories combined with the real samples of the seen categories to train the network for retrieval, so that the unseen categories can be recognized. In order to reduce the domain shift problem, we utilize unlabeled unseen samples to enhance the discrimination ability of the discriminator. With the discriminator distinguishing the generated samples from the unlabeled unseen samples, the generator can generate more realistic unseen samples. Extensive experiments on the SHREC'13 and SHREC'14 datasets show that our method significantly improves the retrieval performance of the unseen categories.
10. Xu, Xing, Jialin Tian, Kaiyi Lin, Huimin Lu, Jie Shao, and Heng Tao Shen. "Zero-shot Cross-modal Retrieval by Assembling AutoEncoder and Generative Adversarial Network." ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 1s (March 31, 2021): 1–17. http://dx.doi.org/10.1145/3424341.

Abstract:
Conventional cross-modal retrieval models mainly assume the same scope of classes for both the training set and the testing set. This assumption limits their extensibility to zero-shot cross-modal retrieval (ZS-CMR), where the testing set consists of unseen classes that are disjoint from the seen classes in the training set. The ZS-CMR task is more challenging due to the heterogeneous distributions of different modalities and the semantic inconsistency between seen and unseen classes. A few recently proposed approaches are inspired by zero-shot learning to estimate the distribution underlying multimodal data with generative models and to make the knowledge transfer from seen classes to unseen classes by leveraging class embeddings. However, directly borrowing the idea from zero-shot learning (ZSL) is not fully adaptive to the retrieval task, since the core of the retrieval task is learning the common space. To address the above issues, we propose a novel approach named Assembling AutoEncoder and Generative Adversarial Network (AAEGAN), which combines the strengths of the AutoEncoder (AE) and the Generative Adversarial Network (GAN) to jointly incorporate common latent space learning, knowledge transfer, and feature synthesis for ZS-CMR. Moreover, instead of utilizing class embeddings as the common space, the AAEGAN approach maps all multimodal data into a learned latent space with distribution alignment via three coupled AEs. We empirically show a remarkable improvement for the ZS-CMR task and establish state-of-the-art or competitive performance on four image-text retrieval datasets.

Dissertations / Theses on the topic "Zero-shot Retrieval"

1. Efes, Stergios. "Zero-shot, One Kill: BERT for Neural Information Retrieval." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444835.

Abstract:
[Background]: The advent of bidirectional encoder representations from transformers (BERT) language models (Devlin et al., 2018) and of MS Marco, a large-scale human-annotated dataset for machine reading comprehension (Bajaj et al., 2016) that was made publicly available, led the field of information retrieval (IR) to experience a revolution (Lin et al., 2020). The BERT-based retrieval model of Nogueira and Cho (2019) became, at the time their paper was published, the top entry in the MS Marco passage-reranking leaderboard, surpassing the previous state of the art by 27% in MRR@10. However, training such neural IR models for domains other than MS Marco is still hard, because neural approaches often require a vast amount of training data to perform effectively, which is not always available. To address the shortage of labelled data, a new line of research emerged: training neural models with weak supervision. In weak supervision, labels for an unlabelled dataset are generated automatically using an existing model, and a machine learning model is then trained on the artificial "weak" data. In the case of weak supervision for IR, the training dataset comes in the form of (query, passage) tuples. Dehghani et al. (2017) used the AOL query logs (Pass et al., 2006), a set of millions of real web queries, and BM25 to retrieve the relevant passages for each user query. A drawback of this approach is that it is hard to obtain query logs for every domain. [Objective]: This thesis proposes an intuitive approach for addressing the shortage of data in domains with limited or no data at all, through transfer learning in the context of IR. We leverage Wikipedia's structure to create a Wikipedia-based generic IR training dataset for zero-shot neural models. [Method]: We create "pseudo-queries" by concatenating the title of each Wikipedia article with each of its section titles, and we consider the associated section's passage as the relevant passage of the pseudo-query. All of our experiments are evaluated on a standard collection: MS Marco, a large-scale web collection. For our zero-shot experiments, our proposed model, called "Wiki", is a BERT model trained on the artificial Wikipedia-based dataset, and the baseline is a default BERT model without any additional training. In our second line of experiments, we explore the benefits gained by pre-fine-tuning on the Wikipedia-based IR dataset and further fine-tuning on in-domain data. Our proposed model, "Wiki+Ma", is a BERT model pre-fine-tuned on the Wikipedia-based dataset and further fine-tuned on MS Marco, while the baseline is a BERT model fine-tuned only on MS Marco. [Results]: Our first experiments show that the "Wiki" model, i.e., the BERT model trained on the Wikipedia-based IR dataset, achieves 0.197 in MRR@10, about 10 points higher than a BERT model with default weights; in addition, results on the development set indicate that the "Wiki" model performs better than a BERT model trained on in-domain data when that data comprises between 10k and 50k instances. Our second line of experiments shows that pre-fine-tuning on the Wikipedia-based IR dataset benefits later fine-tuning steps on in-domain data in terms of stability.
[Conclusion]: Our findings suggest that transfer learning for IR tasks by leveraging the generic knowledge incorporated in Wikipedia is possible, though more experimentation is needed to understand its limitations in comparison with traditional approaches such as BM25.
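
The pseudo-query construction in the Method section is simple enough to sketch directly. A minimal Python sketch follows; the dictionary layout of a parsed article is an assumed stand-in for the thesis's actual Wikipedia preprocessing.

```python
def make_pseudo_queries(article):
    """Yield (pseudo-query, relevant passage) pairs from one parsed article:
    the query is the article title concatenated with a section title, and the
    section's passage is taken as the relevant passage."""
    for section in article["sections"]:
        yield f"{article['title']} {section['title']}", section["passage"]

article = {
    "title": "Information retrieval",
    "sections": [
        {"title": "History", "passage": "The idea of using computers to search ..."},
        {"title": "Model types", "passage": "Retrieval models can be categorized ..."},
    ],
}

for query, passage in make_pseudo_queries(article):
    print(repr(query), "->", passage[:35])
```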
2. Bucher, Maxime. "Apprentissage et exploitation de représentations sémantiques pour la classification et la recherche d'images" [Learning and exploiting semantic representations for image classification and retrieval]. Thesis, Normandie, 2018. http://www.theses.fr/2018NORMC250/document.

Abstract:
In this thesis, we examine some practical difficulties of deep learning models. Indeed, despite the promising results in computer vision, implementing them in some situations raises some questions. For example, in classification tasks where thousands of categories have to be recognised, it is sometimes difficult to gather enough training data for each category. We propose two new approaches for this learning scenario, called zero-shot learning. We use semantic information to model classes, which allows us to define models by description, as opposed to modelling from a set of examples, and makes modelling without reference data possible. The fundamental idea of the first chapter is to obtain an optimal attribute distribution through metric learning, with a metric able both to select and to transform the distribution of the original data. In the following chapter, unlike the standard approaches in the literature that rely on learning a common embedding space, we propose to generate visual features from a conditional generator. Once generated, these artificial examples can be used together with real data to train a discriminative classifier. In the second part of this thesis, we address the question of the intelligibility of computations for computer vision tasks. Because of the many complex transformations applied by deep learning algorithms, it is difficult for a user to interpret the returned result. Our proposal is to introduce a "semantic bottleneck" in the processing pipeline: the representation of the image is expressed entirely in natural language, while retaining the efficiency of numerical representations. The intelligibility of this representation allows a user to examine the basis on which the inference was made, and thus to accept or reject the decision according to their own knowledge and experience; it also makes it possible to detect failure cases in the prediction process.
3. Mensink, Thomas. "Learning Image Classification and Retrieval Models." Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENM113/document.

Abstract:
We are currently experiencing an exceptional growth of visual data; for example, millions of photos are shared daily on social networks. Image understanding methods aim to facilitate access to this visual data in a semantically meaningful manner. In this dissertation, we define several detailed goals which are of interest for the image understanding tasks of image classification and retrieval, which we address in three main chapters. First, we aim to exploit the multi-modal nature of many databases, wherein documents consist of images with a form of textual description. In order to do so, we define similarities between the visual content of one document and the textual description of another document. These similarities are computed in two steps: first we find the visually similar neighbours in the multi-modal database, and then we use the textual descriptions of these neighbours to define a similarity to the textual description of any document. Second, we introduce a series of structured image classification models, which explicitly encode pairwise label interactions. These models are more expressive than independent label predictors and lead to more accurate predictions, especially in an interactive prediction scenario where a user provides the values of some of the image labels. Such an interactive scenario offers an interesting trade-off between accuracy and manual labelling effort. We explore structured models for multi-label image classification, for attribute-based image classification, and for optimizing specific ranking measures. Finally, we explore k-nearest-neighbour and nearest-class-mean classifiers for large-scale image classification. We propose efficient metric learning methods to improve classification performance, and use these methods to learn on a data set of more than one million training images from one thousand classes. Since both classification methods allow for the incorporation of classes not seen during training at near-zero cost, we also study their generalization performance. We show that the nearest-class-mean classifier can generalize from one thousand to ten thousand classes at negligible cost, and still perform competitively with the state of the art.
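
The "near-zero cost" of adding unseen classes is easy to see in code. Below is a minimal nearest-class-mean classifier in numpy; the thesis's learned metric is omitted (plain Euclidean distance stands in for it), so this only illustrates why new classes are cheap to add.

```python
import numpy as np

class NearestClassMean:
    """Each class is represented by its feature mean; classification is by the
    nearest mean, so an unseen class is added by computing one more mean."""

    def fit(self, X, y):
        self.labels = np.unique(y)
        self.means = np.stack([X[y == c].mean(axis=0) for c in self.labels])
        return self

    def add_class(self, X_new, label):
        """Incorporate an unseen class from its examples alone, no retraining."""
        self.means = np.vstack([self.means, X_new.mean(axis=0)])
        self.labels = np.append(self.labels, label)

    def predict(self, X):
        d2 = ((X[:, None, :] - self.means[None, :, :]) ** 2).sum(axis=-1)
        return self.labels[d2.argmin(axis=1)]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(4, 1, (20, 8))])
y = np.array([0] * 20 + [1] * 20)
clf = NearestClassMean().fit(X, y)
clf.add_class(rng.normal(-4, 1, (5, 8)), label=2)  # unseen class, near-zero cost
print(clf.predict(rng.normal(-4, 1, (3, 8))))      # expected: [2 2 2]
```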
4. Dutta, Titir. "Generalizing Cross-domain Retrieval Algorithms." Thesis, 2021. https://etd.iisc.ac.in/handle/2005/5869.

Abstract:
Cross-domain retrieval is an important research topic due to its wide range of applications in e-commerce, forensics, etc. It addresses the problem of retrieving data from a search set when the query belongs to one domain and the search database contains samples from another domain. Several algorithms have been proposed in the recent literature to address this task. In this thesis, we address some of the challenges in cross-domain retrieval, specifically for the application of sketch-based image retrieval. Traditionally, cross-domain algorithms assume that both the training and test data belong to the same set of seen classes, which is quite restrictive. Such models can only be used to retrieve data from the two specific domains on which they have been trained, and cannot generalize to new domains or new classes during retrieval. But in the real world, new object classes are continuously being discovered, so it is necessary to design algorithms that can generalize to previously unseen classes. In addition, a practically useful retrieval model should be able to perform retrieval between any two data domains, whether or not those domains were used for training. In our work, we observe a significant decrease in the performance of existing approaches in these generalized retrieval scenarios, when such simplifying assumptions are removed. In this thesis, we aim to address these and related challenges, so as to make cross-domain retrieval models better suited for real-life applications. We first consider a class-wise generalized protocol, where the query data during retrieval may belong to unseen classes. Following the nomenclature of classification problems, we refer to this as zero-shot cross-modal retrieval and propose an add-on ranking module to improve the performance of existing cross-modal methods in the literature. This work is applicable to different modalities (e.g., text-image), in addition to different domains (e.g., image and RGBD data). Next, we develop an end-to-end framework, named StyleGuide, which addresses the task of sketch-based image retrieval under this zero-shot condition. In addition, this thesis explores the effects of class imbalance in the training data, a challenging aspect of designing any machine learning algorithm. The problem of data imbalance is inherently present in all real-world datasets, and we show that it adversely affects the performance of existing sketch-based image retrieval approaches. A robust adaptive margin-based regularizer is proposed as a potential solution to handle this challenge. A style-augmented SBIR system is also proposed in this thesis, as an extended use case for SBIR problems. Finally, we introduce a novel protocol termed Universal Cross-Domain Retrieval (UCDR), which extends zero-shot cross-modal retrieval to generalized query domains. Here, the query may belong to an unseen domain as well as an unseen class, further generalizing the retrieval model. A mix-up-based, class-neighbourhood-aware network, SnMpNet, is proposed to address this setting. We conclude the thesis by summarizing the research findings and discussing future research directions.
5. "Video2Vec: Learning Semantic Spatio-Temporal Embedding for Video Representations." Master's thesis, 2016. http://hdl.handle.net/2286/R.I.40765.

Abstract:
High-level inference tasks in video applications, such as recognition, video retrieval, and zero-shot classification, have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos. Many video feature extraction algorithms have been proposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as "handcrafted" features, as they were deliberately designed based on some reasonable considerations. However, they may fail when dealing with high-level tasks or complex-scene videos. Due to the success of using deep convolutional neural networks (CNNs) to extract global representations for static images, researchers have been using similar techniques to tackle video content. Typical techniques first extract spatial features by processing raw images using deep convolutional architectures designed for static image classification. Then simple averaging, concatenation, or classifier-based fusion/pooling methods are applied to the extracted features. I argue that features extracted in such ways do not acquire enough representative information, since videos, unlike images, should be characterized as a temporal sequence of semantically coherent visual contents and thus need to be represented in a manner that considers both semantic and spatio-temporal information. In this thesis, I propose a novel architecture to learn a semantic spatio-temporal embedding for videos to support high-level video analysis. The proposed method encodes video spatial and temporal information separately, by employing a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Fully Connected Gated Recurrent Unit (FC-GRU) encoders, which capture the longer-term temporal structure of the CNN features. The resultant spatio-temporal representation (a vector) is used to learn a mapping via a Fully Connected Multilayer Perceptron (FC-MLP) to the word2vec semantic embedding space, leading to a semantic interpretation of the video vector that supports high-level analysis. I evaluate the usefulness and effectiveness of this new video representation by conducting experiments on action recognition, zero-shot video classification, and semantic video retrieval (word-to-video), using the UCF101 action recognition dataset.

Book chapters on the topic "Zero-shot Retrieval"

1. Fröbe, Maik, Christopher Akiki, Martin Potthast, and Matthias Hagen. "How Train–Test Leakage Affects Zero-Shot Retrieval." In String Processing and Information Retrieval, 147–61. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-20643-6_11.

2. Chi, Jingze, Xin Huang, and Yuxin Peng. "Zero-Shot Cross-Media Retrieval with External Knowledge." In Communications in Computer and Information Science, 200–211. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-8530-7_20.

3. Yelamarthi, Sasi Kiran, Shiva Krishna Reddy, Ashish Mishra, and Anurag Mittal. "A Zero-Shot Framework for Sketch Based Image Retrieval." In Computer Vision – ECCV 2018, 316–33. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01225-0_19.

4. Li, Chuang, Lunke Fei, Peipei Kang, Jiahao Liang, Xiaozhao Fang, and Shaohua Teng. "Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval." In Lecture Notes in Computer Science, 459–72. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20865-2_34.

5. Zhang, Donglin, Xiao-Jun Wu, and Jun Yu. "Discrete Bidirectional Matrix Factorization Hashing for Zero-Shot Cross-Media Retrieval." In Pattern Recognition and Computer Vision, 524–36. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-88007-1_43.

6. Li, Mingkang, and Yonggang Qi. "XPNet: Cross-Domain Prototypical Network for Zero-Shot Sketch-Based Image Retrieval." In Pattern Recognition and Computer Vision, 394–410. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-18907-4_31.

7. Chen, Tao, Mingyang Zhang, Jing Lu, Michael Bendersky, and Marc Najork. "Out-of-Domain Semantics to the Rescue! Zero-Shot Hybrid Retrieval Models." In Lecture Notes in Computer Science, 95–110. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-99736-6_7.

8. Zan, Daoguang, Sirui Wang, Hongzhi Zhang, Yuanmeng Yan, Wei Wu, Bei Guan, and Yongji Wang. "S²QL: Retrieval Augmented Zero-Shot Question Answering over Knowledge Graph." In Advances in Knowledge Discovery and Data Mining, 223–36. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-05981-0_18.

9. MacAvaney, Sean, Luca Soldaini, and Nazli Goharian. "Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning." In Lecture Notes in Computer Science, 246–54. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-45442-5_31.

10. Glavaš, Goran, and Ivan Vulić. "Zero-Shot Language Transfer for Cross-Lingual Sentence Retrieval Using Bidirectional Attention Model." In Lecture Notes in Computer Science, 523–38. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-15712-8_34.


Conference papers on the topic "Zero-shot Retrieval"

1. Chi, Jingze, and Yuxin Peng. "Dual Adversarial Networks for Zero-shot Cross-media Retrieval." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/92.

Abstract:
Existing cross-media retrieval methods usually require that the testing categories remain the same as the training categories, and thus cannot support the retrieval of new, previously unseen categories. Inspired by zero-shot learning, this paper proposes zero-shot cross-media retrieval to address the above problem, which aims to retrieve data of new categories across different media types. It is challenging in that zero-shot cross-media retrieval has to handle not only the inconsistent semantics across new and known categories, but also the heterogeneous distributions across different media types. To address these challenges, this paper proposes Dual Adversarial Networks for Zero-shot Cross-media Retrieval (DANZCR), which is, to the best of our knowledge, the first approach to address zero-shot cross-media retrieval. Our DANZCR approach consists of two GANs in a dual structure, for common representation generation and original representation reconstruction respectively, which capture the underlying data structures and strengthen the relations between the input data and the semantic space to generalize across seen and unseen categories. Our DANZCR approach exploits word embeddings to learn common representations in the semantic space via an adversarial learning method, which preserves the inherent cross-media correlation and enhances the knowledge transfer to new categories. Experiments on three widely used cross-media retrieval datasets show the effectiveness of our approach.
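
Setting the adversarial training aside, the retrieval step that such common-representation methods share can be sketched in a few lines: project both media types into the word-embedding (semantic) space and rank by cosine similarity. The random projections below are assumed stand-ins for DANZCR's learned generators, and all shapes are illustrative.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

rng = np.random.default_rng(4)
to_sem_img = rng.normal(size=(300, 2048))   # image branch -> 300-d semantic space
to_sem_txt = rng.normal(size=(300, 768))    # text branch  -> same semantic space

img_gallery = rng.normal(size=(100, 2048))  # 100 candidate images (features)
text_query = rng.normal(size=768)           # one text query (feature)

q = to_sem_txt @ text_query                 # common representation of the query
G = img_gallery @ to_sem_img.T              # common representations of the gallery
ranking = np.argsort([-cosine(q, g) for g in G])
print("top-5 images for the text query:", ranking[:5])
```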
2. Honbu, Yuma, and Keiji Yanai. "Few-Shot and Zero-Shot Semantic Segmentation for Food Images." In ICMR '21: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3463947.3469234.

3. Huang, Siteng, Qiyao Wei, and Donglin Wang. "Reference-Limited Compositional Zero-Shot Learning." In ICMR '23: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2023. http://dx.doi.org/10.1145/3591106.3592225.

4. Wang, Zhipeng, Hao Wang, Jiexi Yan, Aming Wu, and Cheng Deng. "Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/158.

Abstract:
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task in which abstract sketches are used as queries to retrieve natural images under a zero-shot scenario. Most existing methods regard ZS-SBIR as a traditional classification problem and employ a cross-entropy or triplet-based loss to achieve retrieval, which neglects the domain gap between sketches and natural images and the large intra-class diversity in sketches. To this end, we propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR. Specifically, a cross-modal contrastive method is proposed to learn generalized representations that smooth the domain gap by mining relations with additional augmented samples. Furthermore, a category-specific memory bank with sketch features is explored to reduce intra-class diversity in the sketch domain. Extensive experiments demonstrate that our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
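
The cross-modal contrastive ingredient can be illustrated with a symmetric InfoNCE loss over a batch of paired sketch/image embeddings; this is a common formulation, and DSN's exact loss, memory bank, and augmentation strategy are not reproduced here. The temperature value is an assumption.

```python
import numpy as np

def info_nce(sketch_z, image_z, tau=0.07):
    """Symmetric InfoNCE: matched sketch/image pairs are pulled together,
    while all other items in the batch act as negatives."""
    s = sketch_z / np.linalg.norm(sketch_z, axis=1, keepdims=True)
    i = image_z / np.linalg.norm(image_z, axis=1, keepdims=True)
    logits = s @ i.T / tau                                    # (B, B) similarities
    ls = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    lt = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -0.5 * (np.diag(ls).mean() + np.diag(lt).mean())  # sketch->image + image->sketch

rng = np.random.default_rng(5)
shared = rng.normal(size=(16, 64))                            # a batch of 16 pairs
loss = info_nce(shared + 0.1 * rng.normal(size=(16, 64)),
                shared + 0.1 * rng.normal(size=(16, 64)))
print(f"contrastive loss: {loss:.3f}")
```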
5. Xu, Yahui, Yang Yang, Fumin Shen, Xing Xu, Yuxuan Zhou, and Heng Tao Shen. "Attribute hashing for zero-shot image retrieval." In 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2017. http://dx.doi.org/10.1109/icme.2017.8019425.

6. Wang, Guolong, Xun Wu, Zhaoyuan Liu, and Junchi Yan. "Prompt-based Zero-shot Video Moment Retrieval." In MM '22: The 30th ACM International Conference on Multimedia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3503161.3548004.

7. Xu, Canwen, Daya Guo, Nan Duan, and Julian McAuley. "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval." In Findings of the Association for Computational Linguistics: ACL 2022. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.findings-acl.281.

8. Xu, Xing, Fumin Shen, Yang Yang, Jie Shao, and Zi Huang. "Transductive Visual-Semantic Embedding for Zero-shot Learning." In ICMR '17: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3078971.3078977.

9. Gao, LianLi, Jingkuan Song, Junming Shao, Xiaofeng Zhu, and HengTao Shen. "Zero-shot Image Categorization by Image Correlation Exploration." In ICMR '15: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2671188.2749309.

10. Sharma, Prawaal, and Navneet Goyal. "Zero-shot reductive paraphrasing for digitally semi-literate." In FIRE 2021: Forum for Information Retrieval Evaluation. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3503162.3503171.

