
Journal articles on the topic 'Zero-shot Retrieval'


Consult the top 50 journal articles for your research on the topic 'Zero-shot Retrieval.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Dutta, Titir, and Soma Biswas. "Generalized Zero-Shot Cross-Modal Retrieval." IEEE Transactions on Image Processing 28, no. 12 (December 2019): 5953–62. http://dx.doi.org/10.1109/tip.2019.2923287.

2

Seo, Sanghyun, and Juntae Kim. "Hierarchical Semantic Loss and Confidence Estimator for Visual-Semantic Embedding-Based Zero-Shot Learning." Applied Sciences 9, no. 15 (August 2, 2019): 3133. http://dx.doi.org/10.3390/app9153133.

Abstract:
Traditional supervised learning is dependent on the label of the training data, so there is a limitation that the class label which is not included in the training data cannot be recognized properly. Therefore, zero-shot learning, which can recognize unseen-classes that are not used in training, is gaining research interest. One approach to zero-shot learning is to embed visual data such as images and rich semantic data related to text labels of visual data into a common vector space to perform zero-shot cross-modal retrieval on newly input unseen-class data. This paper proposes a hierarchical semantic loss and confidence estimator to more efficiently perform zero-shot learning on visual data. Hierarchical semantic loss improves learning efficiency by using hierarchical knowledge in selecting a negative sample of triplet loss, and the confidence estimator estimates the confidence score to determine whether it is seen-class or unseen-class. These methodologies improve the performance of zero-shot learning by adjusting distances from a semantic vector to visual vector when performing zero-shot cross-modal retrieval. Experimental results show that the proposed method can improve the performance of zero-shot learning in terms of hit@k accuracy.
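The two ingredients described above can be sketched in a few lines. The following PyTorch-style snippet is an illustration only (not the authors' code); the hierarchy-distance matrix, the margin, and all function names are assumptions.

```python
import torch
import torch.nn.functional as F

def hierarchical_triplet_loss(img_emb, pos_sem, sem_bank, labels, hier_dist, margin=0.2):
    """img_emb: (B, d) visual embeddings; pos_sem: (B, d) semantic vectors of the true labels;
    sem_bank: (C, d) semantic vectors of all classes; hier_dist: (C, C) class-hierarchy distances."""
    losses = []
    for i, y in enumerate(labels.tolist()):
        # hierarchical knowledge guides negative selection: prefer classes close to y in the hierarchy
        order = torch.argsort(hier_dist[y])
        neg = next(int(c) for c in order if int(c) != y)
        d_pos = F.pairwise_distance(img_emb[i:i + 1], pos_sem[i:i + 1])
        d_neg = F.pairwise_distance(img_emb[i:i + 1], sem_bank[neg:neg + 1])
        losses.append(F.relu(d_pos - d_neg + margin))
    return torch.cat(losses).mean()

def confidence_score(img_emb, seen_sem_bank):
    # distance to the closest seen-class semantic vector; a large value suggests an unseen class
    return torch.cdist(img_emb, seen_sem_bank).min(dim=1).values
```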
3

Wang, Xiao, Craig Macdonald, and Iadh Ounis. "Improving zero-shot retrieval using dense external expansion." Information Processing & Management 59, no. 5 (September 2022): 103026. http://dx.doi.org/10.1016/j.ipm.2022.103026.

4

Kumar, Sanjeev. "Phase retrieval with physics informed zero-shot network." Optics Letters 46, no. 23 (November 29, 2021): 5942. http://dx.doi.org/10.1364/ol.433625.

5

Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Abstract:
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modality data. It is challenging for not only the heterogeneous distributions across different modalities, but also the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as common semantic space, and using generative adversarial network (GAN) to capture the underlying multimodal data structures, as well as strengthen relations between input data and semantic space to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN based methods for ZS-CMR. Unlike using the class-embeddings as the semantic space, our method seeks for a shared low-dimensional latent space of input multimodal features and class-embeddings by modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlation associated with unseen classes. Effective cross-reconstruction and cross-alignment criterions are further developed to preserve class-discriminative information in latent space, which benefits the efficiency for retrieval and enable the knowledge transfer to unseen classes. We evaluate our model using four benchmark datasets on image-text retrieval tasks and one large-scale dataset on image-sketch retrieval tasks. The experimental results show that our method establishes the new state-of-the-art performance for both tasks on all datasets.
6

Zhang, Haofeng, Yang Long, and Ling Shao. "Zero-shot Hashing with orthogonal projection for image retrieval." Pattern Recognition Letters 117 (January 2019): 201–9. http://dx.doi.org/10.1016/j.patrec.2018.04.011.

7

Zhang, Zhaolong, Yuejie Zhang, Rui Feng, Tao Zhang, and Weiguo Fan. "Zero-Shot Sketch-Based Image Retrieval via Graph Convolution Network." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12943–50. http://dx.doi.org/10.1609/aaai.v34i07.6993.

Abstract:
Zero-Shot Sketch-based Image Retrieval (ZS-SBIR) has been proposed recently, putting the traditional Sketch-based Image Retrieval (SBIR) under the setting of zero-shot learning. Dealing with both the challenges in SBIR and zero-shot learning makes it become a more difficult task. Previous works mainly focus on utilizing one kind of information, i.e., the visual information or the semantic information. In this paper, we propose a SketchGCN model utilizing the graph convolution network, which simultaneously considers both the visual information and the semantic information. Thus, our model can effectively narrow the domain gap and transfer the knowledge. Furthermore, we generate the semantic information from the visual information using a Conditional Variational Autoencoder rather than only map them back from the visual space to the semantic space, which enhances the generalization ability of our model. Besides, feature loss, classification loss, and semantic loss are introduced to optimize our proposed SketchGCN model. Our model gets a good performance on the challenging Sketchy and TU-Berlin datasets.
8

Yang, Fan, Zheng Wang, Jing Xiao, and Shin'ichi Satoh. "Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12589–96. http://dx.doi.org/10.1609/aaai.v34i07.6949.

Abstract:
Most recent approaches for zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance by using a pre-trained model. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted during the training of a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mine more reliable relationships between images by traversing heterogeneous manifolds in the feature space of each modality. Our proposed method benefits from intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieved great improvement in the performance of the thermal vs. visible image retrieval task. The code of this paper: https://github.com/fyang93/cross-modal-retrieval
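The random-walk idea admits a compact sketch. The numpy code below is a simplified illustration of propagating a cross-modal similarity matrix along intra-modal k-NN graphs, not the paper's exact algorithm; k, alpha, and the number of iterations are assumptions.

```python
import numpy as np

def knn_transition(feats, k=10):
    """Row-stochastic transition matrix over a k-NN graph within one modality (feats L2-normalized)."""
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)                 # never walk to yourself
    keep = np.argsort(-sim, axis=1)[:, :k]         # k nearest neighbours per row
    W = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    W[rows, keep] = np.exp(sim[rows, keep])
    return W / W.sum(axis=1, keepdims=True)

def random_walk_rerank(S, feats_q, feats_g, alpha=0.5, iters=3):
    """S: initial query-gallery similarity matrix; diffuse it over both intra-modal manifolds."""
    Wq, Wg = knn_transition(feats_q), knn_transition(feats_g)
    R = S.copy()
    for _ in range(iters):
        R = alpha * (Wq @ R @ Wg.T) + (1 - alpha) * S   # walk on both sides, keep a share of S
    return R
```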
9

Xu, Rui, Zongyan Han, Le Hui, Jianjun Qian, and Jin Xie. "Domain Disentangled Generative Adversarial Network for Zero-Shot Sketch-Based 3D Shape Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2902–10. http://dx.doi.org/10.1609/aaai.v36i3.20195.

Abstract:
Sketch-based 3D shape retrieval is a challenging task due to the large domain discrepancy between sketches and 3D shapes. Since existing methods are trained and evaluated on the same categories, they cannot effectively recognize the categories that have not been used during training. In this paper, we propose a novel domain disentangled generative adversarial network (DD-GAN) for zero-shot sketch-based 3D retrieval, which can retrieve the unseen categories that are not accessed during training. Specifically, we first generate domain-invariant features and domain-specific features by disentangling the learned features of sketches and 3D shapes, where the domain-invariant features are used to align with the corresponding word embeddings. Then, we develop a generative adversarial network that combines the domain-specific features of the seen categories with the aligned domain-invariant features to synthesize samples, where the synthesized samples of the unseen categories are generated by using the corresponding word embeddings. Finally, we use the synthesized samples of the unseen categories combined with the real samples of the seen categories to train the network for retrieval, so that the unseen categories can be recognized. In order to reduce the domain shift problem, we utilize unlabeled unseen samples to enhance the discrimination ability of the discriminator. With the discriminator distinguishing the generated samples from the unlabeled unseen samples, the generator can generate more realistic unseen samples. Extensive experiments on the SHREC'13 and SHREC'14 datasets show that our method significantly improves the retrieval performance of the unseen categories.
10

Xu, Xing, Jialin Tian, Kaiyi Lin, Huimin Lu, Jie Shao, and Heng Tao Shen. "Zero-shot Cross-modal Retrieval by Assembling AutoEncoder and Generative Adversarial Network." ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 1s (March 31, 2021): 1–17. http://dx.doi.org/10.1145/3424341.

Abstract:
Conventional cross-modal retrieval models mainly assume the same scope of the classes for both the training set and the testing set. This assumption limits their extensibility on zero-shot cross-modal retrieval (ZS-CMR), where the testing set consists of unseen classes that are disjoint with seen classes in the training set. The ZS-CMR task is more challenging due to the heterogeneous distributions of different modalities and the semantic inconsistency between seen and unseen classes. A few of recently proposed approaches are inspired by zero-shot learning to estimate the distribution underlying multimodal data by generative models and make the knowledge transfer from seen classes to unseen classes by leveraging class embeddings. However, directly borrowing the idea from zero-shot learning (ZSL) is not fully adaptive to the retrieval task, since the core of the retrieval task is learning the common space. To address the above issues, we propose a novel approach named Assembling AutoEncoder and Generative Adversarial Network (AAEGAN), which combines the strength of AutoEncoder (AE) and Generative Adversarial Network (GAN), to jointly incorporate common latent space learning, knowledge transfer, and feature synthesis for ZS-CMR. Besides, instead of utilizing class embeddings as common space, the AAEGAN approach maps all multimodal data into a learned latent space with the distribution alignment via three coupled AEs. We empirically show the remarkable improvement for ZS-CMR task and establish the state-of-the-art or competitive performance on four image-text retrieval datasets.
11

Tursun, Osman, Simon Denman, Sridha Sridharan, Ethan Goan, and Clinton Fookes. "An efficient framework for zero-shot sketch-based image retrieval." Pattern Recognition 126 (June 2022): 108528. http://dx.doi.org/10.1016/j.patcog.2022.108528.

12

Ge, Ce, Jingyu Wang, Qi Qi, Haifeng Sun, Tong Xu, and Jianxin Liao. "Semi-transductive Learning for Generalized Zero-Shot Sketch-Based Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 7678–86. http://dx.doi.org/10.1609/aaai.v37i6.25931.

Abstract:
Sketch-based image retrieval (SBIR) is an attractive research area where freehand sketches are used as queries to retrieve relevant images. Existing solutions have advanced the task to the challenging zero-shot setting (ZS-SBIR), where the trained models are tested on new classes without seen data. However, they are prone to overfitting under a realistic scenario when the test data includes both seen and unseen classes. In this paper, we study generalized ZS-SBIR (GZS-SBIR) and propose a novel semi-transductive learning paradigm. Transductive learning is performed on the image modality to explore the potential data distribution within unseen classes, and zero-shot learning is performed on the sketch modality sharing the learned knowledge through a semi-heterogeneous architecture. A hybrid metric learning strategy is proposed to establish semantics-aware ranking property and calibrate the joint embedding space. Extensive experiments are conducted on two large-scale benchmarks and four evaluation metrics. The results show that our method is superior over the state-of-the-art competitors in the challenging GZS-SBIR task.
13

Liu, Huixia, and Zhihong Qin. "Deep quantization network with visual-semantic alignment for zero-shot image retrieval." Electronic Research Archive 31, no. 7 (2023): 4232–47. http://dx.doi.org/10.3934/era.2023215.

Abstract:
Approximate nearest neighbor (ANN) search has become an essential paradigm for large-scale image retrieval. Conventional ANN search requires the categories of query images to be seen in the training set. However, facing the rapid evolution of newly-emerging concepts on the web, it is too expensive to retrain the model via collecting labeled data with the new (unseen) concepts. Existing zero-shot hashing methods choose the semantic space or intermediate space as the embedding space, which ignore the inconsistency of visual space and semantic space and suffer from the hubness problem on the zero-shot image retrieval task. In this paper, we present a novel deep quantization network with visual-semantic alignment for efficient zero-shot image retrieval. Specifically, we adopt a multi-task architecture that is capable of 1) learning discriminative and polymeric image representations for facilitating the visual-semantic alignment; 2) learning discriminative semantic embeddings for knowledge transfer; and 3) learning compact binary codes for aligning the visual space and the semantic space. We compare the proposed method with several state-of-the-art methods on several benchmark datasets, and the experimental results validate the superiority of the proposed method.
14

Dutta, Anjan, and Zeynep Akata. "Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-Based Image Retrieval." International Journal of Computer Vision 128, no. 10-11 (July 29, 2020): 2684–703. http://dx.doi.org/10.1007/s11263-020-01350-x.

Abstract:
Low-shot sketch-based image retrieval is an emerging task in computer vision, allowing to retrieve natural images relevant to hand-drawn sketch queries that are rarely seen during the training phase. Related prior works either require aligned sketch-image pairs that are costly to obtain or inefficient memory fusion layer for mapping the visual information to a semantic space. In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks, where we introduce the few-shot setting for SBIR. For solving these tasks, we propose a semantically aligned paired cycle-consistent generative adversarial network (SEM-PCYC) for any-shot SBIR, where each branch of the generative adversarial network maps the visual information from sketch and image to a common semantic space via adversarial training. Each of these branches maintains cycle consistency that only requires supervision at the category level, and avoids the need of aligned sketch-image pairs. A classification criteria on the generators’ outputs ensures the visual to semantic space mapping to be class-specific. Furthermore, we propose to combine textual and hierarchical side information via an auto-encoder that selects discriminating side information within a same end-to-end model. Our results demonstrate a significant boost in any-shot SBIR performance over the state-of-the-art on the extended version of the challenging Sketchy, TU-Berlin and QuickDraw datasets.
15

Geigle, Gregor, Jonas Pfeiffer, Nils Reimers, Ivan Vulić, and Iryna Gurevych. "Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval." Transactions of the Association for Computational Linguistics 10 (2022): 503–21. http://dx.doi.org/10.1162/tacl_a_00473.

Abstract:
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image. While offering unmatched retrieval performance, such models: 1) are typically pretrained from scratch and thus less scalable, 2) suffer from huge retrieval latency and inefficiency issues, which makes them impractical in realistic applications. To address these crucial gaps towards both improved and efficient cross-modal retrieval, we propose a novel fine-tuning framework that turns any pretrained text-image multi-modal model into an efficient retrieval model. The framework is based on a cooperative retrieve-and-rerank approach that combines: 1) twin networks (i.e., a bi-encoder) to separately encode all items of a corpus, enabling efficient initial retrieval, and 2) a cross-encoder component for a more nuanced (i.e., smarter) ranking of the retrieved small set of items. We also propose to jointly fine-tune the two components with shared weights, yielding a more parameter-efficient model. Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
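The cooperative retrieve-and-rerank pattern summarized above can be written down generically. In this sketch, `bi_encode` and `cross_score` are hypothetical placeholders standing in for a bi-encoder and a cross-encoder; they are not the authors' models.

```python
import numpy as np

def retrieve_and_rerank(query, corpus, bi_encode, cross_score, k=100):
    """bi_encode(texts) -> (n, d) unit-norm embeddings; cross_score(query, item) -> relevance float."""
    corpus_emb = bi_encode(corpus)                   # can be precomputed offline
    q = bi_encode([query])[0]
    coarse = corpus_emb @ q                          # cheap first-stage similarity
    top_k = np.argsort(-coarse)[:k]                  # candidate set for the expensive model
    reranked = sorted(top_k, key=lambda i: cross_score(query, corpus[i]), reverse=True)
    return [(int(i), corpus[i]) for i in reranked]
```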
16

Li, Jiangtong, Zhixin Ling, Li Niu, and Liqing Zhang. "Zero-shot sketch-based image retrieval with structure-aware asymmetric disentanglement." Computer Vision and Image Understanding 218 (April 2022): 103412. http://dx.doi.org/10.1016/j.cviu.2022.103412.

17

Yuan, Xu, Guangze Wang, Zhikui Chen, and Fangming Zhong. "CHOP: An orthogonal hashing method for zero-shot cross-modal retrieval." Pattern Recognition Letters 145 (May 2021): 247–53. http://dx.doi.org/10.1016/j.patrec.2021.02.016.

18

Wang, Bingrui, and Yuan Zhou. "Doodle to Object: Practical Zero-Shot Sketch-Based 3D Shape Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (June 26, 2023): 2474–82. http://dx.doi.org/10.1609/aaai.v37i2.25344.

Abstract:
Zero-shot (ZS) sketch-based three-dimensional (3D) shape retrieval (SBSR) is challenging due to the abstraction of sketches, cross-domain discrepancies between two-dimensional sketches and 3D shapes, and ZS-driven semantic knowledge transference from seen to unseen categories. Extant SBSR datasets suffer from lack of data, and no current SBSR methods consider ZS scenarios. In this paper, we contribute a new Doodle2Object (D2O) dataset consisting of 8,992 3D shapes and over 7M sketches spanning 50 categories. Then, we propose a novel prototype contrastive learning (PCL) method that effectively extracts features from different domains and adapts them to unseen categories. Specifically, our PCL method combines the ideas of contrastive and cluster-based prototype learning, and several randomly selected prototypes of different classes are assigned to each sample. By comparing these prototypes, a given sample can be moved closer to the same semantic class of samples while moving away from negative ones. Extensive experiments on two common SBSR benchmarks and our D2O dataset demonstrate the efficacy of the proposed PCL method for ZS-SBSR. Resource is available at https://github.com/yigohw/doodle2object.
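As a rough, hedged illustration of prototype contrastive learning as summarized above (prototype construction and the sampling scheme are simplifying assumptions, not the exact PCL objective):

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, labels, prototypes, proto_labels, n_neg=16, temperature=0.1):
    """features: (B, d); prototypes: (P, d) cluster centers with class ids proto_labels
    (assumes at least one prototype per class)."""
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    losses = []
    for f, y in zip(features, labels):
        pos = (proto_labels == y).nonzero(as_tuple=True)[0]
        neg = (proto_labels != y).nonzero(as_tuple=True)[0]
        neg = neg[torch.randperm(len(neg))[:n_neg]]              # randomly selected negative prototypes
        idx = torch.cat([pos[:1], neg])                          # the positive prototype sits at index 0
        logits = (prototypes[idx] @ f) / temperature
        losses.append(F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long)))
    return torch.stack(losses).mean()
```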
19

Wei, Kun, Cheng Deng, Xu Yang, and Maosen Li. "Incremental Embedding Learning via Zero-Shot Translation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (May 18, 2021): 10254–62. http://dx.doi.org/10.1609/aaai.v35i11.17229.

Abstract:
Modern deep learning methods have achieved great success in machine learning and computer vision fields by learning a set of pre-defined datasets. However, these methods perform unsatisfactorily when applied to real-world situations. The reason for this phenomenon is that learning new tasks leads the trained model to quickly forget the knowledge of old tasks, which is referred to as catastrophic forgetting. Current state-of-the-art incremental learning methods tackle the catastrophic forgetting problem in traditional classification networks and ignore the problem existing in embedding networks, which are the basic networks for image retrieval, face recognition, zero-shot learning, etc. Different from traditional incremental classification networks, the semantic gap between the embedding spaces of two adjacent tasks is the main challenge for embedding networks under the incremental learning setting. Thus, we propose a novel class-incremental method for embedding networks, named the zero-shot translation class-incremental method (ZSTCI), which leverages zero-shot translation to estimate and compensate for the semantic gap without any exemplars. Then, we try to learn a unified representation for two adjacent tasks in the sequential learning process, which captures the relationships of previous classes and current classes precisely. In addition, ZSTCI can easily be combined with existing regularization-based incremental learning methods to further improve the performance of embedding networks. We conduct extensive experiments on CUB-200-2011 and CIFAR100, and the experiment results prove the effectiveness of our method. The code of our method has been released at https://github.com/Drkun/ZSTCI.
20

Huang, Runhui, Yanxin Long, Jianhua Han, Hang Xu, Xiwen Liang, Chunjing Xu, and Xiaodan Liang. "NLIP: Noise-Robust Language-Image Pre-training." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 926–34. http://dx.doi.org/10.1609/aaai.v37i1.25172.

Abstract:
Large-scale cross-modal pre-training paradigms have recently shown ubiquitous success on a wide range of downstream tasks, e.g., zero-shot classification, retrieval and image captioning. However, their successes highly rely on the scale and quality of web-crawled data that naturally contain much incomplete and noisy information (e.g., wrong or irrelevant contents). Existing works either design manual rules to clean data or generate pseudo-targets as auxiliary signals for reducing noise impact, which do not explicitly tackle both the incorrect and incomplete challenges at the same time. In this paper, to automatically mitigate the impact of noise by solely mining over existing data, we propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion. First, in noise-harmonization scheme, NLIP estimates the noise probability of each pair according to the memorization effect of cross-modal transformers, then adopts noise-adaptive regularization to harmonize the cross-modal alignments with varying degrees. Second, in noise-completion scheme, to enrich the missing object information of text, NLIP injects a concept-conditioned cross-modal decoder to obtain semantic-consistent synthetic captions to complete noisy ones, which uses the retrieved visual concepts (i.e., objects’ names) for the corresponding image to guide captioning generation. By collaboratively optimizing noise-harmonization and noise-completion schemes, our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way. Extensive experiments show the significant performance improvements of our NLIP using only 26M data over existing pre-trained models (e.g., CLIP, FILIP and BLIP) on 12 zero-shot classification datasets (e.g., +8.6% over CLIP on average accuracy), MSCOCO image captioning (e.g., +1.9 over BLIP trained with 129M data on CIDEr) and zero-shot image-text retrieval tasks.
21

Chen, Binghui, and Weihong Deng. "Energy Confused Adversarial Metric Learning for Zero-Shot Image Retrieval and Clustering." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8134–41. http://dx.doi.org/10.1609/aaai.v33i01.33018134.

Abstract:
Deep metric learning has been widely applied in many computer vision tasks, and recently, it is more attractive in zero-shot image retrieval and clustering (ZSRC), where a good embedding is requested such that the unseen classes can be distinguished well. Most existing works deem this ’good’ embedding just to be the discriminative one and thus race to devise powerful metric objectives or hard-sample mining strategies for learning discriminative embeddings. However, in this paper, we first emphasize that the generalization ability is a core ingredient of this ’good’ embedding as well and largely affects the metric performance in zero-shot settings as a matter of fact. Then, we propose the Energy Confused Adversarial Metric Learning (ECAML) framework to explicitly optimize a robust metric. It is mainly achieved by introducing an interesting Energy Confusion regularization term, which daringly breaks away from the traditional metric learning idea of discriminative objective devising, and seeks to ’confuse’ the learned model so as to encourage its generalization ability by reducing overfitting on the seen classes. We train this confusion term together with the conventional metric objective in an adversarial manner. Although it seems weird to ’confuse’ the network, we show that our ECAML indeed serves as an efficient regularization technique for metric learning and is applicable to various conventional metric methods. This paper empirically and experimentally demonstrates the importance of learning embeddings with good generalization, achieving state-of-the-art performances on the popular CUB, CARS, Stanford Online Products and In-Shop datasets for ZSRC tasks. Code available at http://www.bhchen.cn/.
22

Deng, Cheng, Xinxun Xu, Hao Wang, Muli Yang, and Dacheng Tao. "Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval." IEEE Transactions on Image Processing 29 (2020): 8892–902. http://dx.doi.org/10.1109/tip.2020.3020383.

23

Xu, Xing, Huimin Lu, Jingkuan Song, Yang Yang, Heng Tao Shen, and Xuelong Li. "Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval." IEEE Transactions on Cybernetics 50, no. 6 (June 2020): 2400–2413. http://dx.doi.org/10.1109/tcyb.2019.2928180.

24

Zhao, Honggang, Mingyue Liu, and Mingyong Li. "Feature Fusion and Metric Learning Network for Zero-Shot Sketch-Based Image Retrieval." Entropy 25, no. 3 (March 14, 2023): 502. http://dx.doi.org/10.3390/e25030502.

Abstract:
Zero-shot sketch-based image retrieval (ZS-SBIR) is an important computer vision problem. The image category in the test phase is a new category that was not visible in the training stage. Because sketches are extremely abstract, the commonly used backbone networks (such as VGG-16 and ResNet-50) cannot handle both sketches and photos. Semantic similarities between the same features in photos and sketches are difficult to reflect in deep models without textual assistance. To solve this problem, we propose a novel and effective feature embedding model called Attention Map Feature Fusion (AMFF). The AMFF model combines the excellent feature extraction capability of the ResNet-50 network with the excellent representation ability of the attention network. By processing the residuals of the ResNet-50 network, the attention map is finally obtained without introducing external semantic knowledge. Most previous approaches treat the ZS-SBIR problem as a classification problem, which ignores the huge domain gap between sketches and photos. This paper proposes an effective method to optimize the entire network, called domain-aware triplets (DAT). Domain feature discrimination and semantic feature embedding can be learned through DAT. In this paper, we also use the classification loss function to stabilize the training process to avoid getting trapped in a local optimum. Compared with the state-of-the-art methods, our method shows a superior performance. For example, on the Tu-berlin dataset, we achieved 61.2 + 1.2% Prec200. On the Sketchy_c100 dataset, we achieved 62.3 + 3.3% mAPall and 75.5 + 1.5% Prec100.
25

McCartney, Ben, Barry Devereux, and Jesus Martinez-del-Rincon. "A zero-shot deep metric learning approach to Brain–Computer Interfaces for image retrieval." Knowledge-Based Systems 246 (June 2022): 108556. http://dx.doi.org/10.1016/j.knosys.2022.108556.

26

Gorbatsevich, V., Y. Vizilter, V. Knyaz, and A. Moiseenko. "SINGLE-SHOT SEMANTIC MATCHER FOR UNSEEN OBJECT DETECTION." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2 (May 30, 2018): 379–84. http://dx.doi.org/10.5194/isprs-archives-xlii-2-379-2018.

Abstract:
In this paper we combine the ideas of image matching, object detection, image retrieval and zero-shot learning for stating and solving the semantic matching problem. Semantic matcher takes two images (test and request) as input and returns detected objects (bounding boxes) on test image corresponding to semantic class represented by request (sample) image. We implement our single-shot semantic matcher CNN architecture based on GoogleNet and YOLO/DetectNet architectures. We propose the detection-by-request training and testing protocols for semantic matching algorithms. We train and test our CNN on the ILSVRC 2014 with 200 seen and 90 unseen classes and provide the real-time object detection with mAP 23 for seen and mAP 21 for unseen classes.
27

Bosselut, Antoine, Ronan Le Bras, and Yejin Choi. "Dynamic Neuro-Symbolic Knowledge Graph Construction for Zero-shot Commonsense Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 6 (May 18, 2021): 4923–31. http://dx.doi.org/10.1609/aaai.v35i6.16625.

Abstract:
Understanding narratives requires reasoning about implicit world knowledge related to the causes, effects, and states of situations described in text. At the core of this challenge is how to access contextually relevant knowledge on demand and reason over it. In this paper, we present initial studies toward zero-shot commonsense question answering by formulating the task as inference over dynamically generated commonsense knowledge graphs. In contrast to previous studies for knowledge integration that rely on retrieval of existing knowledge from static knowledge graphs, our study requires commonsense knowledge integration where contextually relevant knowledge is often not present in existing knowledge bases. Therefore, we present a novel approach that generates contextually-relevant symbolic knowledge structures on demand using generative neural commonsense knowledge models. Empirical results on two datasets demonstrate the efficacy of our neuro-symbolic approach for dynamically constructing knowledge graphs for reasoning. Our approach achieves significant performance boosts over pretrained language models and vanilla knowledge models, all while providing interpretable reasoning paths for its predictions.
28

McCartney, Ben, Jesus Martinez-del-Rincon, Barry Devereux, and Brian Murphy. "A zero-shot learning approach to the development of brain-computer interfaces for image retrieval." PLOS ONE 14, no. 9 (September 16, 2019): e0214342. http://dx.doi.org/10.1371/journal.pone.0214342.

29

Tian, Jialin, Xing Xu, Fumin Shen, Yang Yang, and Heng Tao Shen. "TVT: Three-Way Vision Transformer through Multi-Modal Hypersphere Learning for Zero-Shot Sketch-Based Image Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2370–78. http://dx.doi.org/10.1609/aaai.v36i2.20136.

Abstract:
In this paper, we study the zero-shot sketch-based image retrieval (ZS-SBIR) task, which retrieves natural images related to sketch queries from unseen categories. In the literature, convolutional neural networks (CNNs) have become the de-facto standard and they are either trained end-to-end or used to extract pre-trained features for images and sketches. However, CNNs are limited in modeling the global structural information of objects due to the intrinsic locality of convolution operations. To this end, we propose a Transformer-based approach called Three-Way Vision Transformer (TVT) to leverage the ability of Vision Transformer (ViT) to model global contexts due to the global self-attention mechanism. Going beyond simply applying ViT to this task, we propose a token-based strategy of adding fusion and distillation tokens and making them complementary to each other. Specifically, we integrate three ViTs, which are pre-trained on data of each modality, into a three-way pipeline through the processes of distillation and multi-modal hypersphere learning. The distillation process is proposed to supervise fusion ViT (ViT with an extra fusion token) with soft targets from modality-specific ViTs, which prevents fusion ViT from catastrophic forgetting. Furthermore, our method learns a multi-modal hypersphere by performing inter- and intra-modal alignment without loss of uniformity, which aims to bridge the modal gap between modalities of sketch and image and avoid the collapse in dimensions. Extensive experiments on three benchmark datasets, i.e., Sketchy, TU-Berlin, and QuickDraw, demonstrate the superiority of our TVT method over the state-of-the-art ZS-SBIR methods.
30

Zhang, Taolin, Chengyu Wang, Nan Hu, Minghui Qiu, Chengguang Tang, Xiaofeng He, and Jun Huang. "DKPLM: Decomposable Knowledge-Enhanced Pre-trained Language Model for Natural Language Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11703–11. http://dx.doi.org/10.1609/aaai.v36i10.21425.

Abstract:
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injected from knowledge graphs to improve language understanding abilities. To guarantee effective knowledge injection, previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs. The operations for knowledge retrieval and encoding bring significant computational burdens, restricting the usage of such models in real-world applications that require high inference speed. In this paper, we propose a novel KEPLM named DKPLM that decomposes the knowledge injection process of the pre-trained language models in the pre-training, fine-tuning and inference stages, which facilitates the applications of KEPLMs in real-world scenarios. Specifically, we first detect knowledge-aware long-tail entities as the target for knowledge injection, enhancing the KEPLMs' semantic understanding abilities and avoiding injecting redundant information. The embeddings of long-tail entities are replaced by "pseudo token representations" formed by relevant knowledge triples. We further design the relational knowledge decoding task for pre-training to force the models to truly understand the injected knowledge by relation triple reconstruction. Experiments show that our model outperforms other KEPLMs significantly over zero-shot knowledge probing tasks and multiple knowledge-aware language understanding tasks. We further show that DKPLM has a higher inference speed than other competing models due to the decomposing mechanism.
31

Zhao, Yuying, Hanjiang Lai, Jian Yin, Yewu Zhang, Shigui Yang, Zhongwei Jia, and Jiaqi Ma. "Zero-Shot Medical Image Retrieval for Emerging Infectious Diseases Based on Meta-Transfer Learning — Worldwide, 2020." China CDC Weekly 2, no. 52 (2020): 1004–8. http://dx.doi.org/10.46234/ccdcw2020.268.

32

Gholami, Sia, and Mehdi Noori. "You Don’t Need Labeled Data for Open-Book Question Answering." Applied Sciences 12, no. 1 (December 23, 2021): 111. http://dx.doi.org/10.3390/app12010111.

Abstract:
Open-book question answering is a subset of question answering (QA) tasks where the system aims to find answers in a given set of documents (open-book) and common knowledge about a topic. This article proposes a solution for answering natural language questions from a corpus of Amazon Web Services (AWS) technical documents with no domain-specific labeled data (zero-shot). These questions have a yes–no–none answer and a text answer which can be short (a few words) or long (a few sentences). We present a two-step, retriever–extractor architecture in which a retriever finds the right documents and an extractor finds the answers in the retrieved documents. To test our solution, we are introducing a new dataset for open-book QA based on real customer questions on AWS technical documentation. In this paper, we conducted experiments on several information retrieval systems and extractive language models, attempting to find the yes–no–none answers and text answers in the same pass. Our custom-built extractor model is created from a pretrained language model and fine-tuned on the Stanford Question Answering Dataset (SQuAD) and Natural Questions datasets. We were able to achieve 42% F1 and 39% exact match score (EM) end-to-end with no domain-specific training.
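The retriever-extractor pipeline described here is easy to mock up. The snippet below is a minimal sketch with illustrative choices (BM25 retrieval via rank_bm25 and a generic extractive QA pipeline); it is not the authors' exact system or models.

```python
from rank_bm25 import BM25Okapi
from transformers import pipeline

docs = ["...AWS documentation page 1...", "...page 2...", "...page 3..."]
bm25 = BM25Okapi([d.lower().split() for d in docs])
qa = pipeline("question-answering")                  # a SQuAD-style extractive reader

def open_book_answer(question, top_k=2):
    hits = bm25.get_top_n(question.lower().split(), docs, n=top_k)   # retriever step
    spans = [qa(question=question, context=h) for h in hits]         # extractor step
    return max(spans, key=lambda s: s["score"])                      # best-scoring answer span
```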
33

Schwenk, Holger, Douwe Kiela, and Matthijs Douze. "Analysis of Joint Multilingual Sentence Representations and Semantic K-Nearest Neighbor Graphs." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6982–90. http://dx.doi.org/10.1609/aaai.v33i01.33016982.

Abstract:
Multilingual sentence and document representations are becoming increasingly important. We build on recent advances in multilingual sentence encoders, with a focus on efficiency and large-scale applicability. Specifically, we construct and investigate the k-nn graph over the joint space of 566 million news sentences in seven different languages. We show excellent multilingual retrieval quality on the UN corpus of 11.3M sentences, which extends to the zero-shot case where we have never seen a language. We provide a detailed analysis of both the multilingual sentence encoder for twenty-one European languages and the learned graph. Our sentence encoder is language agnostic and supports code switching.
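A k-NN graph over sentence embeddings of this kind can be built with an exact inner-product index; the sketch below uses FAISS as one reasonable choice and leaves the multilingual sentence encoder (and its embeddings) as an external input.

```python
import faiss

def knn_graph(embeddings, k=8):
    """embeddings: (n, d) float32, L2-normalized so inner product equals cosine similarity."""
    index = faiss.IndexFlatIP(embeddings.shape[1])   # exact inner-product search
    index.add(embeddings)
    sims, nbrs = index.search(embeddings, k + 1)     # +1 because each sentence retrieves itself
    return nbrs[:, 1:], sims[:, 1:]                  # drop the self-match in column 0
```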
34

Dong, Shuai, Zhihua Yang, Wensheng Li, and Kun Zou. "Dynamic Detection and Recognition of Objects Based on Sequential RGB Images." Future Internet 13, no. 7 (July 7, 2021): 176. http://dx.doi.org/10.3390/fi13070176.

Abstract:
Conveyors are used commonly in industrial production lines and automated sorting systems. Many applications require fast, reliable, and dynamic detection and recognition for the objects on conveyors. Aiming at this goal, we design a framework that involves three subtasks: one-class instance segmentation (OCIS), multiobject tracking (MOT), and zero-shot fine-grained recognition of 3D objects (ZSFGR3D). A new level set map network (LSMNet) and a multiview redundancy-free feature network (MVRFFNet) are proposed for the first and third subtasks, respectively. The level set map (LSM) is used to annotate instances instead of the traditional multichannel binary mask, and each peak of the LSM represents one instance. Based on the LSM, LSMNet can adopt a pix2pix architecture to segment instances. MVRFFNet is a generalized zero-shot learning (GZSL) framework based on the Wasserstein generative adversarial network for 3D object recognition. Multi-view features of an object are combined into a compact registered feature. By treating the registered features as the category attribution in the GZSL setting, MVRFFNet learns a mapping function that maps original retrieve features into a new redundancy-free feature space. To validate the performance of the proposed methods, a segmentation dataset and a fine-grained classification dataset about objects on a conveyor are established. Experimental results on these datasets show that LSMNet can achieve a recalling accuracy close to the light instance segmentation framework You Only Look At CoefficienTs (YOLACT), while its computing speed on an NVIDIA GTX1660TI GPU is 80 fps, which is much faster than YOLACT’s 25 fps. Redundancy-free features generated by MVRFFNet perform much better than original features in the retrieval task.
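The level set map idea ("each peak of the LSM represents one instance") can be illustrated with simple peak picking; the window size and threshold below are assumptions, not values from the paper.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def lsm_peaks(lsm, window=7, min_height=0.5):
    """lsm: (H, W) predicted level set map; returns (row, col) coordinates of instance peaks."""
    local_max = maximum_filter(lsm, size=window) == lsm      # true at local maxima
    peaks = np.argwhere(local_max & (lsm >= min_height))     # discard weak responses
    return [tuple(p) for p in peaks]
```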
35

Lin, Sheng-Chieh, Minghan Li, and Jimmy Lin. "Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval." Transactions of the Association for Computational Linguistics 11 (2023): 436–52. http://dx.doi.org/10.1162/tacl_a_00556.

Abstract:
Pre-trained language models have been successful in many knowledge-intensive NLP tasks. However, recent work has shown that models such as BERT are not “structurally ready” to aggregate textual information into a [CLS] vector for dense passage retrieval (DPR). This “lack of readiness” results from the gap between language model pre-training and DPR fine-tuning. Previous solutions call for computationally expensive techniques such as hard negative mining, cross-encoder distillation, and further pre-training to learn a robust DPR model. In this work, we instead propose to fully exploit knowledge in a pre-trained language model for DPR by aggregating the contextualized token embeddings into a dense vector, which we call agg★. By concatenating vectors from the [CLS] token and agg★, our Aggretriever model substantially improves the effectiveness of dense retrieval models on both in-domain and zero-shot evaluations without introducing substantial training overhead. Code is available at https://github.com/castorini/dhr.
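A simplified sketch of the idea (pool contextualized token embeddings into a dense vector and concatenate it with the [CLS] vector) is shown below; mean pooling is a stand-in assumption, not the paper's agg★ aggregation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def passage_vector(text):
    batch = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = enc(**batch).last_hidden_state[0]      # (seq_len, hidden)
    cls_vec = out[0]                                 # [CLS] embedding
    mask = batch["attention_mask"][0].bool()
    tok_vec = out[mask].mean(dim=0)                  # aggregated token embeddings (simplified pooling)
    return torch.cat([cls_vec, tok_vec])             # final dense representation
```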
36

Wang, Zixiang, Tongliang Li, and Zhoujun Li. "Unsupervised Numerical Information Extraction via Exploiting Syntactic Structures." Electronics 12, no. 9 (April 24, 2023): 1977. http://dx.doi.org/10.3390/electronics12091977.

Abstract:
Numerical information plays an important role in various fields such as scientific, financial, social, statistics, and news. Most prior studies adopt unsupervised methods by designing complex handcrafted pattern-matching rules to extract numerical information, which can be difficult to scale to the open domain. Other supervised methods require extra time, cost, and knowledge to design, understand, and annotate the training data. To address these limitations, we propose QuantityIE, a novel approach to extracting numerical information as structured representations by exploiting syntactic features of both constituency parsing (CP) and dependency parsing (DP). The extraction results may also serve as distant supervision for zero-shot model training. Our approach outperforms existing methods from two perspectives: (1) the rules are simple yet effective, and (2) the results are more self-contained. We further propose a numerical information retrieval approach based on QuantityIE to answer analytical queries. Experimental results on information extraction and retrieval demonstrate the effectiveness of QuantityIE in extracting numerical information with high fidelity.
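In the same spirit, a tiny dependency-parse rule for pulling out numbers with their syntactic heads might look like the sketch below (spaCy is an illustrative choice; the rules are my assumptions, not the QuantityIE rule set).

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_quantities(text):
    results = []
    for token in nlp(text):
        if token.like_num:                            # numeric token, e.g. "12"
            head = token.head                         # often the unit or measured noun
            unit = head.text if head.pos_ == "NOUN" else ""
            governor = head.head.text if head.head is not head else ""
            results.append({"value": token.text, "unit": unit, "attached_to": governor})
    return results

# extract_quantities("The battery lasts 12 hours and weighs 300 grams.")
# -> quantities such as {'value': '12', 'unit': 'hours', ...}
```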
37

UEKI, Kazuya, Koji HIRAKAWA, Kotaro KIKUCHI, and Tetsunori KOBAYASHI. "Zero-Shot Video Retrieval from a Query Phrase Including Multiple Concepts —Efforts and Challenges in TRECVID AVS Task—." Journal of the Japan Society for Precision Engineering 84, no. 12 (December 5, 2018): 983–90. http://dx.doi.org/10.2493/jjspe.84.983.

38

Hendricks, Lisa Anne, John Mellor, Rosalia Schneider, Jean-Baptiste Alayrac, and Aida Nematzadeh. "Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers." Transactions of the Association for Computational Linguistics 9 (2021): 570–85. http://dx.doi.org/10.1162/tacl_a_00385.

Abstract:
Recently, multimodal transformer models have gained popularity because their performance on downstream tasks suggests they learn rich visual-linguistic representations. Focusing on zero-shot image retrieval tasks, we study three important factors that can impact the quality of learned representations: pretraining data, the attention mechanism, and loss functions. By pretraining models on six datasets, we observe that dataset noise and language similarity to our downstream task are important indicators of model performance. Through architectural analysis, we learn that models with a multimodal attention mechanism can outperform deeper models with modality-specific attention mechanisms. Finally, we show that successful contrastive losses used in the self-supervised learning literature do not yield similar performance gains when used in multimodal transformers.
39

Kern, Stefan, and Gunnar Spreen. "Uncertainties in Antarctic sea-ice thickness retrieval from ICESat." Annals of Glaciology 56, no. 69 (2015): 107–19. http://dx.doi.org/10.3189/2015aog69a736.

Abstract:
A sensitivity study was carried out for the lowest-level elevation method to retrieve total (sea ice + snow) freeboard from Ice, Cloud and land Elevation Satellite (ICESat) elevation measurements in the Weddell Sea, Antarctica. Varying the percentage (P) of elevations used to approximate the instantaneous sea-surface height can cause widespread changes of a few to >10 cm in the total freeboard obtained. Other input parameters have a smaller influence on the overall mean total freeboard but can cause large regional differences. These results, together with published ICESat elevation precision and accuracy, suggest that three times the mean per gridcell single-laser-shot error budget can be used as an estimate for freeboard uncertainty. Theoretical relative ice thickness uncertainty ranges between 20% and 80% for typical freeboard and snow properties. Ice thickness is computed from total freeboard using Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E) snow depth data. Average ice thickness for the Weddell Sea is 1.73 ± 0.38 m for ICESat measurements from 2004 to 2006, in agreement with previous work. The mean uncertainty is 0.72 ± 0.09 m. Our comparison with data of an alternative approach, which assumes that sea-ice freeboard is zero and that total freeboard equals snow depth, reveals an average sea-ice thickness difference of ~0.77 m.
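For context, the freeboard-to-thickness step mentioned above is usually written via hydrostatic equilibrium; the relation below is standard background, and the exact formulation and density values used in the paper may differ.

```latex
% h_i: ice thickness, F: total (ice + snow) freeboard, h_s: snow depth,
% \rho_w, \rho_i, \rho_s: densities of sea water, sea ice and snow
h_i \;=\; \frac{\rho_w\,F - (\rho_w - \rho_s)\,h_s}{\rho_w - \rho_i}
```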
40

Glavaš, Goran, and Swapna Somasundaran. "Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7797–804. http://dx.doi.org/10.1609/aaai.v34i05.6284.

Abstract:
Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval. Starting from an apparent link between text coherence and segmentation, we introduce a novel supervised model for text segmentation with simple but explicit coherence modeling. Our model – a neural architecture consisting of two hierarchically connected Transformer networks – is a multi-task learning model that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones. The proposed model, dubbed Coherence-Aware Text Segmentation (CATS), yields state-of-the-art segmentation performance on a collection of benchmark datasets. Furthermore, by coupling CATS with cross-lingual word embeddings, we demonstrate its effectiveness in zero-shot language transfer: it can successfully segment texts in languages unseen in training.
41

Jang, Jiho, Chaerin Kong, DongHyeon Jeon, Seonhoon Kim, and Nojun Kwak. "Unifying Vision-Language Representation Space with Single-Tower Transformer." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 980–88. http://dx.doi.org/10.1609/aaai.v37i1.25178.

Abstract:
Contrastive learning is a form of distance learning that aims to learn invariant features from two related representations. In this work, we explore the hypothesis that an image and caption can be regarded as two different views of the underlying mutual information, and train a model to learn a unified vision-language representation space that encodes both modalities at once in a modality-agnostic manner. We first identify difficulties in learning a one-tower model for vision-language pretraining (VLP), and propose One Representation (OneR) as a simple yet effective framework for our goal. We discover intriguing properties that distinguish OneR from the previous works that have modality-specific representation spaces such as zero-shot localization, text-guided visual reasoning and multi-modal retrieval, and present analyses to provide insights into this new form of multi-modal representation learning. Thorough evaluations demonstrate the potential of a unified modality-agnostic VLP framework.
42

Niu, Yilin, Fei Huang, Wei Liu, Jianwei Cui, Bin Wang, and Minlie Huang. "Bridging the Gap between Synthetic and Natural Questions via Sentence Decomposition for Semantic Parsing." Transactions of the Association for Computational Linguistics 11 (2023): 367–83. http://dx.doi.org/10.1162/tacl_a_00552.

Abstract:
Semantic parsing maps natural language questions into logical forms, which can be executed against a knowledge base for answers. In real-world applications, the performance of a parser is often limited by the lack of training data. To facilitate zero-shot learning, data synthesis has been widely studied to automatically generate paired questions and logical forms. However, data synthesis methods can hardly cover the diverse structures in natural languages, leading to a large gap in sentence structure between synthetic and natural questions. In this paper, we propose a decomposition-based method to unify the sentence structures of questions, which benefits the generalization to natural questions. Experiments demonstrate that our method significantly improves the semantic parser trained on synthetic data (+7.9% on KQA and +8.9% on ComplexWebQuestions in terms of exact match accuracy). Extensive analysis demonstrates that our method can better generalize to natural questions with novel text expressions compared with baselines. Besides semantic parsing, our idea potentially benefits other semantic understanding tasks by mitigating the distracting structure features. To illustrate this, we extend our method to the task of sentence embedding learning, and observe substantial improvements on sentence retrieval (+13.1% for Hit@1).
43

Lazaridou, Angeliki, Georgiana Dinu, Adam Liska, and Marco Baroni. "From Visual Attributes to Adjectives through Decompositional Distributional Semantics." Transactions of the Association for Computational Linguistics 3 (December 2015): 183–96. http://dx.doi.org/10.1162/tacl_a_00132.

Abstract:
As automated image analysis progresses, there is increasing interest in richer linguistic annotation of pictures, with attributes of objects (e.g., furry, brown…) attracting most attention. By building on the recent “zero-shot learning” approach, and paying attention to the linguistic nature of attributes as noun modifiers, and specifically adjectives, we show that it is possible to tag images with attribute-denoting adjectives even when no training data containing the relevant annotation are available. Our approach relies on two key observations. First, objects can be seen as bundles of attributes, typically expressed as adjectival modifiers (a dog is something furry, brown, etc.), and thus a function trained to map visual representations of objects to nominal labels can implicitly learn to map attributes to adjectives. Second, objects and attributes come together in pictures (the same thing is a dog and it is brown). We can thus achieve better attribute (and object) label retrieval by treating images as “visual phrases”, and decomposing their linguistic representation into an attribute-denoting adjective and an object-denoting noun. Our approach performs comparably to a method exploiting manual attribute annotation, it out-performs various competitive alternatives in both attribute and object annotation, and it automatically constructs attribute-centric representations that significantly improve performance in supervised object recognition.
44

Vijayaraghavan, Prashanth, and Deb Roy. "M-sense: Modeling Narrative Structure in Short Personal Narratives Using Protagonist’s Mental Representations." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 13664–72. http://dx.doi.org/10.1609/aaai.v37i11.26601.

Abstract:
Narrative is a ubiquitous component of human communication. Understanding its structure plays a critical role in a wide variety of applications, ranging from simple comparative analyses to enhanced narrative retrieval, comprehension, or reasoning capabilities. Prior research in narratology has highlighted the importance of studying the links between cognitive and linguistic aspects of narratives for effective comprehension. This interdependence is related to the textual semantics and mental language in narratives, referring to characters' motivations, feelings or emotions, and beliefs. However, this interdependence is hardly explored for modeling narratives. In this work, we propose the task of automatically detecting prominent elements of the narrative structure by analyzing the role of characters' inferred mental state along with linguistic information at the syntactic and semantic levels. We introduce a STORIES dataset of short personal narratives containing manual annotations of key elements of narrative structure, specifically climax and resolution. To this end, we implement a computational model that leverages the protagonist's mental state information obtained from a pre-trained model trained on social commonsense knowledge and integrates their representations with contextual semantic embeddings using a multi-feature fusion approach. Evaluating against prior zero-shot and supervised baselines, we find that our model is able to achieve significant improvements in the task of identifying climax and resolution.
APA, Harvard, Vancouver, ISO, and other styles
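As a rough illustration of the multi-feature fusion described in the preceding abstract (a generic stand-in, not the paper's M-SENSE architecture), one might concatenate a contextual sentence embedding with the protagonist's inferred mental-state embedding and train a small classification head over narrative-element labels; the dimensions and label set below are assumptions.

import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    # Concatenates a contextual sentence embedding with the protagonist's
    # inferred mental-state embedding and predicts the narrative element.
    def __init__(self, sent_dim=768, mental_dim=768, n_classes=3):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(sent_dim + mental_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.1),
        )
        self.head = nn.Linear(512, n_classes)  # e.g. other / climax / resolution

    def forward(self, sent_emb, mental_emb):
        fused = self.fuse(torch.cat([sent_emb, mental_emb], dim=-1))
        return self.head(fused)  # logits per sentence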
45

Li, Xiaoyu, Weihong Wang, Jifei Fang, Li Jin, Hankun Kang, and Chunbo Liu. "PEINet: Joint Prompt and Evidence Inference Network via Language Family Policy for Zero-Shot Multilingual Fact Checking." Applied Sciences 12, no. 19 (September 27, 2022): 9688. http://dx.doi.org/10.3390/app12199688.

Full text
Abstract:
Zero-shot multilingual fact-checking, which aims to discover and infer subtle clues from the retrieved relevant evidence to verify the given claim in cross-language and cross-domain scenarios, is crucial for optimizing a free, trusted, wholesome global network environment. Previous works have made enlightening and practical explorations in claim verification, while the zero-shot multilingual task faces new challenging gap issues: neglecting authenticity-dependent learning between multilingual claims, lacking heuristic checking, and suffering a bottleneck of insufficient evidence. To alleviate these gaps, a novel Joint Prompt and Evidence Inference Network (PEINet) is proposed to verify the multilingual claim according to the human fact-checking cognitive paradigm. In detail, firstly, we leverage the language family encoding mechanism to strengthen knowledge transfer among multi-language claims. Then, the prompt tuning module is designed to infer the falsity of the fact, and further, sufficient fine-grained evidence is extracted and aggregated based on a recursive graph attention network to verify the claim again. Finally, we build a unified inference framework via multi-task learning for final fact verification. The newly achieved state-of-the-art performance on the released challenging benchmark dataset that includes not only an out-of-domain test, but also a zero-shot test, proves the effectiveness of our framework, and further analysis demonstrates the superiority of our PEINet in multilingual claim verification and inference, especially in the zero-shot scenario.
APA, Harvard, Vancouver, ISO, and other styles
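The evidence-inference step in the entry above can be pictured with a generic attention-based aggregator: the claim embedding attends over retrieved evidence embeddings and the fused representation is classified. This is a simplified stand-in for PEINet's recursive graph attention network, and all tensor names and sizes are assumptions.

import torch
import torch.nn as nn

class EvidenceAggregator(nn.Module):
    # The claim attends over retrieved evidence sentences; the pooled
    # evidence is fused with the claim and classified (supports / refutes).
    def __init__(self, dim=768, n_labels=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.cls = nn.Linear(2 * dim, n_labels)

    def forward(self, claim_emb, evid_embs):
        # claim_emb: (batch, dim); evid_embs: (batch, n_evidence, dim)
        query = claim_emb.unsqueeze(1)
        pooled, _ = self.attn(query, evid_embs, evid_embs)
        fused = torch.cat([claim_emb, pooled.squeeze(1)], dim=-1)
        return self.cls(fused)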
46

Wang, Ping, Li Sun, Liuan Wang, and Jun Sun. "Zero-Shot Video Grounding for Automatic Video Understanding in Sustainable Smart Cities." Sustainability 15, no. 1 (December 22, 2022): 153. http://dx.doi.org/10.3390/su15010153.

Full text
Abstract:
Automatic video understanding is a crucial piece of technology which promotes urban sustainability. Video grounding is a fundamental component of video understanding that has been evolving quickly in recent years, but its use is restricted due to the high labeling costs and typical performance limitations imposed by the pre-defined training dataset. In this paper, a novel atom-based zero-shot video grounding (AZVG) method is proposed to retrieve the segments in the video that correspond to a given input sentence. Although it is training-free, the performance of AZVG is competitive with weakly supervised methods and better than unsupervised SOTA methods on the Charades-STA dataset. The method can support flexible queries as well as different video content. It can play an important role in a wider range of urban living applications.
APA, Harvard, Vancouver, ISO, and other styles
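A training-free grounding pipeline in the spirit of the entry above can be sketched as follows: score every frame against the query with a pre-trained vision-language model (frame and query embeddings are assumed to be precomputed and L2-normalized), then pick the temporal window with the highest average relevance. This naive window search only illustrates the zero-shot setting; it is not AZVG's atom-based procedure.

import numpy as np

def ground(frame_embs, query_emb, min_len=5, max_len=60):
    # frame_embs: (T, d) L2-normalized frame embeddings;
    # query_emb: (d,) L2-normalized sentence embedding.
    sims = frame_embs @ query_emb  # per-frame relevance to the query
    best_span, best_score = (0, min(min_len, len(sims))), -np.inf
    for start in range(len(sims)):
        for end in range(start + min_len, min(start + max_len, len(sims)) + 1):
            score = sims[start:end].mean()
            if score > best_score:
                best_span, best_score = (start, end), score
    return best_span  # (start_frame, end_frame) of the grounded segment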
47

Nakata, Nori, and Gregory C. Beroza. "Reverse time migration for microseismic sources using the geometric mean as an imaging condition." GEOPHYSICS 81, no. 2 (March 1, 2016): KS51—KS60. http://dx.doi.org/10.1190/geo2015-0278.1.

Full text
Abstract:
Time reversal is a powerful tool used to image directly the location and mechanism of passive seismic sources. This technique assumes seismic velocities in the medium and propagates time-reversed observations of ground motion at each receiver location. Assuming an accurate velocity model and adequate array aperture, the waves will focus at the source location. Because we do not know the location and the origin time a priori, we need to scan the entire 4D image (3D in space and 1D in time) to localize the source, which makes time-reversal imaging computationally demanding. We have developed a new approach of time-reversal imaging that reduces the computational cost and the scanning dimensions from 4D to 3D (no time) and increases the spatial resolution of the source image. We first individually extrapolate wavefields at each receiver, and then we crosscorrelate these wavefields (the product in the frequency domain: geometric mean). This crosscorrelation creates another imaging condition, and focusing of the seismic wavefields occurs at the zero time lag of the correlation provided the velocity model is sufficiently accurate. Due to the analogy to the active-shot reverse time migration (RTM), we refer to this technique as the geometric-mean RTM or GmRTM. In addition to reducing the dimension from 4D to 3D compared with conventional time-reversal imaging, the crosscorrelation effectively suppresses the side lobes and yields a spatially high-resolution image of seismic sources. The GmRTM is robust for random and coherent noise because crosscorrelation enhances signal and suppresses noise. An added benefit is that, in contrast to conventional time-reversal imaging, GmRTM has the potential to be used to retrieve velocity information by analyzing time and/or space lags of crosscorrelation, which is similar to what is done in active-source imaging.
APA, Harvard, Vancouver, ISO, and other styles
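The geometric-mean imaging condition described above is compact enough to sketch: instead of stacking the individually back-propagated receiver wavefields, GmRTM multiplies them across receivers at each time step (a zero-lag crosscorrelation) and then sums over time, which removes the time axis from the image volume. The toy NumPy version below assumes the back-propagated wavefields have already been computed on a common grid.

import numpy as np

def geometric_mean_image(u):
    # u: (n_receivers, n_times, nz, nx) individually back-propagated
    # receiver wavefields on a common grid.
    # Conventional time reversal stacks the wavefields, leaving a 4D
    # volume (space and time) to scan. GmRTM instead multiplies across
    # receivers, which sharpens focusing and suppresses side lobes,
    # then sums over time to drop the time axis.
    product = np.prod(u, axis=0)   # zero-lag crosscorrelation across receivers
    return product.sum(axis=0)     # spatial source image, no time dimension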
48

Long, Yang, Li Liu, Yuming Shen, and Ling Shao. "Towards Affordable Semantic Searching: Zero-Shot Retrieval via Dominant Attributes." Proceedings of the AAAI Conference on Artificial Intelligence 32, no. 1 (April 27, 2018). http://dx.doi.org/10.1609/aaai.v32i1.12280.

Full text
Abstract:
Instance-level retrieval has become an essential paradigm to index and retrieve images from large-scale databases. Conventional instance search requires at least an example of the query image to retrieve images that contain the same object instance. Existing semantic retrieval can only search semantically-related images, such as those sharing the same category or a set of tags, not the exact instances. Meanwhile, such methods rest on the unrealistic assumption that all categories or tags are known beforehand. Training models for these semantic concepts relies heavily on instance-level attributes or human captions, which are expensive to acquire. Given the above challenges, this paper studies the Zero-shot Retrieval problem that aims for instance-level image search using only a few dominant attributes. The contributions are: 1) we utilise automatic word embedding to infer class-level attributes to circumvent expensive human labelling; 2) the inferred class-attributes can be extended into discriminative instance attributes through our proposed Latent Instance Attributes Discovery (LIAD) algorithm; 3) our method is not restricted to complete attribute signatures; queries of dominant attributes can also be handled. On two benchmarks, CUB and SUN, extensive experiments demonstrate that our method can achieve promising performance for the problem. Moreover, our approach can also benefit conventional ZSL tasks.
APA, Harvard, Vancouver, ISO, and other styles
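A minimal sketch of retrieval from dominant attributes, assuming per-image attribute scores have already been predicted (for instance from inferred class- or instance-level attributes): rank gallery images by their distance to the queried attribute values, using only the attributes the user actually specified so that incomplete signatures are handled naturally. Function and variable names are illustrative, not from the paper.

import numpy as np

def retrieve(query_attrs, image_attr_scores, top_k=10):
    # query_attrs: {attribute_index: desired value in [0, 1]} — possibly
    # only a few dominant attributes; image_attr_scores: (N, n_attrs)
    # predicted attribute scores for the gallery images.
    idx = np.array(sorted(query_attrs))
    target = np.array([query_attrs[i] for i in idx])
    # Only the queried attributes contribute to the distance, so
    # incomplete attribute signatures are handled naturally.
    dists = np.linalg.norm(image_attr_scores[:, idx] - target, axis=1)
    return np.argsort(dists)[:top_k]  # indices of the best-matching images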
49

Zou, Qin, Ling Cao, Zheng Zhang, Long Chen, and Song Wang. "Transductive Zero-Shot Hashing for Multilabel Image Retrieval." IEEE Transactions on Neural Networks and Learning Systems, 2020, 1–15. http://dx.doi.org/10.1109/tnnls.2020.3043298.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Liu, Cong, Wenhao She, Minjie Chen, Xiaofang Li, and Simon X. Yang. "Consistent penalizing field loss for zero-shot image retrieval." Expert Systems with Applications, August 2023, 121287. http://dx.doi.org/10.1016/j.eswa.2023.121287.

Full text
APA, Harvard, Vancouver, ISO, and other styles