Journal articles on the topic 'OOD generalization'

Consult the top 50 journal articles for your research on the topic 'OOD generalization.'


1

Ye, Nanyang, Lin Zhu, Jia Wang, Zhaoyu Zeng, Jiayao Shao, Chensheng Peng, Bikang Pan, Kaican Li, and Jun Zhu. "Certifiable Out-of-Distribution Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 10927–35. http://dx.doi.org/10.1609/aaai.v37i9.26295.

Abstract:
Machine learning methods suffer from test-time performance degradation when faced with out-of-distribution (OoD) data whose distribution is not necessarily the same as the training data distribution. Although a plethora of algorithms have been proposed to mitigate this issue, it has been demonstrated that achieving better performance than ERM simultaneously on different types of distributional-shift datasets is challenging for existing approaches. Moreover, without theoretical guarantees it is unknown how and to what extent these methods work on any given OoD datum. In this paper, we propose a certifiable out-of-distribution generalization method that provides provable OoD generalization performance guarantees via a functional optimization framework leveraging random distributions and max-margin learning for each input datum. With this approach, the proposed algorithmic scheme can provide certified accuracy for each input datum's prediction on the semantic space and achieves better performance simultaneously on OoD datasets dominated by correlation shifts or diversity shifts. Our code is available at https://github.com/ZlatanWilliams/StochasticDisturbanceLearning.
2

Gwon, Kyungpil, and Joonhyuk Yoo. "Out-of-Distribution (OOD) Detection and Generalization Improved by Augmenting Adversarial Mixup Samples." Electronics 12, no. 6 (March 16, 2023): 1421. http://dx.doi.org/10.3390/electronics12061421.

Abstract:
Deep neural network (DNN) models are usually built on the assumption that training samples and test data are independent and identically distributed (i.i.d.), also known as in-distribution (ID). However, when models are deployed in a real-world scenario with distributional shifts, test data can be out-of-distribution (OOD), and both OOD detection and OOD generalization should be addressed simultaneously to ensure the reliability and safety of applied AI systems. Most existing OOD detectors pursue these two goals separately and are therefore sensitive to covariate shift rather than semantic shift. To alleviate this problem, this paper proposes a novel adversarial mixup (AM) training method, which simply performs OOD data augmentation to synthesize differently distributed data and designs a new AM loss function to learn how to handle OOD data. The proposed AM generates OOD samples that diverge significantly from the support of the training data distribution without becoming completely disjoint from it, increasing the generalization capability of the OOD detector. In addition, the AM is combined with a distributional-distance-aware OOD detector at inference to detect semantic OOD samples more efficiently while remaining robust to covariate shift due to data tampering. Experimental evaluation validates that the designed AM is effective on both OOD detection and OOD generalization tasks compared to previous OOD detectors and data mixup methods.
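To make the augmentation recipe concrete, here is a minimal hedged sketch of adversarial-mixup-style sample synthesis. The function names, the FGSM perturbation, and the Beta-distributed mixing coefficient are illustrative assumptions, not the paper's code or its AM loss.

```python
# Hedged sketch only: perturb, then interpolate.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.03):
    """One-step FGSM perturbation of inputs x with labels y (a common
    stand-in for the paper's attack)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).detach()

def adversarial_mixup(model, x, y, alpha=0.4, eps=0.03):
    """Mix clean inputs with adversarial counterparts so the synthetic
    samples leave the training support without becoming disjoint from it."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * x + (1.0 - lam) * fgsm_perturb(model, x, y, eps)
```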
3

Zhu, Lin, Xinbing Wang, Chenghu Zhou, and Nanyang Ye. "Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 11461–69. http://dx.doi.org/10.1609/aaai.v37i9.26355.

Abstract:
Recent advances in large pre-trained models have shown promising results in few-shot learning. However, their generalization ability on two-dimensional Out-of-Distribution (OoD) data, i.e., correlation shift and diversity shift, has not been thoroughly investigated. Research has shown that, even with a significant amount of training data, few methods can achieve better performance than the standard empirical risk minimization method (ERM) in OoD generalization. This few-shot OoD generalization dilemma emerges as a challenging direction in deep neural network generalization research, where performance suffers from overfitting on few-shot examples and OoD generalization errors. In this paper, leveraging a broader supervision source, we explore a novel Bayesian cross-modal image-text alignment learning method (Bayes-CAL) to address this issue. Specifically, the model is designed so that only text representations are fine-tuned, via a Bayesian modelling approach with a gradient orthogonalization loss and an invariant risk minimization (IRM) loss. The Bayesian approach is essentially introduced to avoid overfitting the base classes observed during training and to improve generalization to broader unseen classes. The dedicated loss is introduced to achieve better image-text alignment by disentangling the causal and non-causal parts of image features. Numerical experiments demonstrate that Bayes-CAL achieves state-of-the-art OoD generalization performance on two-dimensional distribution shifts. Moreover, compared with CLIP-like models, Bayes-CAL yields more stable generalization performance on unseen classes. Our code is available at https://github.com/LinLLLL/BayesCAL.
4

Liao, Yufan, Qi Wu, and Xing Yan. "Invariant Random Forest: Tree-Based Model Solution for OOD Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 12 (March 24, 2024): 13772–81. http://dx.doi.org/10.1609/aaai.v38i12.29283.

Abstract:
Out-Of-Distribution (OOD) generalization is an essential topic in machine learning. However, recent research has focused only on the corresponding methods for neural networks. This paper introduces a novel and effective solution for OOD generalization of decision tree models, named Invariant Decision Tree (IDT). IDT enforces a penalty term on the unstable/varying behavior of a split across different environments during the growth of the tree. Its ensemble version, the Invariant Random Forest (IRF), is then constructed. Our proposed method is motivated by a theoretical result under mild conditions and validated by numerical tests on both synthetic and real datasets. The superior performance compared to non-OOD tree models implies that considering OOD generalization for tree models is necessary and should be given more attention.
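A minimal sketch of the kind of environment-penalized split criterion described here, assuming a Gini-gain base score and a variance penalty on per-environment gains; the paper's exact penalty term may differ, and all names below are illustrative.

```python
# Hedged sketch: prefer splits whose impurity gain is stable across
# environments. Assumes every environment index in `env` is non-empty.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gain(y, mask):
    left, right = y[mask], y[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(y)
    return gini(y) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

def invariant_split_score(X, y, env, feature, threshold, lam=1.0):
    """Pooled gain minus a penalty on the variance of per-environment gains."""
    mask = X[:, feature] <= threshold
    gains = [split_gain(y[env == e], mask[env == e]) for e in np.unique(env)]
    return split_gain(y, mask) - lam * np.var(gains)
```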
5

Bai, Haoyue, Rui Sun, Lanqing Hong, Fengwei Zhou, Nanyang Ye, Han-Jia Ye, S. H. Gary Chan, and Zhenguo Li. "DecAug: Out-of-Distribution Generalization via Decomposed Feature Representation and Semantic Augmentation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 6705–13. http://dx.doi.org/10.1609/aaai.v35i8.16829.

Abstract:
While deep learning demonstrates a strong ability to handle independent and identically distributed (IID) data, it often struggles with out-of-distribution (OoD) generalization, where the test data come from a different distribution than the training data. Designing a general OoD generalization framework for a wide range of applications is challenging, mainly due to the different kinds of distribution shifts in the real world, such as the shift across domains or the extrapolation of correlation. Most previous approaches can only solve one specific distribution shift, leading to unsatisfactory performance when applied to various OoD benchmarks. In this work, we propose DecAug, a novel decomposed feature representation and semantic augmentation approach for OoD generalization. Specifically, DecAug disentangles category-related and context-related features by orthogonalizing the two gradients (w.r.t. intermediate features) of the losses for predicting category and context labels, where category-related features contain causal information of the target object, while context-related features cause distribution shifts between training and test data. Furthermore, we perform gradient-based augmentation on context-related features to improve the robustness of the learned representations. Experimental results show that DecAug outperforms other state-of-the-art methods on various OoD datasets and is among the very few methods that can deal with different types of OoD generalization challenges.
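The gradient-orthogonalization step lends itself to a short sketch. Below is a hedged illustration of penalizing the cosine between the two loss gradients with respect to a shared intermediate feature; DecAug's full method additionally performs semantic augmentation, which is omitted, and the function name is an assumption.

```python
# Hedged sketch of a gradient-orthogonality penalty between the category
# and context branches; not DecAug's exact formulation.
import torch
import torch.nn.functional as F

def orthogonality_penalty(feat, category_loss, context_loss):
    """feat: shared intermediate features (must require grad and feed both
    losses); penalize the squared cosine between the two loss gradients."""
    g_cat, = torch.autograd.grad(category_loss, feat, create_graph=True, retain_graph=True)
    g_ctx, = torch.autograd.grad(context_loss, feat, create_graph=True, retain_graph=True)
    cos = F.cosine_similarity(g_cat.flatten(1), g_ctx.flatten(1), dim=1)
    return (cos ** 2).mean()
```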
6

Shao, Youjia, Shaohui Wang, and Wencang Zhao. "A Causality-Aware Perspective on Domain Generalization via Domain Intervention." Electronics 13, no. 10 (May 11, 2024): 1891. http://dx.doi.org/10.3390/electronics13101891.

Abstract:
Most mainstream statistical models achieve poor performance in Out-Of-Distribution (OOD) generalization, because they tend to learn spurious correlations in the data and collapse when a domain shift exists. If artificial intelligence (AI) is to make great strides in real life, the current focus needs to shift to the OOD problem of deep learning models and the generalization ability under unknown environments. Domain generalization (DG), which focuses on OOD generalization, aims to transfer knowledge extracted from multiple source domains to an unseen target domain. We are inspired by the intuition that human intelligence relies on causality. Unlike approaches that rely on plain probabilistic correlations, we apply a novel causal perspective to DG, which can improve the OOD generalization ability of the trained model by mining the invariant causal mechanism. Firstly, we construct an inclusive causal graph for most DG tasks through stepwise causal analysis based on the data generation process in the natural environment and introduce a reasonable Structural Causal Model (SCM). Secondly, based on counterfactual inference, causal semantic representation learning with domain intervention (CSRDN) is proposed to train a robust model. In this regard, we generate counterfactual representations for different domain interventions, which help the model learn causal semantics and develop generalization capacity. At the same time, we seek the Pareto-optimal solution in the optimization process based on the loss function to obtain a more advanced training model. Extensive experimental results on the Rotated MNIST, PACS, and VLCS datasets verify the effectiveness of the proposed CSRDN. The proposed method integrates causal inference into domain generalization, enhancing interpretability and applicability, and brings a boost to challenging OOD generalization problems.
7

Su, Hang, and Wei Wang. "An Out-of-Distribution Generalization Framework Based on Variational Backdoor Adjustment." Mathematics 12, no. 1 (December 26, 2023): 85. http://dx.doi.org/10.3390/math12010085.

Abstract:
In practical applications, learning models that perform well even when the data distribution differs from that of the training set is essential and meaningful. Such problems are often referred to as out-of-distribution (OOD) generalization problems. In this paper, we propose a method for OOD generalization based on causal inference. Unlike prevalent OOD generalization methods, our approach does not require the environment labels associated with the data in the training set. We analyze the causes of distributional shifts in data from a causal modeling perspective and then propose a backdoor adjustment method based on variational inference. Finally, we construct a dedicated network structure to simulate the variational inference process. The proposed variational backdoor adjustment (VBA) framework can be combined with any mainstream backbone network. In addition to theoretical derivation, we conduct experiments on different datasets to demonstrate that our method performs well in prediction accuracy and generalization gaps. Furthermore, by comparing the VBA framework with other mainstream OOD methods, we show that VBA performs better than mainstream methods.
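For reference, the classical backdoor adjustment that such a framework approximates, written here for a discrete confounder Z; the variational machinery is needed precisely when Z is latent and this sum is intractable.

```latex
P(Y \mid \mathrm{do}(X)) \;=\; \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)
```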
8

Zhang, Lily H., and Rajesh Ranganath. "Robustness to Spurious Correlations Improves Semantic Out-of-Distribution Detection." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 15305–12. http://dx.doi.org/10.1609/aaai.v37i12.26785.

Abstract:
Methods which utilize the outputs or feature representations of predictive models have emerged as promising approaches for out-of-distribution (OOD) detection of image inputs. However, as demonstrated in previous work, these methods struggle to detect OOD inputs that share nuisance values (e.g. background) with in-distribution inputs. The detection of shared-nuisance OOD (SN-OOD) inputs is particularly relevant in real-world applications, as anomalies and in-distribution inputs tend to be captured in the same settings during deployment. In this work, we provide a possible explanation for these failures and propose nuisance-aware OOD detection to address them. Nuisance-aware OOD detection substitutes a classifier trained via Empirical Risk Minimization (ERM) with one that (1) approximates a distribution where the nuisance-label relationship is broken and (2) yields representations that are independent of the nuisance under this distribution, both marginally and conditioned on the label. We can train a classifier to achieve these objectives using Nuisance-Randomized Distillation (NuRD), an algorithm developed for OOD generalization under spurious correlations. Output- and feature-based nuisance-aware OOD detection perform substantially better than their original counterparts, succeeding even when detection based on domain generalization algorithms fails to improve performance.
9

Yu, Runpeng, Hong Zhu, Kaican Li, Lanqing Hong, Rui Zhang, Nanyang Ye, Shao-Lun Huang, and Xiuqiang He. "Regularization Penalty Optimization for Addressing Data Quality Variance in OoD Algorithms." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8945–53. http://dx.doi.org/10.1609/aaai.v36i8.20877.

Abstract:
Due to the poor generalization performance of traditional empirical risk minimization (ERM) under distributional shift, Out-of-Distribution (OoD) generalization algorithms have received increasing attention. However, OoD generalization algorithms overlook the great variance in the quality of training data, which significantly compromises their accuracy. In this paper, we theoretically reveal the relationship between training data quality and algorithm performance, and analyze the optimal regularization scheme for Lipschitz-regularized invariant risk minimization. A novel algorithm is proposed based on the theoretical results to alleviate the influence of low-quality data at both the sample level and the domain level. Experiments on both regression and classification benchmarks validate the effectiveness of our method with statistical significance.
10

Cao, Linfeng, Aofan Jiang, Wei Li, Huaying Wu, and Nanyang Ye. "OoDHDR-Codec: Out-of-Distribution Generalization for HDR Image Compression." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 158–66. http://dx.doi.org/10.1609/aaai.v36i1.19890.

Abstract:
Recently, deep learning has been proven to be a promising approach to standard dynamic range (SDR) image compression. However, due to the wide luminance distribution of high dynamic range (HDR) images and the lack of large standard datasets, developing a deep model for HDR image compression is much more challenging. To tackle this issue, we view HDR data as a distributional shift of SDR data, so that HDR image compression can be modeled as an out-of-distribution (OoD) generalization problem. Herein, we propose a novel OoD HDR image compression framework (OoDHDR-codec). It learns a general representation across HDR and SDR environments and allows the model to be trained effectively using a large set of SDR datasets supplemented with far fewer HDR samples. Specifically, OoDHDR-codec consists of two branches to process the data from the two environments. The SDR branch is a standard blackbox network. For the HDR branch, we develop a hybrid system that models luminance masking and tone mapping with white-box modules and performs content compression with black-box neural networks. To improve generalization from SDR training data to HDR data, we introduce an invariance regularization term to learn a common representation for both SDR and HDR compression. Extensive experimental results show that the OoDHDR-codec achieves strongly competitive in-distribution performance and state-of-the-art OoD performance. To the best of our knowledge, ours is the first work to model HDR compression as an OoD generalization problem, and our OoD generalization algorithmic framework can be applied to any deep compression model beyond the network architecture demonstrated in the paper. Code is available at https://github.com/caolinfeng/OoDHDR-codec.
11

Yu, Yemin, Luotian Yuan, Ying Wei, Hanyu Gao, Fei Wu, Zhihua Wang, and Xinhai Ye. "RetroOOD: Understanding Out-of-Distribution Generalization in Retrosynthesis Prediction." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 1 (March 24, 2024): 374–82. http://dx.doi.org/10.1609/aaai.v38i1.27791.

Abstract:
Machine learning-assisted retrosynthesis prediction models have been gaining widespread adoption, though their performance oftentimes degrades significantly when deployed in real-world applications involving out-of-distribution (OOD) molecules or reactions. Despite steady progress on standard benchmarks, our understanding of existing retrosynthesis prediction models under distribution shifts remains stagnant. To this end, we first formally sort out two types of distribution shifts in retrosynthesis prediction and construct two groups of benchmark datasets. Next, through comprehensive experiments, we systematically compare state-of-the-art retrosynthesis prediction models on the two groups of benchmarks, revealing the limitations of previous in-distribution evaluation and re-examining the advantages of each model. More remarkably, we are motivated by the above empirical insights to propose two model-agnostic techniques that can improve the OOD generalization of arbitrary off-the-shelf retrosynthesis prediction algorithms. Our preliminary experiments show their high potential with an average performance improvement of 4.6%, and the established benchmarks serve as a foothold for further retrosynthesis prediction research towards OOD generalization.
12

Zhao, Xilong, Siyuan Bian, Yaoyun Zhang, Yuliang Zhang, Qinying Gu, Xinbing Wang, Chenghu Zhou, and Nanyang Ye. "Domain Invariant Learning for Gaussian Processes and Bayesian Exploration." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 17024–32. http://dx.doi.org/10.1609/aaai.v38i15.29646.

Abstract:
Out-of-distribution (OOD) generalization has long been a challenging problem that remains largely unsolved. Gaussian processes (GP), as popular probabilistic model classes, especially in the small-data regime, are presumed to have strong OOD generalization abilities. Surprisingly, their OOD generalization abilities have been under-explored compared with other lines of GP research. In this paper, we identify that GP is not free from the problem and propose a domain invariant learning algorithm for Gaussian processes (DIL-GP) with a min-max optimization on the likelihood. DIL-GP discovers the heterogeneity in the data and forces invariance across partitioned subsets of data. We further extend DIL-GP to improve Bayesian optimization's adaptability to changing environments. Numerical experiments demonstrate the superiority of DIL-GP for predictions on several synthetic and real-world datasets. We further demonstrate the effectiveness of the DIL-GP Bayesian optimization method on a PID parameter tuning experiment for a quadrotor. The full version and source code are available at: https://github.com/Billzxl/DIL-GP.
13

Zou, Xin, and Weiwei Liu. "Coverage-Guaranteed Prediction Sets for Out-of-Distribution Data." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 17263–70. http://dx.doi.org/10.1609/aaai.v38i15.29673.

Abstract:
Out-of-distribution (OOD) generalization has attracted increasing research attention in recent years, due to its promising experimental results in real-world applications. In this paper, we study the confidence set prediction problem in the OOD generalization setting. Split conformal prediction (SCP) is an efficient framework for handling the confidence set prediction problem. However, the validity of SCP requires the examples to be exchangeable, which is violated in the OOD setting. Empirically, we show that trivially applying SCP results in a failure to maintain the marginal coverage when the unseen target domain is different from the source domain. To address this issue, we develop a method for forming confident prediction sets in the OOD setting and theoretically prove the validity of our method. Finally, we conduct experiments on simulated data to empirically verify the correctness of our theory and the validity of our proposed method.
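For context, a minimal sketch of vanilla split conformal prediction, the exchangeability-based baseline discussed above. The 1 - p(true class) nonconformity score and the function name are common conventions assumed here, not details from this paper.

```python
# Hedged sketch of split conformal prediction (SCP): calibrate a score
# quantile on held-out data, then keep all labels scoring below it.
import numpy as np

def scp_prediction_sets(probs_cal, y_cal, probs_test, alpha=0.1):
    """probs_*: (n, K) and (m, K) class probabilities; returns a boolean
    (m, K) array of prediction sets with >= 1 - alpha marginal coverage
    under exchangeability."""
    n = len(y_cal)
    scores = 1.0 - probs_cal[np.arange(n), y_cal]         # calibration nonconformity
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    return (1.0 - probs_test) <= q
```

It is exactly this exchangeability assumption that breaks in the OOD setting, which is the failure the paper demonstrates and then repairs.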
14

Chen, Ziliang, Yongsen Zheng, Zhao-Rong Lai, Quanlong Guan, and Liang Lin. "Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 10 (March 24, 2024): 11471–79. http://dx.doi.org/10.1609/aaai.v38i10.29028.

Abstract:
Invariant representation learning (IRL) encourages prediction from invariant causal features to labels deconfounded from the environments, advancing the technical roadmap of out-of-distribution (OOD) generalization. Despite the spotlight on IRL, a recent theoretical result verified that some causal features recovered by IRLs merely appear domain-invariant in the training environments but fail in unseen domains. This fake invariance severely endangers OOD generalization, since it cannot be diagnosed from the training objective and existing causal remedies cannot rectify it. In this paper, we review an IRL family (InvRat) under the Partially and Fully Informative Invariant Feature Structural Causal Models (PIIF SCM / FIIF SCM), respectively, to certify their weaknesses in representing fake invariant features; we then unify their causal diagrams to propose the ReStructured SCM (RS-SCM). RS-SCM can ideally rebuild the spurious and the fake invariant features simultaneously. Given this, we further develop an approach based on conditional mutual information with respect to RS-SCM that rigorously rectifies the spurious and fake invariant effects. It can be easily implemented by a small feature-selection subnet introduced in the IRL family, which is alternatively optimized to achieve our goal. Experiments verify the superiority of our approach against the fake invariance issue across a variety of OOD generalization benchmarks.
15

Li, Dasen, Zhendong Yin, Yanlong Zhao, Wudi Zhao, and Jiqing Li. "MLFAnet: A Tomato Disease Classification Method Focusing on OOD Generalization." Agriculture 13, no. 6 (May 29, 2023): 1140. http://dx.doi.org/10.3390/agriculture13061140.

Abstract:
Tomato disease classification based on images of leaves has received wide attention recently. Convolutional neural networks (CNNs) are among the best tomato disease classification methods and have had an immense impact due to their impressive performance. However, this performance is verified on independent and identically distributed (IID) samples of tomato disease and breaks down dramatically on out-of-distribution (OOD) classification tasks. In this paper, we investigate corruption shift, a vital component of OOD, and propose a tomato disease classification method that improves corruption-shift generalization. We first adopt the discrete cosine transform (DCT) to obtain the low-frequency components. Then, the weight of the feature map is calculated from multiple low-frequency components in order to reduce the influence of high-frequency variation caused by corrupted perturbation. The proposed method, termed multiple low-frequency attention network (MLFAnet), was verified on the ImageNet-C benchmark. The accuracy results and generalization performance confirm the effectiveness of MLFAnet. The satisfactory generalization performance of our proposed classification method provides a reliable tool for the diagnosis of tomato disease.
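The low-frequency extraction step can be sketched briefly. The following hedged example derives per-channel weights from the energy of the lowest-frequency DCT coefficients of a feature map; the paper's actual weighting scheme is richer, and the function name and block size k are assumptions.

```python
# Hedged sketch: 2-D DCT per channel, keep the top-left (low-frequency)
# k x k block, and turn its energy into softmax channel weights.
import numpy as np
from scipy.fft import dctn

def low_freq_channel_weights(fmap, k=4):
    """fmap: (C, H, W) feature maps; returns (C,) attention weights."""
    coeffs = dctn(fmap, axes=(1, 2), norm="ortho")
    energy = np.abs(coeffs[:, :k, :k]).sum(axis=(1, 2))  # low-frequency energy
    e = np.exp(energy - energy.max())                    # stable softmax
    return e / e.sum()
```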
16

Ren, Yifei, and Pouya Bashivan. "How well do models of visual cortex generalize to out of distribution samples?" PLOS Computational Biology 20, no. 5 (May 31, 2024): e1011145. http://dx.doi.org/10.1371/journal.pcbi.1011145.

Abstract:
Unit activity in particular deep neural networks (DNNs) is remarkably similar to the neuronal population responses to static images along the primate ventral visual cortex. Linear combinations of DNN unit activities are widely used to build predictive models of neuronal activity in the visual cortex. Nevertheless, prediction performance in these models is usually investigated on stimulus sets consisting of everyday objects under naturalistic settings. Recent work has revealed a generalization gap when predicting neuronal responses to synthetically generated out-of-distribution (OOD) stimuli. Here, we investigated how recent progress in improving DNNs' object recognition generalization, as well as various DNN design choices such as architecture, learning algorithm, and dataset, have impacted the generalization gap in neural predictivity. We came to the surprising conclusion that performance on none of the common computer vision OOD object recognition benchmarks is predictive of OOD neural predictivity performance. Furthermore, we found that adversarially robust models often yield substantially higher generalization in neural predictivity, although the degree of robustness itself was not predictive of the neural predictivity score. These results suggest that improving object recognition behavior on current benchmarks alone may not lead to more general models of neurons in the primate ventral visual cortex.
17

Bento, Nuno, Joana Rebelo, André V. Carreiro, François Ravache, and Marília Barandas. "Exploring Regularization Methods for Domain Generalization in Accelerometer-Based Human Activity Recognition." Sensors 23, no. 14 (July 19, 2023): 6511. http://dx.doi.org/10.3390/s23146511.

Abstract:
The study of Domain Generalization (DG) has gained considerable momentum in the Machine Learning (ML) field. Human Activity Recognition (HAR) inherently encompasses diverse domains (e.g., users, devices, or datasets), rendering it an ideal testbed for exploring Domain Generalization. Building upon recent work, this paper investigates the application of regularization methods to bridge the generalization gap between traditional models based on handcrafted features and deep neural networks. We apply various regularizers, including sparse training, Mixup, Distributionally Robust Optimization (DRO), and Sharpness-Aware Minimization (SAM), to deep learning models and assess their performance in Out-of-Distribution (OOD) settings across multiple domains using homogenized public datasets. Our results show that Mixup and SAM are the best-performing regularizers. However, they are unable to match the performance of models based on handcrafted features. This suggests that while regularization techniques can improve OOD robustness to some extent, handcrafted features remain superior for domain generalization in HAR tasks.
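Of the regularizers compared, SAM is compact enough to sketch. The following is a hedged single-tensor illustration of one SAM update (perturb the weights toward the local worst case, then descend with the gradient taken there); production implementations operate per parameter group, and the function name is an assumption.

```python
# Hedged sketch of one Sharpness-Aware Minimization step; assumes
# gradients are zeroed before the call.
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    loss_fn(model(x), y).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    with torch.no_grad():
        norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        eps = [rho * p.grad / (norm + 1e-12) for p in params]
        for p, e in zip(params, eps):
            p.add_(e)                  # ascend to the local worst case
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()    # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                  # restore the original weights
    optimizer.step()                   # descend using the SAM gradient
```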
18

Xin, Shiji, Yifei Wang, Jingtong Su, and Yisen Wang. "On the Connection between Invariant Learning and Adversarial Training for Out-of-Distribution Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 10519–27. http://dx.doi.org/10.1609/aaai.v37i9.26250.

Abstract:
Despite impressive success on many tasks, deep learning models have been shown to rely on spurious features, which causes them to fail catastrophically when generalizing to out-of-distribution (OOD) data. Invariant Risk Minimization (IRM) was proposed to alleviate this issue by extracting domain-invariant features for OOD generalization. Nevertheless, recent work shows that IRM is only effective for a certain type of distribution shift (e.g., correlation shift) while failing in other cases (e.g., diversity shift). Meanwhile, another line of methods, Adversarial Training (AT), has shown better domain-transfer performance, suggesting that it has the potential to be an effective candidate for extracting domain-invariant features. This paper investigates this possibility by exploring the similarity between the IRM and AT objectives. Inspired by this connection, we propose Domain-wise Adversarial Training (DAT), an AT-inspired method for alleviating distribution shift via domain-specific perturbations. Extensive experiments show that our proposed DAT can effectively remove domain-varying features and improve OOD generalization under both correlation shift and diversity shift.
19

Zhai, Yuanzhao, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Ding Bo, and Huaimin Wang. "Optimistic Model Rollouts for Pessimistic Offline Policy Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16678–86. http://dx.doi.org/10.1609/aaai.v38i15.29607.

Abstract:
Model-based offline reinforcement learning (RL) has made remarkable progress, offering a promising avenue for improving generalization with synthetic model rollouts. Existing works primarily focus on incorporating pessimism for policy optimization, usually via constructing a Pessimistic Markov Decision Process (P-MDP). However, the P-MDP discourages the policies from learning in out-of-distribution (OOD) regions beyond the support of offline datasets, which can under-utilize the generalization ability of dynamics models. In contrast, we propose constructing an Optimistic MDP (O-MDP). We initially observed the potential benefits of optimism brought by encouraging more OOD rollouts. Motivated by this observation, we present ORPO, a simple yet effective model-based offline RL framework. ORPO generates Optimistic model Rollouts for Pessimistic offline policy Optimization. Specifically, we train an optimistic rollout policy in the O-MDP to sample more OOD model rollouts. Then we relabel the sampled state-action pairs with penalized rewards, and optimize the output policy in the P-MDP. Theoretically, we demonstrate that the performance of policies trained with ORPO can be lower-bounded in linear MDPs. Experimental results show that our framework significantly outperforms P-MDP baselines by a margin of 30%, achieving state-of-the-art performance on the widely-used benchmark. Moreover, ORPO exhibits notable advantages in problems that require generalization.
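The relabeling step admits a compact sketch. Below is a hedged illustration in the spirit of the described pipeline, using dynamics-ensemble disagreement as the reward penalty; this is the common MOPO-style recipe, assumed here for illustration, and ORPO's exact penalty may differ.

```python
# Hedged sketch: penalize rewards of (optimistic) model rollouts by
# ensemble disagreement before pessimistic policy optimization.
import numpy as np

def relabel_with_penalty(rewards, ensemble_next_preds, lam=1.0):
    """rewards: (N,); ensemble_next_preds: (E, N, d) next-state predictions
    from an E-member dynamics ensemble; disagreement flags OOD state-actions."""
    uncertainty = ensemble_next_preds.std(axis=0).max(axis=-1)  # (N,)
    return rewards - lam * uncertainty
```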
20

Dong, Qishi, Fengwei Zhou, Ning Kang, Chuanlong Xie, Shifeng Zhang, Jiawei Li, Heng Peng, and Zhenguo Li. "DAMix: Exploiting Deep Autoregressive Model Zoo for Improving Lossless Compression Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 4 (June 26, 2023): 4250–58. http://dx.doi.org/10.1609/aaai.v37i4.25543.

Abstract:
Deep generative models have demonstrated superior performance in lossless compression of identically distributed data. However, in real-world scenarios, the data to be compressed come from various distributions that usually cannot be known in advance. Thus, commercially viable neural compression must have strong Out-of-Distribution (OoD) generalization capabilities. Compared with traditional compression methods, deep learning methods have intrinsic flaws in OoD generalization. In this work, we attempt to tackle this challenge by exploiting a zoo of Deep Autoregressive models (DAMix). We build a model zoo consisting of autoregressive models trained on data from diverse distributions. In the test phase, we select useful expert models via a simple model-evaluation score and adaptively aggregate the predictions of the selected models. Assuming the outputs of each expert model are biased in favor of its training distribution, a von Mises-Fisher-based filter is proposed to recover unbiased predictions that provide more accurate density estimations than a single model. We derive the posterior of the unbiased predictions as well as the concentration parameters in the filter, and a novel temporal Stein variational gradient descent for sequential data is proposed to adaptively update the posterior distributions. We evaluate DAMix on 22 image datasets, including in-distribution and OoD data, and demonstrate that making use of unbiased predictions yields up to a 45.6% improvement over a single model trained on ImageNet.
21

Lavda, Frantzeska, and Alexandros Kalousis. "Semi-Supervised Variational Autoencoders for Out-of-Distribution Generation." Entropy 25, no. 12 (December 14, 2023): 1659. http://dx.doi.org/10.3390/e25121659.

Abstract:
Humans are able to quickly adapt to new situations, learn effectively with limited data, and create unique combinations of basic concepts. In contrast, generalizing to out-of-distribution (OOD) data and achieving combinatorial generalization are fundamental challenges for machine learning models. Moreover, obtaining high-quality labeled examples can be very time-consuming and expensive, particularly when specialized skills are required for labeling. To address these issues, we propose BtVAE, a method that utilizes conditional VAE models to achieve combinatorial generalization in certain scenarios and consequently to generate out-of-distribution (OOD) data in a semi-supervised manner. Unlike previous approaches that use new factors of variation during testing, our method uses only existing attributes from the training data, but in ways that were not seen during training (e.g., small objects of a specific shape during training and large objects of the same shape during testing).
22

Su, Hang, and Wei Wang. "Invariant Feature Learning Based on Causal Inference from Heterogeneous Environments." Mathematics 12, no. 5 (February 27, 2024): 696. http://dx.doi.org/10.3390/math12050696.

Abstract:
Causality has become a powerful tool for addressing the out-of-distribution (OOD) generalization problem, built on the idea of invariant causal features across domains of interest. Most existing methods for learning invariant features are based on optimization, which typically fails to converge to the optimal solution. Obtaining the variables that cause the target outcome through a causal inference method is therefore a more direct and effective approach. This paper presents a new approach for invariant feature learning based on causal inference (IFCI). IFCI detects causal variables unaffected by the environment through causal inference. IFCI focuses on partial causal relationships, so it works efficiently even in the face of high-dimensional data. Our proposed causal inference method can accurately infer causal effects even when the treatment variable takes more complex values. Our method can be viewed as a data preprocessing step that filters out variables whose distributions change between environments, and it can then be combined with any learning method for classification and regression. Empirical studies show that IFCI can detect and filter out environmental variables affected by the environment. After filtering out environmental variables, even a model with a simple structure and a common loss function can have strong OOD generalization capability. Furthermore, we provide evidence that classifiers utilizing IFCI achieve higher classification accuracy than existing OOD generalization algorithms.
23

Jia, Tianrui, Haoyang Li, Cheng Yang, Tao Tao, and Chuan Shi. "Graph Invariant Learning with Subgraph Co-mixup for Out-of-Distribution Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8562–70. http://dx.doi.org/10.1609/aaai.v38i8.28700.

Abstract:
Graph neural networks (GNNs) have been demonstrated to perform well in graph representation learning but often lack generalization capability when tackling out-of-distribution (OOD) data. Graph invariant learning methods, backed by the invariance principle across multiple predefined environments, have shown effectiveness in dealing with this issue. However, existing methods rely heavily on well-predefined or accurately generated environment partitions, which are hard to obtain in practice, leading to sub-optimal OOD generalization performance. In this paper, we propose a novel graph invariant learning method based on an invariant and variant patterns co-mixup strategy, which is capable of jointly generating mixed multiple environments and capturing invariant patterns from the mixed graph data. Specifically, we first adopt a subgraph extractor to identify invariant subgraphs. Subsequently, we design a novel co-mixup strategy, i.e., jointly conducting environment mixup and invariant mixup. For the environment mixup, we mix the variant environment-related subgraphs to generate sufficiently diverse multiple environments, which is important for guaranteeing the quality of graph invariant learning. For the invariant mixup, we mix the invariant subgraphs, further encouraging the model to capture invariant patterns behind graphs while discarding spurious correlations for OOD generalization. We demonstrate that the proposed environment mixup and invariant mixup mutually promote each other. Extensive experiments on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art methods under various distribution shifts.
24

Deng, Bin, and Kui Jia. "Counterfactual Supervision-Based Information Bottleneck for Out-of-Distribution Generalization." Entropy 25, no. 2 (January 18, 2023): 193. http://dx.doi.org/10.3390/e25020193.

Abstract:
Learning invariant (causal) features for out-of-distribution (OOD) generalization has attracted extensive attention recently, and among the proposals, invariant risk minimization (IRM) is a notable solution. In spite of its theoretical promise for linear regression, the challenges of using IRM in linear classification problems remain. By introducing the information bottleneck (IB) principle into the learning of IRM, the IB-IRM approach has demonstrated its power to solve these challenges. In this paper, we further improve IB-IRM in two respects. First, we show that the key assumption of support overlap of invariant features, used in IB-IRM to guarantee OOD generalization, can be relaxed: it is still possible to achieve the optimal solution without it. Second, we illustrate two failure modes in which IB-IRM (and IRM) can fail to learn the invariant features, and to address such failures, we propose a Counterfactual Supervision-based Information Bottleneck (CSIB) learning algorithm that recovers the invariant features. By requiring counterfactual inference, CSIB works even when accessing data from a single environment. Empirical experiments on several datasets verify our theoretical results.
25

Ding, Kun, Haojian Zhang, Qiang Yu, Ying Wang, Shiming Xiang, and Chunhong Pan. "Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 2 (March 24, 2024): 1528–36. http://dx.doi.org/10.1609/aaai.v38i2.27918.

Abstract:
We propose a generalized method for boosting the generalization ability of pre-trained vision-language models (VLMs) while fine-tuning on downstream few-shot tasks. The idea is realized by exploiting out-of-distribution (OOD) detection to predict whether a sample belongs to the base distribution or a novel distribution, and then using the score generated by a dedicated competition-based scoring function to fuse the zero-shot and few-shot classifiers. The fused classifier is dynamic: it biases towards the zero-shot classifier if a sample is more likely to come from the distribution it was pre-trained on, leading to improved base-to-novel generalization ability. Our method is performed only at test time and is applicable to boost existing methods without time-consuming re-training. Extensive experiments show that even weak distribution detectors can still improve VLMs' generalization ability. Specifically, with the help of OOD detectors, the harmonic means of CoOp and ProGrad increase by 2.6 and 1.5 percentage points, respectively, over 11 recognition datasets in the base-to-novel setting.
26

Chen, Zhe, Zhiquan Ding, Xiaoling Zhang, Xin Zhang, and Tianqi Qin. "Improving Out-of-Distribution Generalization in SAR Image Scene Classification with Limited Training Samples." Remote Sensing 15, no. 24 (December 17, 2023): 5761. http://dx.doi.org/10.3390/rs15245761.

Abstract:
For practical maritime SAR image classification tasks with special imaging platforms, the scenes to be classified are often different from those in the training sets, and the quantity and diversity of the available training data can be extremely limited. This problem of out-of-distribution (OOD) generalization with limited training samples leads to a sharp drop in the performance of conventional deep learning algorithms. In this paper, a knowledge-guided neural network (KGNN) model is proposed to overcome these challenges. By analyzing the saliency features of various maritime SAR scenes, universal knowledge is summarized in descriptive sentences. A feature-integration strategy is designed to assign the descriptive knowledge to the ResNet-18 backbone. Both the individual semantic information and the inherent relations of the entities in SAR images are addressed. The experimental results show that our KGNN method outperforms conventional deep learning models in OOD scenarios with varying training sample sizes and achieves higher robustness in handling distributional shifts caused by weather conditions, terrain type, and sensor characteristics. In addition, the KGNN model converges in far fewer epochs during training. The performance improvement indicates that the KGNN model learns representations guided by properties beneficial for OOD generalization with limited training samples.
27

He, Rundong, Yue Yuan, Zhongyi Han, Fan Wang, Wan Su, Yilong Yin, Tongliang Liu, and Yongshun Gong. "Exploring Channel-Aware Typical Features for Out-of-Distribution Detection." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (March 24, 2024): 12402–10. http://dx.doi.org/10.1609/aaai.v38i11.29132.

Abstract:
Detecting out-of-distribution (OOD) data is essential to ensure the reliability of machine learning models when deployed in real-world scenarios. Unlike most previous test-time OOD detection methods that focus on designing OOD scores, we delve into the challenges of OOD detection from the perspective of typicality and regard a feature's high-probability region as its typical set. However, the existing typical-feature-based OOD detection method carries an implicit assumption: that the proportion of the typical feature set is fixed for every channel. According to our experimental analysis, each channel contributes differently to OOD detection. Adopting a fixed proportion for all channels causes several channels to lose too many typical features or incorporate too many abnormal features, resulting in low performance. Therefore, exploring channel-aware typical features is crucial to better separate ID and OOD data. Driven by this insight, we propose expLoring channel-Aware tyPical featureS (LAPS). First, LAPS obtains the channel-aware typical set by calibrating the channel-level typical set with the global typical set from the mean and standard deviation. Then, LAPS rectifies the features into the channel-aware typical sets to obtain channel-aware typical features. Finally, LAPS leverages the channel-aware typical features to calculate the energy score for OOD detection. Theoretical and visual analyses verify that LAPS achieves a better bias-variance trade-off. Experiments verify the effectiveness and generalization of LAPS under different architectures and OOD scores.
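The rectify-then-score recipe can be sketched compactly. Below is a hedged illustration: penultimate features are clamped into per-channel typical intervals before the standard energy score is computed. How the channel-aware bounds lo and hi are calibrated is the paper's contribution and is not reproduced; the function name is an assumption.

```python
# Hedged sketch: clamp features into per-channel typical-set bounds
# (estimated on training data), then score with the energy function.
import torch

def rectified_energy_score(features, classifier_head, lo, hi, T=1.0):
    """features: (N, C) penultimate activations; lo/hi: (C,) bounds;
    higher score => more in-distribution."""
    rectified = torch.clamp(features, min=lo, max=hi)  # keep typical features
    logits = classifier_head(rectified)
    return T * torch.logsumexp(logits / T, dim=1)      # energy-based ID score
```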
28

Bento, Nuno, Joana Rebelo, Marília Barandas, André V. Carreiro, Andrea Campagner, Federico Cabitza, and Hugo Gamboa. "Comparing Handcrafted Features and Deep Neural Representations for Domain Generalization in Human Activity Recognition." Sensors 22, no. 19 (September 27, 2022): 7324. http://dx.doi.org/10.3390/s22197324.

Abstract:
Human Activity Recognition (HAR) has been studied extensively, yet current approaches are not capable of generalizing across different domains (i.e., subjects, devices, or datasets) with acceptable performance. This lack of generalization hinders the applicability of these models in real-world environments. As deep neural networks are becoming increasingly popular in recent work, there is a need for an explicit comparison between handcrafted and deep representations in Out-of-Distribution (OOD) settings. This paper compares both approaches in multiple domains using homogenized public datasets. First, we compare several metrics to validate three different OOD settings. In our main experiments, we then verify that even though deep learning initially outperforms models with handcrafted features, the situation is reversed as the distance from the training distribution increases. These findings support the hypothesis that handcrafted features may generalize better across specific domains.
29

Jia, Mengzhao, Can Xie, and Liqiang Jing. "Debiasing Multimodal Sarcasm Detection with Contrastive Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 18354–62. http://dx.doi.org/10.1609/aaai.v38i16.29795.

Abstract:
Despite the commendable achievements of existing work, prevailing multimodal sarcasm detection studies rely more on textual content than on visual information. This unavoidably induces spurious correlations between textual words and labels, thereby significantly hindering the models' generalization capability. To address this problem, we define the task of out-of-distribution (OOD) multimodal sarcasm detection, which aims to evaluate models' generalizability when the word distribution differs between training and testing settings. Moreover, we propose a novel debiasing multimodal sarcasm detection framework with contrastive learning, which aims to mitigate the harmful effect of biased textual factors for robust OOD generalization. In particular, we first design counterfactual data augmentation to construct positive samples with dissimilar word biases and negative samples with similar word biases. Subsequently, we devise an adapted debiasing contrastive learning mechanism to empower the model to learn robust task-relevant features and alleviate the adverse effect of biased words. Extensive experiments show the superiority of the proposed framework.
30

Ji, Yuanfeng, Lu Zhang, Jiaxiang Wu, Bingzhe Wu, Lanqing Li, Long-Kai Huang, Tingyang Xu, et al. "DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery – a Focus on Affinity Prediction Problems with Noise Annotations." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8023–31. http://dx.doi.org/10.1609/aaai.v37i7.25970.

Abstract:
AI-aided drug discovery (AIDD) is gaining popularity due to its potential to make the search for new pharmaceuticals faster, less expensive, and more effective. Despite its extensive use in numerous fields (e.g., ADMET prediction, virtual screening), little research has been conducted on the out-of-distribution (OOD) learning problem with noise. We present DrugOOD, a systematic OOD dataset curator and benchmark for AIDD. Particularly, we focus on the drug-target binding affinity prediction problem, which involves both macromolecule (protein target) and small-molecule (drug compound). DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise level annotations, and rigorous benchmarking of SOTA OOD algorithms, as opposed to only providing fixed datasets. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have revealed a significant performance gap between in-distribution and out-of-distribution experiments, emphasizing the need for the development of more effective schemes that permit OOD generalization under noise for AIDD.
31

Yu, Shujian. "The Analysis of Deep Neural Networks by Information Theory: From Explainability to Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 15462. http://dx.doi.org/10.1609/aaai.v37i13.26829.

Abstract:
Despite their great success in many artificial intelligence tasks, deep neural networks (DNNs) still suffer from a few limitations, such as poor generalization behavior for out-of-distribution (OOD) data and the "black-box" nature. Information theory offers fresh insights to solve these challenges. In this short paper, we briefly review the recent developments in this area, and highlight our contributions.
32

Kim, Segwang, Hyoungwook Nam, Joonyoung Kim, and Kyomin Jung. "Neural Sequence-to-grid Module for Learning Symbolic Rules." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 8163–71. http://dx.doi.org/10.1609/aaai.v35i9.16994.

Abstract:
Logical reasoning tasks over symbols, such as learning arithmetic operations and computer program evaluations, have become challenges to deep learning. In particular, even state-of-the-art neural networks fail to achieve out-of-distribution (OOD) generalization of symbolic reasoning tasks, whereas humans can easily extend learned symbolic rules. To resolve this difficulty, we propose a neural sequence-to-grid (seq2grid) module, an input preprocessor that automatically segments and aligns an input sequence into a grid. As our module outputs a grid via a novel differentiable mapping, any neural network structure taking a grid input, such as ResNet or TextCNN, can be jointly trained with our module in an end-to-end fashion. Extensive experiments show that neural networks having our module as an input preprocessor achieve OOD generalization on various arithmetic and algorithmic problems including number sequence prediction problems, algebraic word problems, and computer program evaluation problems while other state-of-the-art sequence transduction models cannot. Moreover, we verify that our module enhances TextCNN to solve the bAbI QA tasks without external memory.
33

Ahmed, Faruk, and Aaron Courville. "Detecting Semantic Anomalies." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3154–62. http://dx.doi.org/10.1609/aaai.v34i04.5712.

Abstract:
We critically appraise the recent interest in out-of-distribution (OOD) detection and question the practical relevance of existing benchmarks. While the currently prevalent trend is to consider different datasets as OOD, we argue that out-distributions of practical interest are ones where the distinction is semantic in nature for a specified context, and that evaluative tasks should reflect this more closely. Assuming a context of object recognition, we recommend a set of benchmarks, motivated by practical applications. We make progress on these benchmarks by exploring a multi-task learning based approach, showing that auxiliary objectives for improved semantic awareness result in improved semantic anomaly detection, with accompanying generalization benefits.
34

Vasiliuk, Anton, Daria Frolova, Mikhail Belyaev, and Boris Shirokikh. "Limitations of Out-of-Distribution Detection in 3D Medical Image Segmentation." Journal of Imaging 9, no. 9 (September 18, 2023): 191. http://dx.doi.org/10.3390/jimaging9090191.

Abstract:
Deep learning models perform unreliably when the data come from a distribution different from the training one. In critical applications such as medical imaging, out-of-distribution (OOD) detection methods help to identify such data samples, preventing erroneous predictions. In this paper, we further investigate OOD detection effectiveness when applied to 3D medical image segmentation. We designed several OOD challenges representing clinically occurring cases and found that none of the methods achieved acceptable performance. Methods not dedicated to segmentation severely failed to perform in the designed setups; the best mean false-positive rate at a 95% true-positive rate (FPR) was 0.59. Segmentation-dedicated methods still achieved suboptimal performance, with the best mean FPR being 0.31 (lower is better). To indicate this suboptimality, we developed a simple method called Intensity Histogram Features (IHF), which performed comparably or better in the same challenges, with a mean FPR of 0.25. Our findings highlight the limitations of the existing OOD detection methods with 3D medical images and present a promising avenue for improving them. To facilitate research in this area, we release the designed challenges as a publicly available benchmark and formulate practical criteria to test the generalization of OOD detection beyond the suggested benchmark. We also propose IHF as a solid baseline to contest emerging methods.
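As a sense of how simple the IHF baseline is, here is a hedged sketch scoring a test volume by its histogram distance to training volumes; the paper's exact featurization and distance may differ, and all names below are illustrative.

```python
# Hedged sketch of an intensity-histogram OOD baseline in the spirit of IHF.
import numpy as np

def intensity_histogram(volume, bins=64, value_range=(0.0, 1.0)):
    hist, _ = np.histogram(volume, bins=bins, range=value_range, density=True)
    return hist / (hist.sum() + 1e-12)

def ihf_ood_score(test_volume, train_histograms, bins=64):
    """Minimum L1 distance to any training histogram; larger => more OOD."""
    h = intensity_histogram(test_volume, bins)
    return min(np.abs(h - h_train).sum() for h_train in train_histograms)
```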
35

Hong, Yining, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, and Song-Chun Zhu. "SMART: A Situation Model for Algebra Story Problems via Attributed Grammar." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (May 18, 2021): 13009–17. http://dx.doi.org/10.1609/aaai.v35i14.17538.

Abstract:
Solving algebra story problems remains a challenging task in artificial intelligence, which requires a detailed understanding of real-world situations and a strong mathematical reasoning capability. Previous neural solvers of math word problems directly translate problem texts into equations, lacking an explicit interpretation of the situations, and often fail to handle more sophisticated situations. To address such limits of neural solvers, we introduce the concept of a situation model, which originates from psychology studies to represent the mental states of humans in problem-solving, and propose SMART, which adopts attributed grammar as the representation of situation models for algebra story problems. Specifically, we first train an information extraction module to extract nodes, attributes and relations from problem texts and then generate a parse graph based on a pre-defined attributed grammar. An iterative learning strategy is also proposed to further improve the performance of SMART. To study this task more rigorously, we carefully curate a new dataset named ASP6.6k. Experimental results on ASP6.6k show that the proposed model outperforms all previous neural solvers by a large margin, while preserving much better interpretability. To test these models' generalization capability, we also design an out-of-distribution (OOD) evaluation, in which problems are more complex than those in the training set. Our model exceeds state-of-the-art models by 17% in the OOD evaluation, demonstrating its superior generalization ability.
APA, Harvard, Vancouver, ISO, and other styles
36

Fischer, Ian. "The Conditional Entropy Bottleneck." Entropy 22, no. 9 (September 8, 2020): 999. http://dx.doi.org/10.3390/e22090999.

Full text
Abstract:
Much of the field of Machine Learning exhibits a prominent set of failure modes, including vulnerability to adversarial examples, poor out-of-distribution (OoD) detection, miscalibration, and willingness to memorize random labelings of datasets. We characterize these as failures of robust generalization, which extends the traditional measure of generalization as accuracy or related metrics on a held-out set. We hypothesize that these failures to robustly generalize are due to the learning systems retaining too much information about the training data. To test this hypothesis, we propose the Minimum Necessary Information (MNI) criterion for evaluating the quality of a model. In order to train models that perform well with respect to the MNI criterion, we present a new objective function, the Conditional Entropy Bottleneck (CEB), which is closely related to the Information Bottleneck (IB). We experimentally test our hypothesis by comparing the performance of CEB models with deterministic models and Variational Information Bottleneck (VIB) models on a variety of different datasets and robustness challenges. We find strong empirical evidence supporting our hypothesis that MNI models improve on these problems of robust generalization.
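The CEB objective replaces the Information Bottleneck's compression term I(X;Z) with the conditional term I(X;Z|Y), which can be bounded variationally by log e(z|x) − log b(z|y) for a forward encoder e and a class-conditional backward encoder b. Below is a minimal PyTorch sketch under the simplifying assumption of unit-variance Gaussian encoders; the layer sizes and beta value are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CEB(nn.Module):
    """Variational Conditional Entropy Bottleneck sketch.
    e(z|x): Gaussian encoder; b(z|y): class-conditional Gaussian backward encoder;
    c(y|z): classifier. Loss = CE(c(y|z), y) + beta * E[log e(z|x) - log b(z|y)].
    """
    def __init__(self, in_dim: int, z_dim: int, n_classes: int, beta: float = 0.1):
        super().__init__()
        self.enc = nn.Linear(in_dim, z_dim)          # mean of e(z|x), unit variance
        self.back = nn.Embedding(n_classes, z_dim)   # mean of b(z|y), unit variance
        self.clf = nn.Linear(z_dim, n_classes)
        self.beta = beta

    def forward(self, x, y):
        mu = self.enc(x)
        z = mu + torch.randn_like(mu)                # reparameterized sample from e(z|x)
        # log e(z|x) - log b(z|y) for unit-variance Gaussians:
        residual = 0.5 * ((z - self.back(y)) ** 2 - (z - mu) ** 2).sum(dim=1)
        ce = F.cross_entropy(self.clf(z), y)
        return ce + self.beta * residual.mean()

model = CEB(in_dim=8, z_dim=4, n_classes=3)
x, y = torch.randn(16, 8), torch.randint(0, 3, (16,))
print(model(x, y).item())
```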
APA, Harvard, Vancouver, ISO, and other styles
37

Moon, Seung Jun, Sangwoo Mo, Kimin Lee, Jaeho Lee, and Jinwoo Shin. "MASKER: Masked Keyword Regularization for Reliable Text Classification." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 15 (May 18, 2021): 13578–86. http://dx.doi.org/10.1609/aaai.v35i15.17601.

Full text
Abstract:
Pre-trained language models have achieved state-of-the-art accuracies on various text classification tasks, e.g., sentiment analysis, natural language inference, and semantic textual similarity. However, the reliability of the fine-tuned text classifiers is an often overlooked performance criterion. For instance, one may desire a model that can detect out-of-distribution (OOD) samples (drawn far from the training distribution) or be robust against domain shifts. We claim that one central obstacle to reliability is the over-reliance of the model on a limited number of keywords, instead of looking at the whole context. In particular, we find that (a) OOD samples often contain in-distribution keywords, while (b) cross-domain samples may not always contain keywords; over-relying on the keywords can be problematic for both cases. In light of this observation, we propose a simple yet effective fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction. MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context. When applied to various pre-trained language models (e.g., BERT, RoBERTa, and ALBERT), we demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy. Code is available at https://github.com/alinlab/MASKER.
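A sketch of the regularization idea as described in the abstract: one term reconstructs masked keywords from context, and another pushes predictions toward low confidence when only keywords are visible. The model interface, keyword-mask construction, and loss weights below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def masker_loss(model, ids, keyword_mask, labels, mask_id, lam1=0.001, lam2=0.001):
    """Sketch of masked keyword regularization.
    ids: (B, L) token ids; keyword_mask: (B, L) bool, True at keyword positions.
    model(ids) -> (class_logits, token_logits); this signature is illustrative only.
    """
    # (1) standard classification on the full input
    cls_logits, _ = model(ids)
    loss_cls = F.cross_entropy(cls_logits, labels)

    # (2) reconstruct keywords from the surrounding context
    ids_kw_masked = ids.masked_fill(keyword_mask, mask_id)
    _, tok_logits = model(ids_kw_masked)
    loss_rec = F.cross_entropy(tok_logits[keyword_mask], ids[keyword_mask])

    # (3) low confidence when only keywords (no context) are visible
    ids_ctx_masked = ids.masked_fill(~keyword_mask, mask_id)
    ctx_logits, _ = model(ids_ctx_masked)
    uniform = torch.full_like(ctx_logits, 1.0 / ctx_logits.size(-1))
    loss_ent = F.kl_div(F.log_softmax(ctx_logits, dim=-1), uniform,
                        reduction="batchmean")

    return loss_cls + lam1 * loss_rec + lam2 * loss_ent
```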
APA, Harvard, Vancouver, ISO, and other styles
38

Chen, Muyi, Daling Wang, Shi Feng, and Yifei Zhang. "Denoising in Representation Space via Data-Dependent Regularization for Better Representation." Mathematics 11, no. 10 (May 16, 2023): 2327. http://dx.doi.org/10.3390/math11102327.

Full text
Abstract:
Despite the success of deep learning models, it remains challenging for over-parameterized models to learn good representations under small-sample-size settings. In this paper, motivated by previous work on out-of-distribution (OoD) generalization, we study the representation learning problem from an OoD perspective to identify the fundamental factors affecting representation quality. We formulate a notion of “out-of-feature subspace (OoFS) noise” for the first time, and we link the OoFS noise in the feature extractor to the OoD performance of the model by proving two theorems demonstrating that reducing OoFS noise in the feature extractor is beneficial for achieving better representations. Moreover, we identify two causes of OoFS noise and prove that the OoFS noise induced by random initialization can be filtered out via L2 regularization. Finally, we propose a novel data-dependent regularizer that acts on the weights of the fully connected layer to reduce noise in the representations, thus implicitly forcing the feature extractor to focus on informative features and to rely less on noise via back-propagation. Experiments on synthetic datasets show that our method can learn hard-to-learn features, can filter out noise effectively, and outperforms GD, AdaGrad, and KFAC. Furthermore, experiments on the benchmark datasets show that our method achieves the best performance on three of the four tasks.
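One way to read the proposed regularizer is as a penalty on the component of the fully connected layer's weights that lies outside the subspace spanned by the training features (the out-of-feature subspace). The sketch below implements that reading and is an assumption, not the paper's exact formulation.

```python
import numpy as np

def oofs_penalty(W: np.ndarray, feats: np.ndarray, rank: int) -> float:
    """Penalize classifier-weight mass outside the top-`rank` feature subspace.
    W: (n_classes, d) fully connected weights; feats: (n_samples, d) representations.
    """
    # Orthonormal basis of the dominant feature subspace via SVD
    _, _, vt = np.linalg.svd(feats, full_matrices=False)
    basis = vt[:rank]                       # (rank, d)
    w_in = W @ basis.T @ basis              # projection onto the feature span
    w_out = W - w_in                        # out-of-feature-subspace component
    return float((w_out ** 2).sum())

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16)) @ rng.normal(size=(16, 16))
W = rng.normal(size=(10, 16))
print(oofs_penalty(W, feats, rank=8))
```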
APA, Harvard, Vancouver, ISO, and other styles
39

Chen, Zhengyu, Teng Xiao, Kun Kuang, Zheqi Lv, Min Zhang, Jinluan Yang, Chengqiang Lu, Hongxia Yang, and Fei Wu. "Learning to Reweight for Generalizable Graph Neural Network." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8320–28. http://dx.doi.org/10.1609/aaai.v38i8.28673.

Full text
Abstract:
Graph Neural Networks (GNNs) show promising results for graph tasks. However, the generalization ability of existing GNNs degrades when there are distribution shifts between testing and training graph data. The fundamental reason for this severe degeneration is that most GNNs are designed based on the i.i.d. hypothesis. In such a setting, GNNs tend to exploit subtle statistical correlations existing in the training set for predictions, even when they are spurious correlations. In this paper, we study the problem of the generalization ability of GNNs in Out-of-Distribution (OOD) settings. To solve this problem, we propose Learning to Reweight for Generalizable Graph Neural Network (L2R-GNN), which enhances generalization to achieve satisfactory performance on unseen testing graphs whose distributions differ from those of the training graphs. We propose a novel nonlinear graph decorrelation method, which substantially improves out-of-distribution generalization and compares favorably to previous methods in restraining the reduction of the effective sample size. The variables of the graph representation are clustered based on the stability of their correlations, and the graph decorrelation method learns weights to remove correlations between the variables of different clusters rather than between any two variables. In addition, we introduce an effective stochastic algorithm based on bi-level optimization for the L2R-GNN framework, which enables learning the optimal weights and GNN parameters simultaneously and avoids over-fitting. Experiments show that L2R-GNN greatly outperforms baselines on various graph prediction benchmarks under distribution shifts.
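The core decorrelation idea, reweighting samples so that variables in different clusters become uncorrelated, can be sketched as a weighted cross-cluster covariance penalty. The sketch below is a simplified reading; the paper's bi-level optimization and clustering procedure are omitted.

```python
import torch

def decorrelation_loss(Z: torch.Tensor, w: torch.Tensor, clusters: list) -> torch.Tensor:
    """Weighted cross-cluster decorrelation sketch.
    Z: (n, d) graph representations; w: (n,) nonnegative sample weights summing to n;
    clusters: list of index tensors partitioning the d variables.
    Penalizes weighted covariance between variables of *different* clusters only.
    """
    mean = (w.unsqueeze(1) * Z).mean(dim=0, keepdim=True)  # weighted mean
    Zc = Z - mean
    cov = (w.unsqueeze(1) * Zc).t() @ Zc / Z.size(0)       # weighted covariance (d, d)
    loss = torch.zeros(())
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            loss = loss + cov[clusters[i]][:, clusters[j]].pow(2).sum()
    return loss

Z = torch.randn(32, 6)
w = torch.ones(32)  # in L2R-GNN these weights would themselves be learned
print(decorrelation_loss(Z, w, [torch.arange(0, 3), torch.arange(3, 6)]))
```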
APA, Harvard, Vancouver, ISO, and other styles
40

Wu, Fan, Jinling Gao, Lanqing Hong, Xinbing Wang, Chenghu Zhou, and Nanyang Ye. "G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 5958–66. http://dx.doi.org/10.1609/aaai.v38i6.28410.

Full text
Abstract:
In this paper, we focus on a realistic yet challenging task, Single Domain Generalization Object Detection (S-DGOD), where only one source domain's data can be used for training object detectors, which must nevertheless generalize to multiple distinct target domains. In S-DGOD, both high-capacity fitting and generalization abilities are needed due to the task's complexity. Differentiable Neural Architecture Search (NAS) is known for its high capacity for complex data fitting, and we propose to leverage Differentiable NAS to solve S-DGOD. However, it may confront severe over-fitting issues due to the feature imbalance phenomenon, where parameters optimized by gradient descent are biased toward the easy-to-learn features, which are usually non-causal and spuriously correlated with ground-truth labels, such as the background features in object detection data. Consequently, this leads to serious performance degradation, especially when generalizing to unseen target domains with large domain gaps from the source domain. To address this issue, we propose the Generalizable loss (G-loss), an OoD-aware objective that prevents NAS from over-fitting by using gradient descent to optimize parameters not only on a subset of easy-to-learn features but also on the remaining predictive features needed for generalization; the overall framework is named G-NAS. Experimental results on the S-DGOD urban-scene datasets demonstrate that the proposed G-NAS achieves SOTA performance compared to baseline methods. Codes are available at https://github.com/wufan-cse/G-NAS.
APA, Harvard, Vancouver, ISO, and other styles
41

Zhang, Yu, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, and Zhou Zhao. "StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19597–605. http://dx.doi.org/10.1609/aaai.v38i17.29932.

Full text
Abstract:
Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expressiveness. Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase. To overcome these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. StyleSinger incorporates two critical approaches for enhanced effectiveness: 1) the Residual Style Adaptor (RSA) which employs a residual quantization module to capture diverse style characteristics in singing voices, and 2) the Uncertainty Modeling Layer Normalization (UMLN) to perturb the style attributes within the content representation during the training phase and thus improve the model generalization. Our extensive evaluations in zero-shot style transfer undeniably establish that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples. Access to singing voice samples can be found at https://stylesinger.github.io/.
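The UMLN idea, perturbing style statistics during training so the content representation stops relying on them, can be sketched in the spirit of uncertainty modeling of feature statistics. Everything below (where the statistics are computed, how the noise is scaled) is an illustrative assumption rather than the paper's exact layer.

```python
import torch
import torch.nn as nn

class UMLN(nn.Module):
    """Sketch of Uncertainty Modeling Layer Normalization (training-time only).
    Normalizes features, then perturbs the per-sample style statistics (mean, std)
    with Gaussian noise scaled by their batch-level variance, so the content
    representation is trained to be robust to unseen styles.
    """
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):                        # x: (batch, time, dim)
        mu = x.mean(dim=-1, keepdim=True)        # per-sample style statistics
        sig = x.std(dim=-1, keepdim=True) + self.eps
        x_norm = (x - mu) / sig
        if self.training:
            # uncertainty of the statistics, estimated across the batch
            mu_u = mu.var(dim=0, keepdim=True).sqrt()
            sig_u = sig.var(dim=0, keepdim=True).sqrt()
            mu = mu + torch.randn_like(mu) * mu_u
            sig = sig + torch.randn_like(sig) * sig_u
        return self.gamma * (x_norm * sig + mu) + self.beta

layer = UMLN(80)
print(layer(torch.randn(4, 100, 80)).shape)
```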
APA, Harvard, Vancouver, ISO, and other styles
42

Ramachandran, Sai Niranjan, Rudrabha Mukhopadhyay, Madhav Agarwal, C. V. Jawahar, and Vinay Namboodiri. "Understanding the Generalization of Pretrained Diffusion Models on Out-of-Distribution Data." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 13 (March 24, 2024): 14767–75. http://dx.doi.org/10.1609/aaai.v38i13.29395.

Full text
Abstract:
This work tackles the important task of understanding out-of-distribution behavior in two prominent types of generative models, i.e., GANs and Diffusion models. Understanding this behavior is crucial in understanding their broader utility and risks as these systems are increasingly deployed in our daily lives. Our first contribution is demonstrating that diffusion spaces outperform GANs' latent spaces in inverting high-quality OOD images. We also provide a theoretical analysis attributing this to the lack of prior holes in diffusion spaces. Our second significant contribution is to provide a theoretical hypothesis that diffusion spaces can be projected onto a bounded hypersphere, enabling image manipulation through geodesic traversal between inverted images. Our analysis shows that different geodesics share common attributes for the same manipulation, which we leverage to perform various image manipulations. We conduct thorough empirical evaluations to support and validate our claims. Finally, our third and final contribution introduces a novel approach to the few-shot sampling for out-of-distribution data by inverting a few images to sample from the cluster formed by the inverted latents. The proposed technique achieves state-of-the-art results for the few-shot generation task in terms of image quality. Our research underscores the promise of diffusion spaces in out-of-distribution imaging and offers avenues for further exploration. Please find more details about our project at http://cvit.iiit.ac.in/research/projects/cvit-projects/diffusionOOD.
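Geodesic traversal between two inverted latents on a hypersphere is spherical linear interpolation (slerp). A minimal sketch follows, assuming the latents would be decoded back to images by the diffusion model (decoding not shown):

```python
import numpy as np

def slerp(z0: np.ndarray, z1: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation: walk the geodesic between two latents
    projected onto a hypersphere of their common radius."""
    r = (np.linalg.norm(z0) + np.linalg.norm(z1)) / 2.0
    a, b = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(a @ b, -1.0, 1.0))      # angle between latents
    if omega < 1e-6:                                   # nearly parallel: plain lerp
        return (1 - t) * z0 + t * z1
    p = (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
    return r * p

z0, z1 = np.random.randn(512), np.random.randn(512)   # two inverted latents
path = [slerp(z0, z1, t) for t in np.linspace(0, 1, 8)]
print(len(path), path[0].shape)
```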
APA, Harvard, Vancouver, ISO, and other styles
43

Sinha, Samarth, Homanga Bharadhwaj, Anirudh Goyal, Hugo Larochelle, Animesh Garg, and Florian Shkurti. "DIBS: Diversity Inducing Information Bottleneck in Model Ensembles." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (May 18, 2021): 9666–74. http://dx.doi.org/10.1609/aaai.v35i11.17163.

Full text
Abstract:
Although deep learning models have achieved state-of-the-art performance on a number of vision tasks, generalization over high-dimensional multi-modal data and reliable predictive uncertainty estimation are still active areas of research. Bayesian approaches including Bayesian Neural Nets (BNNs) do not scale well to modern computer vision tasks, as they are difficult to train and have poor generalization under dataset shift. This motivates the need for effective ensembles which can generalize and give reliable uncertainty estimates. In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity-inducing adversarial loss for learning the stochastic latent variables and thereby obtain the diversity in output predictions necessary for modeling multi-modal data. We evaluate our method on benchmark datasets: MNIST, CIFAR100, TinyImageNet, and MIT Places 2. Compared to the most competitive baselines, our method shows significant improvements: over 10% relative improvement in classification accuracy, over 5% relative improvement in generalization under dataset shift, and over 5% better predictive uncertainty estimation as inferred by efficient out-of-distribution (OOD) detection.
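As a simplified, non-adversarial stand-in for the diversity-inducing objective, one can combine per-member accuracy with a penalty on the pairwise similarity of the members' predictive distributions. The weighting and similarity measure below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ensemble_loss(logits_list, y, div_weight=0.1):
    """Sketch: accuracy term per ensemble member plus a diversity term that
    penalizes pairwise similarity of the members' predictive distributions.
    logits_list: list of (B, C) logit tensors, one per ensemble member.
    """
    task = sum(F.cross_entropy(l, y) for l in logits_list) / len(logits_list)
    probs = [F.softmax(l, dim=-1) for l in logits_list]
    sim, n = torch.zeros(()), 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            sim = sim + F.cosine_similarity(probs[i], probs[j], dim=-1).mean()
            n += 1
    return task + div_weight * sim / max(n, 1)

logits = [torch.randn(8, 10, requires_grad=True) for _ in range(3)]
y = torch.randint(0, 10, (8,))
print(ensemble_loss(logits, y).item())
```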
APA, Harvard, Vancouver, ISO, and other styles
44

Xie, Yi, Jie Zhang, Shiqian Zhao, Tianwei Zhang, and Xiaofeng Chen. "SAME: Sample Reconstruction against Model Extraction Attacks." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 18 (March 24, 2024): 19974–82. http://dx.doi.org/10.1609/aaai.v38i18.29974.

Full text
Abstract:
While deep learning models have shown significant performance across various domains, their deployment requires extensive resources and advanced computing infrastructure. As a solution, Machine Learning as a Service (MLaaS) has emerged, lowering the barriers for users to release or productize their deep learning models. However, previous studies have highlighted potential privacy and security concerns associated with MLaaS, and one primary threat is model extraction attacks. Many defense solutions have been proposed to address this, but they suffer from unrealistic assumptions and generalization issues, making them impractical for reliable protection. Driven by these limitations, we introduce a novel defense mechanism, SAME, based on the concept of sample reconstruction. This strategy imposes minimal prerequisites on the defender's capabilities, eliminating the need for auxiliary Out-of-Distribution (OOD) datasets, user query history, white-box model access, and additional intervention during model training. It is compatible with existing active defense methods. Our extensive experiments corroborate the superior efficacy of SAME over state-of-the-art solutions. Our code is available at https://github.com/xythink/SAME.
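The sample-reconstruction concept can be sketched as a gate in front of the served model: an autoencoder trained on natural data reconstructs each query, and high reconstruction error flags it as a likely extraction query. The gate class, threshold, and response perturbation below are illustrative assumptions, not the paper's mechanism.

```python
import numpy as np

class ReconstructionGate:
    """Sketch of a sample-reconstruction defense.
    `autoencoder` is any callable x -> x_hat (assumed trained on natural data);
    queries that reconstruct poorly are treated as suspicious and their
    predictions are perturbed before being returned.
    """
    def __init__(self, autoencoder, threshold: float):
        self.ae = autoencoder
        self.threshold = threshold

    def error(self, x: np.ndarray) -> float:
        return float(np.mean((self.ae(x) - x) ** 2))

    def serve(self, model, x: np.ndarray) -> np.ndarray:
        pred = model(x)
        if self.error(x) > self.threshold:      # likely extraction query
            pred = pred + np.random.normal(scale=0.5, size=pred.shape)
        return pred

# Usage with toy stand-ins for the autoencoder and the protected model.
gate = ReconstructionGate(autoencoder=lambda x: 0.9 * x, threshold=0.05)
print(gate.serve(lambda x: np.array([0.2, 0.8]), np.ones(16)))
```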
APA, Harvard, Vancouver, ISO, and other styles
45

Cai, Tian, Li Xie, Shuo Zhang, Muge Chen, Di He, Amitesh Badkul, Yang Liu, et al. "End-to-end sequence-structure-function meta-learning predicts genome-wide chemical-protein interactions for dark proteins." PLOS Computational Biology 19, no. 1 (January 18, 2023): e1010851. http://dx.doi.org/10.1371/journal.pcbi.1010851.

Full text
Abstract:
Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain “dark”, i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracy of predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step, using different gene families in the test data from those in the training and development data sets to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art techniques of machine learning and protein-ligand docking when applied to dark gene families, and demonstrated its generalization power for target identifications and compound screenings under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for the multi-target compound screening, the performance of PortalCG surpassed the rational design from medicinal chemists. Our results also suggest that a differentiable sequence-structure-function deep learning framework, where protein structural information serves as an intermediate layer, could be superior to conventional methodology where predicted protein structures were used for the compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggested that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
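The out-of-cluster evaluation protocol, ensuring no gene family is shared between training and test data, maps directly onto a grouped split. A minimal sketch with synthetic data (feature dimensions and family counts are arbitrary):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy data: each protein-ligand pair belongs to a gene family (the group).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)
families = rng.integers(0, 20, size=200)     # gene-family label per sample

# Out-of-cluster split: no gene family appears in both train and test,
# mimicking deployment on 'dark' families with no known ligands.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=families))
assert set(families[train_idx]).isdisjoint(set(families[test_idx]))
print(len(train_idx), len(test_idx))
```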
APA, Harvard, Vancouver, ISO, and other styles
46

Acikgoz, Mehmet, Serkan Araci, and Ugur Duran. "Some (p, q)-analogues of Apostol type numbers and polynomials." Acta et Commentationes Universitatis Tartuensis de Mathematica 23, no. 1 (August 9, 2019): 37–50. http://dx.doi.org/10.12697/acutm.2019.23.04.

Full text
Abstract:
We consider a new class of generating functions of the generalizations of Bernoulli and Euler polynomials in terms of (p, q)-integers. By making use of these generating functions, we derive (p, q)-generalizations of several old and new identities concerning Apostol–Bernoulli and Apostol–Euler polynomials. Finally, we define the (p, q)-generalization of Stirling polynomials of the second kind of order v, and provide a link between the (p, q)-generalization of Bernoulli polynomials of order v and the (p, q)-generalization of Stirling polynomials of the second kind of order v.
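For reference, the (p, q)-integer underlying these generating functions is

```latex
[n]_{p,q} \;=\; \frac{p^{n}-q^{n}}{p-q}
         \;=\; p^{n-1} + p^{n-2}q + \cdots + pq^{n-2} + q^{n-1},
\qquad [n]_{1,q} = [n]_q ,
```

which recovers the ordinary q-integer at p = 1 and the integer n at p = q = 1.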
APA, Harvard, Vancouver, ISO, and other styles
47

Kuroda, Masamichi. "Monomial Generalized Almost Perfect Nonlinear Functions." International Journal of Foundations of Computer Science 31, no. 03 (April 2020): 411–19. http://dx.doi.org/10.1142/s0129054120500161.

Full text
Abstract:
Generalized almost perfect nonlinear (GAPN) functions were defined to satisfy some generalizations of the basic properties of almost perfect nonlinear (APN) functions for even characteristic. In particular, on finite fields of even characteristic, GAPN functions coincide with APN functions. In this paper, we study monomial GAPN functions for odd characteristic. We give monomial GAPN functions whose algebraic degrees are maximum or minimum over a finite field of odd characteristic. Moreover, we define a generalization of exceptional APN functions and give typical examples.
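For context, the classical APN condition over even characteristic, which GAPN functions generalize to odd characteristic, reads

```latex
F:\mathbb{F}_{2^n}\to\mathbb{F}_{2^n}\ \text{is APN} \iff
\#\{\,x\in\mathbb{F}_{2^n} : F(x+a)+F(x)=b\,\}\le 2
\quad\text{for all } a\neq 0 \text{ and all } b .
```

The precise GAPN condition for odd characteristic is given in the paper.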
APA, Harvard, Vancouver, ISO, and other styles
48

Bozin, Vladimir, and Miodrag Mateljevic. "Bounds for Jacobian of harmonic injective mappings in n-dimensional space." Filomat 29, no. 9 (2015): 2119–24. http://dx.doi.org/10.2298/fil1509119b.

Full text
Abstract:
Using normal family arguments, we show that the degree of the first nonzero homogeneous polynomial in the expansion of an n-dimensional Euclidean harmonic K-quasiconformal mapping around an internal point is odd, and that such a map from the unit ball onto a bounded convex domain, with K < 3^(n-1), is co-Lipschitz. Some generalizations of this result are also given, as well as a generalization of Heinz's lemma for harmonic quasiconformal maps in R^n and related results.
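For reference, the co-Lipschitz property asserted in the abstract means

```latex
\exists\, c>0:\quad |f(x)-f(y)| \;\ge\; c\,|x-y| \quad \text{for all } x, y \text{ in the unit ball},
```

i.e., a lower distortion bound complementing the usual Lipschitz upper bound.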
APA, Harvard, Vancouver, ISO, and other styles
49

Weinberger, David. "The Rise of Particulars: AI and the Ethics of Care." Philosophies 9, no. 1 (February 16, 2024): 26. http://dx.doi.org/10.3390/philosophies9010026.

Full text
Abstract:
Machine learning (ML) trains itself by discovering patterns of correlations that can be applied to new inputs. That is a very powerful form of generalization, but it is also very different from the sort of generalization that the West has valorized as the highest form of truth, such as universal laws in some of the sciences, or ethical principles and frameworks in moral reasoning. Machine learning’s generalizations synthesize the general and the particular in a new way, creating a multidimensional model that often retains more of the complex differentiating patterns it has uncovered in the training process than the human mind can grasp. Particulars speak louder in these models than they do in traditional generalizing frameworks. This creates an odd analogy with recent movements in moral philosophy, particularly the feminist ethics of care, which rejects the application of general moral frameworks in favor of caring responses to the particular needs and interests of those affected by a moral decision. This paper suggests that our current widespread and justified worries about ML’s inexplicability, arising primarily from its reliance on staggeringly complex patterns of particulars, may be preparing our culture more broadly to valorize particulars as at least as determinative as generalizations, and that this might help further advance the importance of particulars in ideas such as those put forward by the ethics of care.
APA, Harvard, Vancouver, ISO, and other styles
50

Crone, Lawrence J. "A Generalization of Odd and Even Functions." Mathematics Magazine 76, no. 4 (October 1, 2003): 308. http://dx.doi.org/10.2307/3219090.

Full text
APA, Harvard, Vancouver, ISO, and other styles