Journal articles on the topic "Self-Supervised models"

To see the other types of publications on this topic, follow the link: Self-Supervised models.

Format your source in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 journal articles for your research on the topic "Self-Supervised models".

Next to every source in the list of references there is an "Add to bibliography" button. Click on it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Anton, Jonah, Liam Castelli, Mun Fai Chan, Mathilde Outters, Wan Hee Tang, Venus Cheung, Pancham Shukla, Rahee Walambe, and Ketan Kotecha. "How Well Do Self-Supervised Models Transfer to Medical Imaging?" Journal of Imaging 8, no. 12 (December 1, 2022): 320. http://dx.doi.org/10.3390/jimaging8120320.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Self-supervised learning approaches have seen success transferring between similar medical imaging datasets; however, there has been no large-scale attempt to compare the transferability of self-supervised models against each other on medical images. In this study, we compare the generalisability of seven self-supervised models, two of which were trained in-domain, against supervised baselines across nine different medical datasets. We find that ImageNet-pretrained self-supervised models are more generalisable than their supervised counterparts, scoring up to 10% better on medical classification tasks. The two in-domain pretrained models outperformed other models by over 20% on in-domain tasks, but suffered a significant loss of accuracy on all other tasks. Our investigation of the feature representations suggests that this trend may be due to the models learning to focus too heavily on specific areas.
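The transfer protocol such comparisons rely on (freeze the pretrained encoder, fit a lightweight head on the target task) can be sketched with a toy linear probe. The feature matrix and labels below are synthetic stand-ins, and the closed-form ridge probe is an illustrative choice, not the study's exact evaluation setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for outputs of a frozen pretrained encoder.
n, d, n_classes = 200, 64, 3
features = rng.normal(size=(n, d))            # encoder features (frozen)
labels = rng.integers(0, n_classes, size=n)   # target-task labels

# Linear probe: one-vs-all ridge regression on top of the frozen features.
Y = np.eye(n_classes)[labels]                 # one-hot targets
lam = 1e-2
W = np.linalg.solve(features.T @ features + lam * np.eye(d), features.T @ Y)

preds = (features @ W).argmax(axis=1)
train_acc = (preds == labels).mean()
```

Comparing `train_acc` (or held-out accuracy) across encoders is what "transferability" boils down to operationally: the encoder never changes, only the probe is refit per task.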
2

Gatopoulos, Ioannis, and Jakub M. Tomczak. "Self-Supervised Variational Auto-Encoders." Entropy 23, no. 6 (June 14, 2021): 747. http://dx.doi.org/10.3390/e23060747.

Abstract:
Density estimation, compression, and data generation are crucial tasks in artificial intelligence. Variational Auto-Encoders (VAEs) constitute a single framework to achieve these goals. Here, we present a novel class of generative models, called self-supervised Variational Auto-Encoder (selfVAE), which utilizes deterministic and discrete transformations of data. This class of models allows both conditional and unconditional sampling while simplifying the objective function. First, we use a single self-supervised transformation as a latent variable, where the transformation is either downscaling or edge detection. Next, we consider a hierarchical architecture, i.e., multiple transformations, and we show its benefits compared to the VAE. The flexibility of selfVAE in data reconstruction finds a particularly interesting use case in data compression tasks, where we can trade off memory for better data quality and vice versa. We present the performance of our approach on three benchmark image datasets (Cifar10, Imagenette64, and CelebA).
3

Zhang, Ronghua, Yuanyuan Wang, Fangyuan Liu, Changzheng Liu, Yaping Song, and Baohua Yu. "S2NMF: Information Self-Enhancement Self-Supervised Nonnegative Matrix Factorization for Recommendation." Wireless Communications and Mobile Computing 2022 (August 30, 2022): 1–10. http://dx.doi.org/10.1155/2022/4748858.

Abstract:
Nonnegative matrix factorization (NMF), which is aimed at making all elements of the factorization nonnegative and achieving nonlinear dimensional reduction at the same time, is an effective method for solving recommendation system problems. However, in many real-world applications, recommendation models are learned under the supervised learning paradigm. The recommendation performance of NMF models relies heavily on initialization, and the user-item interaction information is often very sparse. In many cases, supervised information about the data is difficult to obtain, so a large number of existing supervised models are inapplicable. To address this problem, we propose an information self-supervised NMF model for recommendation. Specifically, this model is based on the matrix factorization idea and introduces a self-supervised learning mechanism into the NMF model to enhance the information available in sparse data, yielding an easily extensible self-supervised NMF model. Furthermore, we propose a corresponding gradient descent optimization algorithm and analyse its complexity. Extensive experimental results show that the proposed S2NMF achieves better performance.
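The NMF core that S2NMF builds on can be sketched in a few lines. The paper derives its own gradient descent algorithm; shown here instead is the classical Lee-Seung multiplicative-update baseline for minimizing the Frobenius reconstruction error, with a synthetic sparse user-item matrix standing in for real interaction data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse nonnegative user-item matrix (synthetic stand-in).
X = rng.random((30, 20)) * (rng.random((30, 20)) < 0.3)
k = 5  # latent factor dimension
W = rng.random((30, k)) + 1e-3   # user factors
H = rng.random((k, 20)) + 1e-3   # item factors

def frob_err(X, W, H):
    return np.linalg.norm(X - W @ H)

eps = 1e-9  # guards against division by zero
errs = [frob_err(X, W, H)]
for _ in range(100):
    # Multiplicative updates keep W and H nonnegative by construction.
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
    errs.append(frob_err(X, W, H))
```

Predicted preferences are then read off `W @ H`; the self-supervised mechanism in the paper additionally shapes `W` and `H` so they are less sensitive to the random initialization above.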
4

Dang, Thanh-Vu, JinYoung Kim, Gwang-Hyun Yu, Ji Yong Kim, Young Hwan Park, and ChilWoo Lee. "Korean Text to Gloss: Self-Supervised Learning approach." Korean Institute of Smart Media 12, no. 1 (February 28, 2023): 32–46. http://dx.doi.org/10.30693/smj.2023.12.1.32.

Abstract:
Natural Language Processing (NLP) has grown tremendously in recent years. Typically, bilingual and multilingual translation models have been deployed widely in machine translation and have gained vast attention from the research community. On the contrary, few studies have focused on translating between spoken and sign languages, especially non-English languages. Prior works on Sign Language Translation (SLT) have shown that a mid-level sign gloss representation enhances translation performance. Therefore, this study presents a new large-scale Korean sign language dataset, the Museum-Commentary Korean Sign Gloss (MCKSG) dataset, including 3828 pairs of Korean sentences and their corresponding sign glosses used in museum-commentary contexts. In addition, we propose a translation framework based on self-supervised learning, where the pretext task is a text-to-text task mapping a Korean sentence to its back-translated versions; the pre-trained network is then fine-tuned on the MCKSG dataset. Using self-supervised learning helps to overcome the shortage of sign language data. In experiments, our proposed model outperforms a baseline BERT model by 6.22%.
5

Risojević, V., and V. Stojnić. "DO WE STILL NEED IMAGENET PRE-TRAINING IN REMOTE SENSING SCENE CLASSIFICATION?" International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B3-2022 (May 31, 2022): 1399–406. http://dx.doi.org/10.5194/isprs-archives-xliii-b3-2022-1399-2022.

Abstract:
Due to the scarcity of labeled data, using supervised models pre-trained on ImageNet is a de facto standard in remote sensing scene classification. Recently, the availability of larger high-resolution remote sensing (HRRS) image datasets and progress in self-supervised learning have raised the questions of whether supervised ImageNet pre-training is still necessary for remote sensing scene classification, and whether supervised pre-training on HRRS image datasets or self-supervised pre-training on ImageNet would achieve better results on target remote sensing scene classification tasks. To answer these questions, in this paper we both train models from scratch and fine-tune supervised and self-supervised ImageNet models on several HRRS image datasets. We also evaluate the transferability of learned representations to HRRS scene classification tasks and show that self-supervised pre-training outperforms the supervised one, while the performance of HRRS pre-training is similar to self-supervised pre-training or slightly lower. Finally, we propose using an ImageNet pre-trained model combined with a second round of pre-training using in-domain HRRS images, i.e., domain-adaptive pre-training. The experimental results show that domain-adaptive pre-training results in models that achieve state-of-the-art results on HRRS scene classification benchmarks. The source code and pre-trained models are available at https://github.com/risojevicv/RSSC-transfer.
6

Imran, Abdullah-Al-Zubaer, Chao Huang, Hui Tang, Wei Fan, Yuan Xiao, Dingjun Hao, Zhen Qian, and Demetri Terzopoulos. "Self-Supervised, Semi-Supervised, Multi-Context Learning for the Combined Classification and Segmentation of Medical Images (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (April 3, 2020): 13815–16. http://dx.doi.org/10.1609/aaai.v34i10.7179.

Abstract:
To tackle the problem of limited annotated data, semi-supervised learning is attracting attention as an alternative to fully supervised models. Moreover, optimizing a multiple-task model to learn “multiple contexts” can provide better generalizability compared to single-task models. We propose a novel semi-supervised multiple-task model leveraging self-supervision and adversarial training—namely, self-supervised, semi-supervised, multi-context learning (S4MCL)—and apply it to two crucial medical imaging tasks, classification and segmentation. Our experiments on spine X-rays reveal that the S4MCL model significantly outperforms semi-supervised single-task, semi-supervised multi-context, and fully-supervised single-task models, even with a 50% reduction of classification and segmentation labels.
7

Zhou, Meng, Zechen Li, and Pengtao Xie. "Self-supervised Regularization for Text Classification." Transactions of the Association for Computational Linguistics 9 (2021): 641–56. http://dx.doi.org/10.1162/tacl_a_00389.

Abstract:
Text classification is a widely studied problem and has broad applications. In many real-world problems, the number of texts for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is defined purely on input texts, without using any human-provided labels. Training a model with an SSL task can prevent it from overfitting to the limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.
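The joint objective described above has the shape L_total = L_cls + λ·L_ssl. A minimal numeric sketch, with toy logits and a hypothetical masked-token SSL head standing in for the real model (all values here are illustrative, not from the paper):

```python
import numpy as np

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for one example.
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))

# Toy logits for the supervised classification head ...
cls_logits = np.array([2.0, 0.5, -1.0]); cls_label = 0
# ... and for an auxiliary SSL head (e.g., predicting a masked token id).
ssl_logits = np.array([0.1, 1.2, 0.3, -0.5]); ssl_label = 1

lam = 0.1  # regularization weight, a hyperparameter in SSL-Reg
total_loss = cross_entropy(cls_logits, cls_label) + lam * cross_entropy(ssl_logits, ssl_label)
```

Both heads share the same text encoder, so gradients from the unlabeled SSL term regularize the representation even when class labels are scarce.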
8

Gong, Yuan, Cheng-I. Lai, Yu-An Chung, and James Glass. "SSAST: Self-Supervised Audio Spectrogram Transformer." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10699–709. http://dx.doi.org/10.1609/aaai.v36i10.21315.

Abstract:
Recently, neural networks based purely on self-attention, such as the Vision Transformer (ViT), have been shown to outperform deep learning models constructed with convolutional neural networks (CNNs) on various vision tasks, thus extending the success of Transformers, which were originally developed for language processing, to the vision domain. A recent study showed that a similar methodology can also be applied to the audio domain. Specifically, the Audio Spectrogram Transformer (AST) achieves state-of-the-art results on various audio classification benchmarks. However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST. This paper focuses on audio and speech classification, and aims to reduce the need for large amounts of labeled data for the AST by leveraging self-supervised learning using unlabeled data. Specifically, we propose to pretrain the AST model with joint discriminative and generative masked spectrogram patch modeling (MSPM) using unlabeled audio from AudioSet and Librispeech. We evaluate our pretrained models on both audio and speech classification tasks including audio event classification, keyword spotting, emotion recognition, and speaker identification. The proposed self-supervised framework significantly boosts AST performance on all tasks, with an average improvement of 60.9%, leading to similar or even better results than a supervised pretrained AST. To the best of our knowledge, it is the first patch-based self-supervised learning framework in the audio and speech domain, and also the first self-supervised learning framework for AST.
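The masking step of MSPM-style pretraining can be sketched in NumPy: split the spectrogram into patches, hide a random subset, and keep the originals as reconstruction targets. The toy spectrogram, patch size, and mask count below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy log-mel spectrogram: 128 mel bins x 64 frames (stand-in values).
spec = rng.normal(size=(128, 64))
P = 16  # square patch size, as in patch-based Transformers

# Split into non-overlapping 16x16 patches -> (num_patches, 16, 16).
patches = spec.reshape(128 // P, P, 64 // P, P).swapaxes(1, 2).reshape(-1, P, P)

# Mask a random subset; the model must reconstruct (generative objective)
# and identify (discriminative objective) the masked patches from context.
num_masked = 8
masked_idx = rng.choice(len(patches), size=num_masked, replace=False)
targets = patches[masked_idx].copy()
corrupted = patches.copy()
corrupted[masked_idx] = 0.0  # a learnable mask token in practice
```

Because the targets come from the input itself, pretraining can consume unlabeled AudioSet and Librispeech audio directly.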
9

Chen, Xuehao, Jin Zhou, Yuehui Chen, Shiyuan Han, Yingxu Wang, Tao Du, Cheng Yang, and Bowen Liu. "Self-Supervised Clustering Models Based on BYOL Network Structure." Electronics 12, no. 23 (November 21, 2023): 4723. http://dx.doi.org/10.3390/electronics12234723.

Abstract:
Contrastive-based clustering models usually rely on a large number of negative pairs to capture uniform representations, which requires a large batch size and high computational complexity. In contrast, some self-supervised methods perform non-contrastive learning to capture discriminative representations only with positive pairs, but suffer from the collapse of clustering. To solve these issues, a novel end-to-end self-supervised clustering model is proposed in this paper. The basic self-supervised learning network is first modified, followed by the incorporation of a Softmax layer to obtain cluster assignments as data representation. Then, adversarial learning on the cluster assignments is integrated into the methods to further enhance discrimination across different clusters and mitigate the collapse between clusters. To further encourage clustering-oriented guidance, a new cluster-level discrimination is assembled to promote clustering performance by measuring the self-correlation between the learned cluster assignments. Experimental results on real-world datasets exhibit better performance of the proposed model compared with the existing deep clustering methods.
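The BYOL network structure referenced here trains an online network with gradients while a target network trails it as an exponential moving average (EMA), which is what lets the method avoid negative pairs. A minimal sketch of the EMA update, with two toy parameter tensors standing in for network weights (the tau value is illustrative):

```python
import numpy as np

# BYOL keeps two networks: the online weights are trained by gradients,
# the target weights follow as an exponential moving average (EMA).
online = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
target = {"w": np.array([0.0, 0.0]), "b": np.array([0.0])}

def ema_update(target, online, tau=0.99):
    # target <- tau * target + (1 - tau) * online, per parameter tensor.
    for k in target:
        target[k] = tau * target[k] + (1.0 - tau) * online[k]
    return target

for _ in range(500):
    target = ema_update(target, online)
```

The slowly moving target provides stable prediction targets for positive pairs; the adversarial and cluster-level terms described in the abstract are added on top to keep the resulting cluster assignments from collapsing.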
10

Luo, Dezhao, Chang Liu, Yu Zhou, Dongbao Yang, Can Ma, Qixiang Ye, and Weiping Wang. "Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11701–8. http://dx.doi.org/10.1609/aaai.v34i07.6840.

Abstract:
We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatial-temporal representations. VCP first generates “blanks” by withholding video clips and then creates “options” by applying spatio-temporal operations on the withheld clips. Finally, it fills the blanks with “options” and learns representations by predicting the categories of operations applied on the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatial-temporal representation models (3D-CNNs) and apply such models on action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform the state-of-the-art self-supervised models with significant margins.
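The "options" step of VCP amounts to applying a labeled spatio-temporal operation to a withheld clip and training the model to predict which operation was applied. A sketch with a toy clip array; the particular set of operations here is illustrative, not the paper's exact list.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy video clip: (frames, height, width).
clip = rng.random((8, 16, 16))

# Candidate "options": spatio-temporal operations on the withheld clip.
operations = {
    0: lambda c: c,                         # identity
    1: lambda c: c[::-1],                   # temporal reversal
    2: lambda c: np.rot90(c, axes=(1, 2)),  # spatial rotation
    3: lambda c: c[:, ::-1, :],             # vertical flip
}

# Self-supervision: the label is the id of the applied operation.
op_id = int(rng.integers(0, len(operations)))
option = operations[op_id](clip)
```

Classifying `option` back to `op_id` forces the network to model both appearance and temporal order, which is why the learned 3D-CNN features transfer to action recognition and retrieval.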
11

Tuncal, Kubra, Boran Sekeroglu, and Rahib Abiyev. "Self-Supervised and Supervised Image Enhancement Networks with Time-Shift Module." Electronics 13, no. 12 (June 13, 2024): 2313. http://dx.doi.org/10.3390/electronics13122313.

Abstract:
Enhancing image quality provides more interpretability for both human beings and machines. Traditional image enhancement techniques work well for specific uses, but they struggle with images taken in extreme conditions, such as varied distortions, noise, and contrast deformations. Deep-learning-based methods produce superior quality in enhancing images since they are capable of learning the spatial characteristics within the images. However, deeper models increase the computational costs and require additional modules for particular problems. In this paper, we propose self-supervised and supervised image enhancement models based on the time-shift image enhancement method (TS-IEM). We embedded the TS-IEM into a four-layer CNN model and reconstructed the reference images for the self-supervised model. The reconstructed images are also used in the supervised model as an additional layer to improve the learning process and obtain better-quality images. Comprehensive experiments and qualitative and quantitative analysis are performed using three benchmark datasets of different application domains. The results showed that the self-supervised model could provide reasonable results for the datasets without reference images. On the other hand, the supervised model outperformed the state-of-the-art methods in quantitative analysis by producing well-enhanced images for different tasks.
12

Knoedler, Luzia, Chadi Salmi, Hai Zhu, Bruno Brito, and Javier Alonso-Mora. "Improving Pedestrian Prediction Models With Self-Supervised Continual Learning." IEEE Robotics and Automation Letters 7, no. 2 (April 2022): 4781–88. http://dx.doi.org/10.1109/lra.2022.3148475.

13

Pasad, Ankita, Chung-Ming Chien, Shane Settle, and Karen Livescu. "What Do Self-Supervised Speech Models Know About Words?" Transactions of the Association for Computational Linguistics 12 (2024): 372–91. http://dx.doi.org/10.1162/tacl_a_00656.

Abstract:
Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks. However, these empirical successes alone do not give a complete picture of what is learned during pre-training. Recent work has begun analyzing how S3Ms encode certain properties, such as phonetic and speaker information, but we still lack a proper understanding of knowledge encoded at the word level and beyond. In this work, we use lightweight analysis methods to study segment-level linguistic properties (word identity, boundaries, pronunciation, syntactic features, and semantic features) encoded in S3Ms. We present a comparative study of layer-wise representations from ten S3Ms and find that (i) the frame-level representations within each word segment are not all equally informative, and (ii) the pre-training objective and model size heavily influence the accessibility and distribution of linguistic information across layers. We also find that on several tasks (word discrimination, word segmentation, and semantic sentence similarity) S3Ms trained with visual grounding outperform their speech-only counterparts. Finally, our task-based analyses demonstrate improved performance on word segmentation and acoustic word discrimination while using simpler methods than prior work.
14

Li, Jingwei, Chi Zhang, Linyuan Wang, Penghui Ding, Lulu Hu, Bin Yan, and Li Tong. "A Visual Encoding Model Based on Contrastive Self-Supervised Learning for Human Brain Activity along the Ventral Visual Stream." Brain Sciences 11, no. 8 (July 29, 2021): 1004. http://dx.doi.org/10.3390/brainsci11081004.

Abstract:
Visual encoding models are important computational models for understanding how information is processed along the visual stream. Many improved visual encoding models have been developed from the perspective of the model architecture and the learning objective, but these are limited to the supervised learning method. From the view of unsupervised learning mechanisms, this paper utilized a pre-trained neural network to construct a visual encoding model based on contrastive self-supervised learning for the ventral visual stream measured by functional magnetic resonance imaging (fMRI). We first extracted features using the ResNet50 model pre-trained in contrastive self-supervised learning (ResNet50-CSL model), trained a linear regression model for each voxel, and finally calculated the prediction accuracy of different voxels. Compared with the ResNet50 model pre-trained in a supervised classification task, the ResNet50-CSL model achieved an equal or even relatively better encoding performance in multiple visual cortical areas. Moreover, the ResNet50-CSL model performs hierarchical representation of input visual stimuli, which is similar to the human visual cortex in its hierarchical information processing. Our experimental results suggest that the encoding model based on contrastive self-supervised learning is a strong computational model to compete with supervised models, and contrastive self-supervised learning proves an effective learning method to extract human brain-like representations.
15

Scheibenreif, L., M. Mommert, and D. Borth. "CONTRASTIVE SELF-SUPERVISED DATA FUSION FOR SATELLITE IMAGERY." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-3-2022 (May 17, 2022): 705–11. http://dx.doi.org/10.5194/isprs-annals-v-3-2022-705-2022.

Abstract:
Self-supervised learning has great potential for the remote sensing domain, where unlabelled observations are abundant, but labels are hard to obtain. This work leverages unlabelled multi-modal remote sensing data for augmentation-free contrastive self-supervised learning. Deep neural network models are trained to maximize the similarity of latent representations obtained with different sensing techniques from the same location, while distinguishing them from other locations. We showcase this idea with two self-supervised data fusion methods and compare against standard supervised and self-supervised learning approaches on a land-cover classification task. Our results show that contrastive data fusion is a powerful self-supervised technique to train image encoders that are capable of producing meaningful representations: Simple linear probing performs on par with fully supervised approaches and fine-tuning with as little as 10% of the labelled data results in higher accuracy than supervised training on the entire dataset.
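The "same location vs. other locations" objective is an InfoNCE-style contrastive loss where embeddings of the two sensor modalities for the same location form the positive pair and all other locations in the batch act as negatives. A NumPy sketch with random embeddings standing in for the two encoders' outputs (the temperature and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(za, zb, temperature=0.1):
    # za[i] and zb[i] embed the SAME location from two sensors;
    # all other rows act as negatives (other locations).
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature          # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))       # matching pairs on the diagonal

n, d = 16, 32
za = rng.normal(size=(n, d))
loss_random = info_nce(za, rng.normal(size=(n, d)))               # unrelated pairs
loss_aligned = info_nce(za, za + 0.01 * rng.normal(size=(n, d)))  # aligned pairs
```

Because the positive pair comes from co-located observations of different sensors, no hand-crafted image augmentations are needed, which is the "augmentation-free" property the abstract highlights.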
16

Yuan, Guotao, Hong Huang, and Xin Li. "Self-supervised learning backdoor defense mixed with self-attention mechanism." Journal of Computing and Electronic Information Management 12, no. 2 (March 30, 2024): 81–88. http://dx.doi.org/10.54097/7hx9afkw.

Abstract:
Recent studies have shown that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, where attackers embed hidden backdoors into the DNN models by poisoning a small number of training samples. The attacked models perform normally on benign samples, but when the backdoor is activated, their prediction results will be maliciously altered. To address the issues of suboptimal backdoor defense effectiveness and limited generality, a hybrid self-attention mechanism-based self-supervised learning method for backdoor defense is proposed. This method defends against backdoor attacks by leveraging the attack characteristics of backdoor threats, aiming to mitigate their impact. It adopts a decoupling approach, disconnecting the association between poisoned samples and target labels, and enhances the connection between feature labels and clean labels by optimizing the feature extractor. Experimental results on CIFAR-10 and CIFAR-100 datasets show that this method performs moderately in terms of Clean Accuracy (CA), ranking at the median level. However, it achieves significant effectiveness in reducing the Attack Success Rate (ASR), especially against BadNets and Blended attacks, where its defense capability is notably superior to other methods, with attack success rates below 2%.
17

Zhang, Ye. "Application of self-supervised learning in natural language processing." Journal of Computing and Electronic Information Management 12, no. 1 (February 28, 2024): 23–26. http://dx.doi.org/10.54097/urpv6i8g3j.

Abstract:
Self-supervised learning trains models on label-free data and has a significant impact on NLP tasks. It reduces data annotation costs and improves performance. The main applications include pre-training models such as BERT and GPT, contrastive learning, and pseudo-supervised and semi-supervised methods. It has been successfully applied in text classification, sentiment analysis, and other fields. Future research directions include hybrid unsupervised learning, cross-modal learning, and improving the interpretability of models while attending to ethical and social issues.
18

Dominic, Jeffrey, Nandita Bhaskhar, Arjun D. Desai, Andrew Schmidt, Elka Rubin, Beliz Gunel, Garry E. Gold, et al. "Improving Data-Efficiency and Robustness of Medical Imaging Segmentation Using Inpainting-Based Self-Supervised Learning." Bioengineering 10, no. 2 (February 4, 2023): 207. http://dx.doi.org/10.3390/bioengineering10020207.

Abstract:
We systematically evaluate the training methodology and efficacy of two inpainting-based pretext tasks of context prediction and context restoration for medical image segmentation using self-supervised learning (SSL). Multiple versions of self-supervised U-Net models were trained to segment MRI and CT datasets, each using a different combination of design choices and pretext tasks to determine the effect of these design choices on segmentation performance. The optimal design choices were used to train SSL models that were then compared with baseline supervised models for computing clinically-relevant metrics in label-limited scenarios. We observed that SSL pretraining with context restoration using 32 × 32 patches and Poisson-disc sampling, transferring only the pretrained encoder weights, and fine-tuning immediately with an initial learning rate of 1 × 10^-3 provided the most benefit over supervised learning for MRI and CT tissue segmentation accuracy (p < 0.001). For both datasets and most label-limited scenarios, scaling the size of unlabeled pretraining data resulted in improved segmentation performance. SSL models pretrained with this amount of data outperformed baseline supervised models in the computation of clinically-relevant metrics, especially when the performance of supervised learning was low. Our results demonstrate that SSL pretraining using inpainting-based pretext tasks can help increase the robustness of models in label-limited scenarios and reduce worst-case errors that occur with supervised learning.
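The corruption step of the context restoration pretext task can be sketched as swapping pairs of patches: the network then learns by restoring the original image. The sketch below uses uniformly random patch positions for brevity, whereas the paper's best configuration samples patch locations with Poisson-disc sampling; the toy image is a stand-in for an MRI/CT slice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy grayscale scan (stand-in for an MRI/CT slice).
img = rng.random((64, 64))
P = 32  # patch size matching the paper's best configuration

def swap_random_patches(img, P, n_swaps=4, rng=rng):
    # Context restoration: swap pairs of patches; the network is trained
    # to restore the original image from the corrupted copy.
    out = img.copy()
    for _ in range(n_swaps):
        (y1, x1), (y2, x2) = rng.integers(0, img.shape[0] - P, size=(2, 2))
        a = out[y1:y1+P, x1:x1+P].copy()
        out[y1:y1+P, x1:x1+P] = out[y2:y2+P, x2:x2+P]
        out[y2:y2+P, x2:x2+P] = a
    return out

corrupted = swap_random_patches(img, P)
```

Since swapping preserves the overall intensity distribution, the encoder must learn anatomical context rather than global statistics, which is what transfers to segmentation after fine-tuning.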
19

Zeng, Jiaqi, and Pengtao Xie. "Contrastive Self-supervised Learning for Graph Classification." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10824–32. http://dx.doi.org/10.1609/aaai.v35i12.17293.

Abstract:
Graph classification is a widely studied problem and has broad applications. In many real-world problems, the number of labeled graphs available for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose two approaches based on contrastive self-supervised learning (CSSL) to alleviate overfitting. In the first approach, we use CSSL to pretrain graph encoders on widely-available unlabeled graphs without relying on human-provided labels, then finetune the pretrained encoders on labeled graphs. In the second approach, we develop a regularizer based on CSSL, and solve the supervised classification task and the unsupervised CSSL task simultaneously. To perform CSSL on graphs, given a collection of original graphs, we perform data augmentation to create augmented graphs out of the original graphs. An augmented graph is created by consecutively applying a sequence of graph alteration operations. A contrastive loss is defined to learn graph encoders by judging whether two augmented graphs are from the same original graph. Experiments on various graph classification datasets demonstrate the effectiveness of our proposed methods. The code is available at https://github.com/UCSD-AI4H/GraphSSL.
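One common graph alteration operation used in CSSL augmentation pipelines is random edge dropping, which can be sketched on a toy adjacency matrix (the paper composes sequences of several such operations; this shows only one, with an illustrative drop rate):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph as a symmetric adjacency matrix (no self-loops).
n = 10
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T

def drop_edges(A, p=0.2, rng=rng):
    # CSSL augmentation: randomly delete a fraction p of the edges,
    # keeping the adjacency matrix symmetric.
    mask = np.triu(rng.random(A.shape) >= p, 1)
    kept = np.triu(A, 1) * mask
    return kept + kept.T

A_aug = drop_edges(A)
```

Two augmented graphs produced this way from the same original form a positive pair for the contrastive loss; augmentations of different originals are negatives.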
20

Wagner, Royden, Carlos Fernandez Lopez, and Christoph Stiller. "Self-supervised pseudo-colorizing of masked cells." PLOS ONE 18, no. 8 (August 24, 2023): e0290561. http://dx.doi.org/10.1371/journal.pone.0290561.

Abstract:
Self-supervised learning, which is strikingly referred to as the dark matter of intelligence, is gaining more attention in biomedical applications of deep learning. In this work, we introduce a novel self-supervision objective for the analysis of cells in biomedical microscopy images. We propose training deep learning models to pseudo-colorize masked cells. We use a physics-informed pseudo-spectral colormap that is well suited for colorizing cell topology. Our experiments reveal that approximating semantic segmentation by pseudo-colorization is beneficial for subsequent fine-tuning on cell detection. Inspired by the recent success of masked image modeling, we additionally mask out cell parts and train to reconstruct these parts to further enrich the learned representations. We compare our pre-training method with self-supervised frameworks including contrastive learning (SimCLR), masked autoencoders (MAEs), and edge-based self-supervision. We build upon our previous work and train hybrid models for cell detection, which contain both convolutional and vision transformer modules. Our pre-training method can outperform SimCLR, MAE-like masked image modeling, and edge-based self-supervision when pre-training on a diverse set of six fluorescence microscopy datasets. Code is available at: https://github.com/roydenwa/pseudo-colorize-masked-cells.
21

Liu, Yuanyuan, and Qianqian Liu. "Research on Self-Supervised Comparative Learning for Computer Vision." Journal of Electronic Research and Application 5, no. 3 (August 17, 2021): 5–17. http://dx.doi.org/10.26689/jera.v5i3.2320.

Abstract:
In recent years, self-supervised learning, which does not require large numbers of manual labels, has generated supervision signals from the data itself to achieve representation learning of samples. Self-supervised learning solves the problem of learning semantic features from unlabeled data and enables pre-training of models on large datasets. Its significant advantages have been extensively studied by scholars in recent years. There are usually three types of self-supervised learning: "Generative, Contrastive, and Generative-Contrastive." The model of the contrastive learning method is relatively simple, and its performance on current downstream tasks is comparable to that of supervised learning methods. Therefore, we propose a conceptual analysis framework covering the data augmentation pipeline, architectures, pretext tasks, comparison methods, and semi-supervised fine-tuning. Based on this conceptual framework, we qualitatively analyze the existing contrastive self-supervised learning methods for computer vision, further analyze their performance at different stages, and finally summarize the research status of self-supervised contrastive learning methods in other fields.
22

Esser, Pascal, Maximilian Fleissner, and Debarghya Ghoshdastidar. "Non-parametric Representation Learning with Kernels." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (March 24, 2024): 11910–18. http://dx.doi.org/10.1609/aaai.v38i11.29077.

Abstract:
Unsupervised and self-supervised representation learning has become popular in recent years for learning useful features from unlabelled data. Representation learning has mostly been developed in the neural network literature, while other models for representation learning remain surprisingly unexplored. In this work, we introduce and analyze several kernel-based representation learning approaches: firstly, we define two kernel Self-Supervised Learning (SSL) models using contrastive loss functions; secondly, we define a Kernel Autoencoder (AE) model based on the idea of embedding and reconstructing data. We argue that the classical representer theorems for supervised kernel machines are not always applicable for (self-supervised) representation learning, and present new representer theorems, which show that the representations learned by our kernel models can be expressed in terms of kernel matrices. We further derive generalisation error bounds for representation learning with kernel SSL and AE, and empirically evaluate the performance of these methods both in small data regimes and in comparison with neural network based models.
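The representer-theorem result summarized above means the learned representation lives in the span of kernel functions centred on the training points, i.e. it can be written via the kernel matrix and dual coefficients. A rough NumPy illustration using plain kernel PCA (an unsupervised kernel embedding; not the paper's contrastive or autoencoder objectives):

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian RBF kernel matrix between the rows of a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_pca_embed(x, dim=2, gamma=0.5):
    """Embed points as K_c @ alpha: kernel matrix times dual coefficients."""
    k = rbf_kernel(x, x, gamma)
    n = len(x)
    h = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    kc = h @ k @ h                           # centred kernel matrix
    vals, vecs = np.linalg.eigh(kc)          # eigenvalues in ascending order
    top = vecs[:, -dim:][:, ::-1]            # leading eigenvectors
    lam = np.clip(vals[-dim:][::-1], 1e-12, None)
    alphas = top / np.sqrt(lam)              # dual coefficients
    return kc @ alphas                       # representation via kernel matrix

x = np.random.default_rng(0).normal(size=(20, 3))
emb = kernel_pca_embed(x, dim=2)
```

The point of the sketch is the final line: the representation is a function of the kernel matrix alone, which is the form the paper's representer theorems guarantee.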
23

Polceanu, Mihai, Julie Porteous, Alan Lindsay, and Marc Cavazza. "Narrative Plan Generation with Self-Supervised Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 7 (May 18, 2021): 5984–92. http://dx.doi.org/10.1609/aaai.v35i7.16747.

Abstract:
Narrative Generation has attracted significant interest as a novel application of Automated Planning techniques. However, the vast amount of narrative material available opens the way to the use of Deep Learning techniques. In this paper, we explore the feasibility of narrative generation through self-supervised learning, using sequence embedding techniques or auto-encoders to produce narrative sequences. We use datasets of well-formed plots generated by a narrative planning approach, using pre-existing, published, narrative planning domains, to train generative models. Our experiments demonstrate the ability of generative sequence models to produce narrative plots with similar structure to those obtained with planning techniques, but with significant plot novelty in comparison with the training set. Most importantly, generated plots share structural properties associated with narrative quality measures used in Planning-based methods. As plan-based structures account for a higher level of causality and narrative consistency, this suggests that our approach is able to extend a set of narratives with novel sequences that display the same high-level narrative properties. Unlike methods developed to extend sets of textual narratives, ours operates at the level of plot structure. Thus, it has the potential to be used across various media for plots of significant complexity, being initially limited to training and generation operating in the same narrative genre.
24

Tóth, Martos, and Nelson Sommerfeldt. "PV self-consumption prediction methods using supervised machine learning." E3S Web of Conferences 362 (2022): 02003. http://dx.doi.org/10.1051/e3sconf/202236202003.

Abstract:
The increased prevalence of photovoltaic (PV) self-consumption policies across Europe and the world places an increased importance on accurate predictions for life-cycle costing during the planning phase. This study presents several machine learning and regression models for predicting self-consumption, trained on a variety of datasets from Sweden. The results show that advanced ML models have an improved performance over simpler regressions, where the highest performing model, Random Forest, has a mean absolute error of 1.5 percentage points and an R2 of 0.977. Training models using widely available typical meteorological year (TMY) climate data is also shown to introduce small, acceptable errors when tested against spatially and temporally matched climate and load data. The ability to train the ML models with TMY climate data makes their adoption easier and builds on previous work by demonstrating the robustness of the methodology as a self-consumption prediction tool. The low error and high R2 are a notable improvement over previous estimation models, and the minimal input data requirements make them easy to adopt and apply in a wide array of applications.
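The two metrics quoted in this abstract are straightforward to compute; the values below use made-up self-consumption figures purely for illustration:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error; with self-consumption expressed as a
    percentage, this is directly in percentage points."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def r2(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)

# Hypothetical self-consumption values (%) -- not data from the study.
actual = np.array([30.0, 45.0, 52.0, 38.0])
predicted = np.array([31.0, 44.0, 54.0, 37.0])
```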
25

Mustapha, Ahmad, Wael Khreich, and Wes Masri. "Inter-model interpretability: Self-supervised models as a case study." Array 22 (July 2024): 100350. http://dx.doi.org/10.1016/j.array.2024.100350.

26

Shi, Haizhou, Youcai Zhang, Siliang Tang, Wenjie Zhu, Yaqian Li, Yandong Guo, and Yueting Zhuang. "On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2225–34. http://dx.doi.org/10.1609/aaai.v36i2.20120.

Abstract:
It is a consensus that small models perform quite poorly under the paradigm of self-supervised contrastive learning. Existing methods usually adopt a large off-the-shelf model to transfer knowledge to the small one via distillation. Despite their effectiveness, distillation-based methods may not be suitable for some resource-restricted scenarios due to the huge computational expense of deploying a large model. In this paper, we study the issue of training self-supervised small models without distillation signals. We first evaluate the representation spaces of the small models and make two non-negligible observations: (i) the small models can complete the pretext task without overfitting despite their limited capacity, and (ii) they universally suffer from the problem of over-clustering. Then we verify multiple assumptions that are considered to alleviate the over-clustering phenomenon. Finally, we combine the validated techniques and improve the baseline performance of five small architectures by considerable margins, which indicates that training small self-supervised contrastive models is feasible even without distillation signals. The code is available at https://github.com/WOWNICE/ssl-small.
27

Makarov, Ilya, Maria Bakhanova, Sergey Nikolenko, and Olga Gerasimova. "Self-supervised recurrent depth estimation with attention mechanisms." PeerJ Computer Science 8 (January 31, 2022): e865. http://dx.doi.org/10.7717/peerj-cs.865.

Abstract:
Depth estimation has been an essential task for many computer vision applications, especially in autonomous driving, where safety is paramount. Depth can be estimated not only with traditional supervised learning but also via a self-supervised approach that relies on camera motion and does not require ground truth depth maps. Recently, major improvements have been introduced to make self-supervised depth prediction more precise. However, most existing approaches still focus on single-frame depth estimation, even in the self-supervised setting. Since most methods can operate with frame sequences, we believe that the quality of current models can be significantly improved with the help of information about previous frames. In this work, we study different ways of integrating recurrent blocks and attention mechanisms into a common self-supervised depth estimation pipeline. We propose a set of modifications that utilize temporal information from previous frames and provide new neural network architectures for monocular depth estimation in a self-supervised manner. Our experiments on the KITTI dataset show that proposed modifications can be an effective tool for exploiting temporal information in a depth prediction pipeline.
28

Hu, Fanghuai, Zhiqing Shao, and Tong Ruan. "Self-Supervised Chinese Ontology Learning from Online Encyclopedias." Scientific World Journal 2014 (2014): 1–13. http://dx.doi.org/10.1155/2014/848631.

Abstract:
Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised learning based Chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. In order to avoid the errors in encyclopedias and enrich the learnt ontology, we also apply some machine learning based methods. First, we prove statistically and experimentally that the self-supervised machine learning method is practicable for Chinese relation extraction (at least for synonymy and hyponymy) and train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are automatically generated from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision; manual evaluation shows that the ontology has excellent precision, and comparison of SSCO with other well-known ontologies and knowledge bases indicates high coverage; the experimental results also show that the self-supervised models clearly enrich SSCO.
29

Shwartz Ziv, Ravid, and Yann LeCun. "To Compress or Not to Compress—Self-Supervised Learning and Information Theory: A Review." Entropy 26, no. 3 (March 12, 2024): 252. http://dx.doi.org/10.3390/e26030252.

Abstract:
Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory has shaped deep neural networks, particularly the information bottleneck principle. This principle optimizes the trade-off between compression and preserving relevant information, providing a foundation for efficient network design in supervised contexts. However, its precise role and adaptation in self-supervised learning remain unclear. In this work, we scrutinize various self-supervised learning approaches from an information-theoretic perspective, introducing a unified framework that encapsulates the self-supervised information-theoretic learning problem. This framework includes multiple encoders and decoders, suggesting that all existing work on self-supervised learning can be seen as specific instances. We aim to unify these approaches to understand their underlying principles better and address the main challenge: many works present different frameworks with differing theories that may seem contradictory. By weaving existing research into a cohesive narrative, we delve into contemporary self-supervised methodologies, spotlight potential research areas, and highlight inherent challenges. Moreover, we discuss how to estimate information-theoretic quantities and their associated empirical problems. Overall, this paper provides a comprehensive review of the intersection of information theory, self-supervised learning, and deep neural networks, aiming for a better understanding through our proposed unified approach.
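The information bottleneck principle referenced in this abstract is usually written as a trade-off minimized over the stochastic encoder p(z|x) (the standard formulation, not specific to this review):

```latex
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(Z; X) \;-\; \beta \, I(Z; Y)
```

where I denotes mutual information and the Lagrange multiplier beta controls how much task-relevant information about Y is retained relative to how strongly X is compressed into the representation Z.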
30

Montero Quispe, Kevin G., Daniel M. S. Utyiama, Eulanda M. dos Santos, Horácio A. B. F. Oliveira, and Eduardo J. P. Souto. "Applying Self-Supervised Representation Learning for Emotion Recognition Using Physiological Signals." Sensors 22, no. 23 (November 23, 2022): 9102. http://dx.doi.org/10.3390/s22239102.

Abstract:
The use of machine learning (ML) techniques in affective computing applications focuses on improving the user experience in emotion recognition. The collection of input data (e.g., physiological signals), together with expert annotations are part of the established standard supervised learning methodology used to train human emotion recognition models. However, these models generally require large amounts of labeled data, which is expensive and impractical in the healthcare context, in which data annotation requires even more expert knowledge. To address this problem, this paper explores the use of the self-supervised learning (SSL) paradigm in the development of emotion recognition methods. This approach makes it possible to learn representations directly from unlabeled signals and subsequently use them to classify affective states. This paper presents the key concepts of emotions and how SSL methods can be applied to recognize affective states. We experimentally analyze and compare self-supervised and fully supervised training of a convolutional neural network designed to recognize emotions. The experimental results using three emotion datasets demonstrate that self-supervised representations can learn widely useful features that improve data efficiency, are widely transferable, are competitive when compared to their fully supervised counterparts, and do not require the data to be labeled for learning.
31

Livieris, Ioannis, Andreas Kanavos, Vassilis Tampakas, and Panagiotis Pintelas. "An Auto-Adjustable Semi-Supervised Self-Training Algorithm." Algorithms 11, no. 9 (September 14, 2018): 139. http://dx.doi.org/10.3390/a11090139.

Abstract:
Semi-supervised learning algorithms have become a topic of significant research as an alternative to traditional classification methods which exhibit remarkable performance over labeled data but lack the ability to be applied on large amounts of unlabeled data. In this work, we propose a new semi-supervised learning algorithm that dynamically selects the most promising learner for a classification problem from a pool of classifiers based on a self-training philosophy. Our experimental results illustrate that the proposed algorithm outperforms its component semi-supervised learning algorithms in terms of accuracy, leading to more efficient, stable and robust predictive models.
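The self-training philosophy underlying this algorithm, fit on labelled data, pseudo-label the most confident unlabelled points, add them, and refit, can be sketched as below. The nearest-centroid learner and margin-based confidence are illustrative choices, not the paper's pool of classifiers or its dynamic selection rule:

```python
import numpy as np

class NearestCentroid:
    """Tiny classifier used only to make the self-training loop runnable."""
    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([x[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict_margin(self, x):
        d = np.linalg.norm(x[:, None, :] - self.centroids_[None], axis=-1)
        order = np.sort(d, axis=1)
        margin = order[:, 1] - order[:, 0]        # distance gap as confidence
        return self.classes_[d.argmin(axis=1)], margin

def self_train(x_l, y_l, x_u, threshold=1.0, rounds=5):
    """Generic self-training loop over a pool of unlabelled points."""
    x_l, y_l, x_u = x_l.copy(), y_l.copy(), x_u.copy()
    model = NearestCentroid().fit(x_l, y_l)
    for _ in range(rounds):
        if len(x_u) == 0:
            break
        pred, margin = model.predict_margin(x_u)
        keep = margin >= threshold                # only confident pseudo-labels
        if not keep.any():
            break
        x_l = np.concatenate([x_l, x_u[keep]])
        y_l = np.concatenate([y_l, pred[keep]])
        x_u = x_u[~keep]
        model = NearestCentroid().fit(x_l, y_l)   # retrain on enlarged set
    return model, len(x_l)

rng = np.random.default_rng(0)
x_lab = np.concatenate([rng.normal(-3, 1, (5, 2)), rng.normal(3, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
x_unl = np.concatenate([rng.normal(-3, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
model, n_final = self_train(x_lab, y_lab, x_unl, threshold=1.0)
```

The paper's contribution sits on top of such a loop: instead of a fixed learner, it dynamically selects the most promising classifier from a pool at each stage.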
32

Kahatapitiya, Kumara, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, and Gang Hua. "Weakly-Guided Self-Supervised Pretraining for Temporal Activity Detection." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 1078–86. http://dx.doi.org/10.1609/aaai.v37i1.25189.

Abstract:
Temporal Activity Detection aims to predict activity classes per frame, in contrast to video-level predictions in Activity Classification (i.e., Activity Recognition). Due to the expensive frame-level annotations required for detection, the scale of detection datasets is limited. Thus, commonly, previous work on temporal activity detection resorts to fine-tuning a classification model pretrained on large-scale classification datasets (e.g., Kinetics-400). However, such pretrained models are not ideal for downstream detection, due to the disparity between the pretraining and the downstream fine-tuning tasks. In this work, we propose a novel weakly-guided self-supervised pretraining method for detection. We leverage weak labels (classification) to introduce a self-supervised pretext task (detection) by generating frame-level pseudo labels, multi-action frames, and action segments. Simply put, we design a detection task similar to downstream, on large-scale classification data, without extra annotations. We show that the models pretrained with the proposed weakly-guided self-supervised detection task outperform prior work on multiple challenging activity detection benchmarks, including Charades and MultiTHUMOS. Our extensive ablations further provide insights on when and how to use the proposed models for activity detection. Code is available at github.com/kkahatapitiya/SSDet.
33

Cheng, Jiashun, Man Li, Jia Li, and Fugee Tsung. "Wiener Graph Deconvolutional Network Improves Graph Self-Supervised Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 7131–39. http://dx.doi.org/10.1609/aaai.v37i6.25870.

Abstract:
Graph self-supervised learning (SSL) has been vastly employed to learn representations from unlabeled graphs. Existing methods can be roughly divided into predictive learning and contrastive learning, where the latter attracts more research attention with better empirical performance. We argue, however, that predictive models equipped with a powerful decoder can achieve representation power comparable to or even better than that of contrastive models. In this work, we propose a Wiener Graph Deconvolutional Network (WGDN), an augmentation-adaptive decoder empowered by a graph Wiener filter to perform information reconstruction. Theoretical analysis proves the superior reconstruction ability of the graph Wiener filter. Extensive experimental results on various datasets demonstrate the effectiveness of our approach.
34

Fedden, Leon, Zhenning Zhang, Khan Baykaner, Qin Li, and Lucas Bordeaux. "Abstract 1937: DIME-CT: Self-supervised learning for medical image analysis using patch-based embeddings." Cancer Research 82, no. 12_Supplement (June 15, 2022): 1937. http://dx.doi.org/10.1158/1538-7445.am2022-1937.

Abstract:
Abstract Whether tracking patient progress for clinical decision making, or investigating novel therapies, automated analysis of Computed Tomography (CT) imaging data is essential for the future of digital radiomics. In digital radiomics, as in all medical imaging, well annotated data is scarce, whereas unlabelled images are relatively plentiful. In other fields of image processing and medical imaging, the application of self-supervised learning (SSL) to large quantities of unlabelled data has resulted in great strides forward for fast, scalable, and interpretable image analysis. In this work we present a new approach applying SSL to CT imaging data which allows for: 1) improved performance on image classification tasks, based on 2) a dramatically reduced quantity of annotated CT imaging data, whilst also 3) enabling easy exploration and interpretation of the image regions. We applied a selection of self-supervised approaches (BYOL, DINO, SimCLR, & inpainting) to CT imaging data. Because the high dimensionality of CT data prevents us from feeding them directly into out-of-the-box SSL models, we adopt a 3D patching approach to reduce the dimensionality of the neural net input, and process each patch independently. We train our self-supervised models on public datasets (DeepLesion, NSCLC), and we specialize these models for tumor classification tasks that we evaluate on AstraZeneca sponsored clinical trials. The specialization is done in two different ways: 1) we use the pre-trained SSL model as an encoder that transforms the images of the clinical study into embeddings, on which we apply supervised classification models; 2) we use transfer learning to fine-tune supervised classification models that take the patches directly as inputs. We find that self-supervised pre-training significantly improves the accuracy on tumor classification tasks compared against a supervised learning baseline.
Additionally, using the SSL embeddings we build an interactive map of CT imaging data enabling quick and intuitive inspection of the relevant regions. Our findings show that SSL constitutes an important tool for medical imaging analysis. SSL results in models that generalize better, and enable improved downstream interpretability and predictions. Furthermore, well trained SSL models can be re-applied to multiple indications because they are pre-trained on broad and diverse CT imaging data. Citation Format: Leon Fedden, Zhenning Zhang, Khan Baykaner, Qin Li, Lucas Bordeaux. DIME-CT: Self-supervised learning for medical image analysis using patch-based embeddings [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1937.
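The 3D patching step described in this abstract, splitting a CT volume into small cubes so each fits an off-the-shelf SSL model, might look like this in NumPy. The function name and behaviour are assumptions for illustration, not code from the DIME-CT work:

```python
import numpy as np

def extract_patches_3d(volume, size):
    """Split a 3D volume into non-overlapping cubic patches of side `size`.

    Each patch becomes an independent, low-dimensional input for a
    self-supervised model, mirroring the patching idea in the abstract.
    """
    d, h, w = (s - s % size for s in volume.shape)   # crop to a multiple of size
    v = volume[:d, :h, :w]
    patches = (
        v.reshape(d // size, size, h // size, size, w // size, size)
         .transpose(0, 2, 4, 1, 3, 5)                # gather block indices first
         .reshape(-1, size, size, size)
    )
    return patches

ct = np.random.default_rng(0).random((10, 9, 8))     # toy volume, not real CT
patches = extract_patches_3d(ct, size=4)
```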
35

Xu, Xiangdong, Krzysztof Przystupa, and Orest Kochan. "Social Recommendation Algorithm Based on Self-Supervised Hypergraph Attention." Electronics 12, no. 4 (February 10, 2023): 906. http://dx.doi.org/10.3390/electronics12040906.

Abstract:
Social network information has been widely applied to traditional recommendations and has received significant attention in recent years. Most existing social recommendation models tend to use pairwise relationships to explore potential user preferences, but overlook the complexity of real-life interactions between users and the fact that user relationships may be higher order. These approaches also ignore the dynamic nature of friend influence, which leads the models to treat the influence of different friends as equal. To address this, we propose a social recommendation algorithm that incorporates graph embedding and higher-order mutual information maximization based on the consideration of social consistency. Specifically, on the one hand, we use a graph attention model to build higher-order information among users for deeper mining of their behavioral patterns; on the other hand, we model user embeddings based on the principle of social consistency to finally achieve finer-grained inference of user interests. In addition, to alleviate the problem of hierarchical information being lost after different levels of hypergraphs are fused, we use self-supervised learning to construct auxiliary branches that fully exploit the rich information in the hypergraph. Experimental results conducted on two publicly available datasets show that the proposed model outperforms state-of-the-art methods.
36

Manessi, Franco, and Alessandro Rozza. "Graph-based neural network models with multiple self-supervised auxiliary tasks." Pattern Recognition Letters 148 (August 2021): 15–21. http://dx.doi.org/10.1016/j.patrec.2021.04.021.

37

Zhang, Jian, Jianing Yang, Jun Yu, and Jianping Fan. "Semisupervised image classification by mutual learning of multiple self‐supervised models." International Journal of Intelligent Systems 37, no. 5 (January 14, 2022): 3117–41. http://dx.doi.org/10.1002/int.22814.

38

Liu, Gang, Silu He, Xing Han, Qinyao Luo, Ronghua Du, Xinsha Fu, and Ling Zhao. "Self-Supervised Spatiotemporal Masking Strategy-Based Models for Traffic Flow Forecasting." Symmetry 15, no. 11 (October 31, 2023): 2002. http://dx.doi.org/10.3390/sym15112002.

Abstract:
Traffic flow forecasting is an important function of intelligent transportation systems. With the rise of deep learning, building traffic flow prediction models based on deep neural networks has become a current research hotspot. Most current traffic flow prediction methods are designed from the perspective of model architectures, using only the traffic features of future moments as supervision signals to guide the models to learn the spatiotemporal dependence in traffic flow. However, traffic flow data themselves contain rich spatiotemporal features, and it is feasible to obtain additional self-supervised signals from the data to help the model further explore the underlying spatiotemporal dependence. Therefore, we propose a self-supervised traffic flow prediction method based on a spatiotemporal masking strategy. A framework consisting of symmetric backbone models with asymmetric task heads was applied to learn both prediction and spatiotemporal context features. Specifically, a spatiotemporal context mask reconstruction task was designed to force the model to reconstruct the masked features via spatiotemporal context information, so as to help the model better understand the spatiotemporal contextual associations in the data. In order to avoid the model simply making inferences based on the local smoothness in the data without truly learning the spatiotemporal dependence, we performed a temporal shift operation on the features to be reconstructed. The experimental results showed that the model based on the spatiotemporal context masking strategy achieved an average prediction performance improvement of 1.56% and a maximum of 7.72% for longer prediction horizons of more than 30 min compared with the backbone models.
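The masking-plus-temporal-shift idea can be sketched on a toy (time, sensors) flow matrix. This is a schematic of the strategy described in the abstract, not the authors' implementation:

```python
import numpy as np

def make_mask_reconstruction_pair(flow, mask_ratio=0.25, shift=1, seed=None):
    """Build a self-supervised sample from a (time, nodes) traffic matrix.

    Random entries are masked out of the input; the reconstruction target
    holds the masked positions' values after a temporal shift, so the model
    cannot solve the task from local smoothness alone.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(flow.shape) < mask_ratio
    masked_input = np.where(mask, 0.0, flow)
    shifted = np.roll(flow, -shift, axis=0)       # features `shift` steps ahead
    target = np.where(mask, shifted, 0.0)
    return masked_input, target, mask

flow = np.random.default_rng(0).random((12, 5)) + 0.5   # 12 steps, 5 sensors
x_in, y_target, m = make_mask_reconstruction_pair(flow, seed=1)
```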
39

Joshi, Amey, Hrishitaa Kurchania, and Harikrishnan R. "Robust Object Segmentation using 3D Mesh Models and Self-Supervised Learning." Procedia Computer Science 235 (2024): 907–15. http://dx.doi.org/10.1016/j.procs.2024.04.086.

40

Gao, Min, Yingmei Wei, Yuxiang Xie, and Yitong Zhang. "Traffic Prediction with Self-Supervised Learning: A Heterogeneity-Aware Model for Urban Traffic Flow Prediction Based on Self-Supervised Learning." Mathematics 12, no. 9 (April 24, 2024): 1290. http://dx.doi.org/10.3390/math12091290.

Abstract:
Accurate traffic prediction is pivotal when constructing intelligent cities to enhance urban mobility and to efficiently manage traffic flows. Traditional deep learning-based traffic prediction models primarily focus on capturing spatial and temporal dependencies, thus overlooking the existence of spatial and temporal heterogeneities. Heterogeneity is a crucial inherent characteristic of traffic data for the practical applications of traffic prediction. Spatial heterogeneities refer to the differences in traffic patterns across different regions, e.g., variations in traffic flow between office and commercial areas. Temporal heterogeneities refer to the changes in traffic patterns across different time steps, e.g., from morning to evening. Although existing models attempt to capture heterogeneities through predefined handcrafted features, multiple sets of parameters, and the fusion of spatial–temporal graphs, there are still some limitations. We propose a self-supervised learning-based traffic prediction framework called Traffic Prediction with Self-Supervised Learning (TPSSL) to address this issue. This framework leverages a spatial–temporal encoder for the prediction task and introduces adaptive data masking to enhance the robustness of the model against noise disturbances. Moreover, we introduce two auxiliary self-supervised learning paradigms to capture spatial heterogeneities and temporal heterogeneities, which also enrich the embeddings of the primary prediction task. We conduct experiments on four widely used traffic flow datasets, and the results demonstrate that TPSSL achieves state-of-the-art performance in traffic prediction tasks.
41

Xu, Yanjie, Hao Sun, Jin Chen, Lin Lei, Kefeng Ji, and Gangyao Kuang. "Adversarial Self-Supervised Learning for Robust SAR Target Recognition." Remote Sensing 13, no. 20 (October 17, 2021): 4158. http://dx.doi.org/10.3390/rs13204158.

Abstract:
Synthetic aperture radar (SAR) can perform observations at all times and has been widely used in the military field. Deep neural network (DNN)-based SAR target recognition models have achieved great success in recent years. Yet, the adversarial robustness of these models has received far less academic attention in the remote sensing community. In this article, we first present a comprehensive adversarial robustness evaluation framework for DNN-based SAR target recognition. Both data-oriented metrics and model-oriented metrics have been used to fully assess the recognition performance under adversarial scenarios. Adversarial training is currently one of the most successful methods to improve the adversarial robustness of DNN models. However, it requires class labels to generate adversarial attacks and suffers significant accuracy dropping on testing data. To address these problems, we introduced adversarial self-supervised learning into SAR target recognition for the first time and proposed a novel unsupervised adversarial contrastive learning-based defense method. Specifically, we utilize a contrastive learning framework to train a robust DNN with unlabeled data, which aims to maximize the similarity of representations between a random augmentation of a SAR image and its unsupervised adversarial example. Extensive experiments on two SAR image datasets demonstrate that defenses based on adversarial self-supervised learning can obtain comparable robust accuracy over state-of-the-art supervised adversarial learning methods.
42

Javed, Tahir, Kaushal Bhogale, Abhigyan Raman, Pratyush Kumar, Anoop Kunchukuttan, and Mitesh M. Khapra. "IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian Languages." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 12942–50. http://dx.doi.org/10.1609/aaai.v37i11.26521.

Abstract:
A cornerstone of AI research has been the creation and adoption of standardized training and test datasets to earmark the progress of state-of-the-art models. A particularly successful example is the GLUE dataset for training and evaluating Natural Language Understanding (NLU) models for English. The large body of research around self-supervised BERT-based language models revolved around performance improvements on NLU tasks in GLUE. To evaluate language models in other languages, several language-specific GLUE datasets were created. The area of speech language understanding (SLU) has followed a similar trajectory. The success of large self-supervised models such as wav2vec2 enables the creation of speech models with relatively easy-to-access unlabelled data. These models can then be evaluated on SLU tasks, such as the SUPERB benchmark. In this work, we extend this to Indic languages by releasing the IndicSUPERB benchmark. Specifically, we make the following three contributions. (i) We collect Kathbath, containing 1,684 hours of labelled speech data across 12 Indian languages from 1,218 contributors located in 203 districts in India. (ii) Using Kathbath, we create benchmarks across 6 speech tasks: Automatic Speech Recognition, Speaker Verification, Speaker Identification (mono/multi), Language Identification, Query By Example, and Keyword Spotting for 12 languages. (iii) On the released benchmarks, we train and evaluate different self-supervised models alongside a commonly used baseline, FBANK. We show that language-specific fine-tuned models are more accurate than the baseline on most of the tasks, including a large gap of 76% for the Language Identification task. However, for speaker identification, self-supervised models trained on large datasets demonstrate an advantage. We hope IndicSUPERB contributes to the progress of developing speech language understanding models for Indian languages.
43

Zhao, Nanxuan, Zhirong Wu, Rynson W. H. Lau, and Stephen Lin. "Distilling Localization for Self-Supervised Representation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10990–98. http://dx.doi.org/10.1609/aaai.v35i12.17312.

Abstract:
Recent progress in contrastive learning has revolutionized unsupervised representation learning. Concretely, multiple views (augmentations) from the same image are encouraged to map to close embeddings, while views from different images are pulled apart. In this paper, through visualizing and diagnosing classification errors, we observe that current contrastive models are ineffective at localizing the foreground object, limiting their ability to extract discriminative high-level features. This is due to the fact that the view generation process considers pixels in an image uniformly. To address this problem, we propose a data-driven approach for learning invariance to backgrounds. It first estimates foreground saliency in images and then creates augmentations by copy-and-pasting the foreground onto a variety of backgrounds. The learning still follows an instance discrimination approach, so that the representation is trained to disregard background content and focus on the foreground. We study a variety of saliency estimation methods, and find that most methods lead to improvements for contrastive learning. With this approach, significant performance gains are achieved for self-supervised learning on ImageNet classification, and also for object detection on PASCAL VOC and MSCOCO.
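The copy-and-paste augmentation described above can be sketched in a few lines. This is an illustrative toy (assuming a precomputed saliency map and a pool of background images), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def copy_paste_view(image, saliency, backgrounds, threshold=0.5):
    """Create an augmented view by pasting the salient foreground onto a
    randomly chosen background, so instance discrimination cannot rely
    on background content."""
    mask = (saliency > threshold)[..., None]          # (H, W, 1) boolean
    bg = backgrounds[rng.integers(len(backgrounds))]  # random background
    return np.where(mask, image, bg)                  # foreground kept as-is

# Toy usage: 8x8 RGB image whose centre 4x4 block is "salient".
img = np.full((8, 8, 3), 200, dtype=np.uint8)
sal = np.zeros((8, 8))
sal[2:6, 2:6] = 1.0
bgs = [np.zeros((8, 8, 3), dtype=np.uint8),
       np.full((8, 8, 3), 50, dtype=np.uint8)]
view = copy_paste_view(img, sal, bgs)
```

Two such views of the same image then form a positive pair in the usual contrastive objective, with only the foreground shared between them.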
44

Guo, Yuzhi, Jiaxiang Wu, Hehuan Ma, and Junzhou Huang. "Self-Supervised Pre-training for Protein Embeddings Using Tertiary Structures." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 6801–9. http://dx.doi.org/10.1609/aaai.v36i6.20636.

Abstract:
The protein tertiary structure largely determines its interaction with other molecules. Despite its importance in various structure-related tasks, fully supervised data are often time-consuming and costly to obtain. Existing pre-training models mostly focus on amino-acid sequences or multiple sequence alignments, while the structural information is not yet exploited. In this paper, we propose a self-supervised pre-training model for learning structure embeddings from protein tertiary structures. Native protein structures are perturbed with random noise, and the pre-training model aims at estimating gradients over perturbed 3D structures. Specifically, we adopt SE(3)-invariant features as model inputs and reconstruct gradients over 3D coordinates with SE(3)-equivariance preserved. Such a paradigm avoids the usage of sophisticated SE(3)-equivariant models, and dramatically improves the computational efficiency of pre-training models. We demonstrate the effectiveness of our pre-training model on two downstream tasks, protein structure quality assessment (QA) and protein-protein interaction (PPI) site prediction. Hierarchical structure embeddings are extracted to enhance corresponding prediction models. Extensive experiments indicate that such structure embeddings consistently improve the prediction accuracy for both downstream tasks.
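The noise-perturbation objective described above resembles denoising score matching; a minimal sketch of how one training pair could be constructed follows (an illustrative reading with a hypothetical `make_denoising_pair` helper; the paper's SE(3)-invariant featurization is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_denoising_pair(coords, sigma=0.5):
    """Perturb native 3D coordinates with Gaussian noise; the regression
    target is the score of the Gaussian perturbation kernel, which points
    from the perturbed structure back toward the native one."""
    noise = rng.normal(0.0, sigma, coords.shape)
    perturbed = coords + noise
    target = (coords - perturbed) / sigma ** 2  # gradient of log N(x; coords, sigma^2 I)
    return perturbed, target

# Toy usage: 50 residues with xyz coordinates.
native = rng.normal(size=(50, 3))
x, y = make_denoising_pair(native)
```

A network consuming invariant features of `x` (for example pairwise distances) would then be trained to regress `y`, preserving SE(3)-equivariance of the predicted gradients.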
45

Lin, Ken, Xiongwen Quan, Wenya Yin, and Han Zhang. "A Contrastive Learning Pre-Training Method for Motif Occupancy Identification." International Journal of Molecular Sciences 23, no. 9 (April 24, 2022): 4699. http://dx.doi.org/10.3390/ijms23094699.

Abstract:
Motif occupancy identification is a binary classification task predicting the binding of DNA motif instances to transcription factors, for which several sequence-based methods have been proposed. However, through direct training, these end-to-end methods lack biological interpretability in their sequence representations. In this work, we propose a contrastive learning method to pre-train interpretable and robust DNA encoding for motif occupancy identification. We construct two alternative models to pre-train the DNA sequence encoder: a self-supervised model and a supervised model. We augment the original sequences for contrastive learning with edit operations defined in edit distance. Specifically, we propose a sequence similarity criterion based on the Needleman–Wunsch algorithm to discriminate positive and negative sample pairs in self-supervised learning. Finally, a DNN classifier is fine-tuned along with the pre-trained encoder to predict the results of motif occupancy identification. Both proposed contrastive learning models outperform the baseline end-to-end CNN model and the SimCLR method, reaching AUCs of 0.811 and 0.823, respectively. Compared with the baseline method, our models show better robustness on small samples. In particular, the self-supervised model proves practicable in transfer learning.
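The Needleman–Wunsch algorithm underlying the similarity criterion computes a global alignment score by dynamic programming. A compact textbook implementation (the scoring parameters here are illustrative, not those of the paper):

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b via dynamic programming."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # substitute/match
                           dp[i - 1][j] + gap,       # gap in b
                           dp[i][j - 1] + gap)       # gap in a
    return dp[n][m]

score = needleman_wunsch("GATTACA", "GCATGCU")
```

A pair of augmented sequences whose alignment score to the anchor exceeds some threshold could then be treated as a positive pair, and a low-scoring pair as negative, in the contrastive objective.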
46

Nimitsurachat, Peranut, and Peter Washington. "Audio-Based Emotion Recognition Using Self-Supervised Learning on an Engineered Feature Space." AI 5, no. 1 (January 17, 2024): 195–207. http://dx.doi.org/10.3390/ai5010011.

Abstract:
Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieving consistently high-performance models is the paucity of available training labels. Self-supervised learning (SSL) is a family of methods which can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we have applied self-supervised learning pre-training to the classification of emotions from the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI)'s acoustic data. Unlike prior papers that have experimented with raw acoustic data, our technique has been applied to encoded acoustic data with 74 parameters of distinctive audio features at discrete timesteps. Our model is first pre-trained to uncover the randomly masked timesteps of the acoustic data. The pre-trained model is then fine-tuned using a small sample of annotated data. The performance of the final model is then evaluated via overall mean absolute error (MAE), MAE per emotion, overall four-class accuracy, and four-class accuracy per emotion. These metrics are compared against a baseline deep learning model with an identical backbone architecture. We find that self-supervised learning consistently improves the performance of the model across all metrics, especially when the number of annotated data points in the fine-tuning step is small. Furthermore, we quantify the behaviors of the self-supervised model and its convergence as the amount of annotated data increases.
This work characterizes the utility of self-supervised learning for affective computing, demonstrating that self-supervised learning is most useful when the number of training examples is small and that the effect is most pronounced for emotions which are easier to classify such as happy, sad, and angry. This work further demonstrates that self-supervised learning still improves performance when applied to the embedded feature representations rather than the traditional approach of pre-training on the raw input space.
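The masked-timestep pre-training task described above can be sketched as follows (an illustrative toy over random features; the masking rate and the zero-fill strategy are assumptions, not the authors' exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_timesteps(features, mask_prob=0.15):
    """Zero out random timesteps of a (T, D) feature matrix; the
    pre-training objective is to reconstruct the original feature
    vectors at the masked positions."""
    T = features.shape[0]
    mask = rng.random(T) < mask_prob
    mask[rng.integers(T)] = True  # guarantee at least one masked step
    masked = features.copy()
    masked[mask] = 0.0
    return masked, mask

# Toy usage: 100 timesteps x 74 engineered audio features.
feats = rng.normal(size=(100, 74))
masked, mask = mask_timesteps(feats)
# Reconstruction error (MAE) is evaluated only at the masked positions.
mae = np.abs(feats[mask] - masked[mask]).mean()
```

The pre-trained encoder learned under this objective is then fine-tuned on the small annotated sample for the emotion labels.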
47

Parmar, Chaitanya, Albert Juan Ramon, Nicole L. Stone, Spyros Triantos, Joel Greshock, and Kristopher Standish. "Generalizable FGFR prediction across tumor types using self supervised learning." Journal of Clinical Oncology 41, no. 16_suppl (June 1, 2023): e15057-e15057. http://dx.doi.org/10.1200/jco.2023.41.16_suppl.e15057.

Abstract:
Background: Deep Learning models have demonstrated the ability to detect tumors, classify disease state, or infer genetic biomarkers from digital whole slide images (WSI) of Hematoxylin and Eosin (H&E)-stained tissue. Such models can be deployed to improve clinical practice or efficiently recruit for clinical trials. However, the data to train these models is often very limited and image capture protocols vary between labs. We propose to use Self Supervised Learning (SSL) on large unlabeled histopathology data sets to improve the generalizability and predictive performance of deep learning models on unseen data. Methods: We pre-trained a Convolutional Neural Network (CNN) using SSL on 25,120 unlabeled, digital WSIs from various sources (multiple scanners, hospital systems, labs, diseases, tissue sites). We then fine-tuned the pre-trained CNN to infer the presence or absence of select FGFR alterations in Muscle Invasive Bladder Cancer (MIBC) from the digital WSI. We applied this model (FGFRSSL) to WSIs of biopsies from other data sources and tumor types and compared it with a model (FGFRTRAD) that was trained exclusively on MIBC samples without self-supervised learning. Results: The FGFRSSL model achieved an Area Under ROC Curve (AUC) of 0.80 on MIBC WSIs and maintained performance on images from an independent lab (AUC = 0.82). We further show that the model generalizes to Non-Muscle Invasive Bladder Cancer (NMIBC) samples (AUC = 0.76) and Pan-Tumor WSIs (AUC = 0.83). The FGFRTRAD model trained without SSL-based pre-training achieved a lower AUC of 0.76 on MIBC WSIs and was less generalizable, showing lower performance on independent data (AUC = 0.72), NMIBC samples (AUC = 0.72), and Pan-Tumor samples (AUC = 0.64). Conclusions: We leveraged SSL pre-training to improve the reliability and generalizability of AI-based models across multiple data sources.
We also demonstrate our model’s ability to infer FGFR status across multiple solid tumor types after training on only MIBC samples, suggesting the cell morphology conferred by FGFR alteration in cancer may be shared across diverse tumor types. These models represent a means for efficiently screening patients for actionable clinical biomarkers in a robust manner to guide clinical decisions and inform drug development efforts. [Table: see text]
48

Díaz, Gabriel, Billy Peralta, Luis Caro, and Orietta Nicolis. "Co-Training for Visual Object Recognition Based on Self-Supervised Models Using a Cross-Entropy Regularization." Entropy 23, no. 4 (April 1, 2021): 423. http://dx.doi.org/10.3390/e23040423.

Abstract:
Automatic recognition of visual objects using a deep learning approach has been successfully applied to multiple areas. However, deep learning techniques require a large amount of labeled data, which is usually expensive to obtain. An alternative is to use semi-supervised models, such as co-training, where multiple complementary views are combined using a small amount of labeled data. A simple way to associate views to visual objects is through the application of a degree of rotation or a type of filter. In this work, we propose a co-training model for visual object recognition using deep neural networks by adding layers of self-supervised neural networks as intermediate inputs to the views, where the views are diversified through the cross-entropy regularization of their outputs. Since the model merges the concepts of co-training and self-supervised learning by considering the differentiation of outputs, we call it Differential Self-Supervised Co-Training (DSSCo-Training). This paper presents experiments applying the DSSCo-Training model to well-known image datasets such as MNIST, CIFAR-100, and SVHN. The results indicate that the proposed model is competitive with state-of-the-art models and shows an average relative improvement of 5% in accuracy across several datasets, despite its greater simplicity with respect to more recent approaches.
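The cross-entropy regularization of view outputs is described only at a high level in this abstract; one plausible sketch of a diversity term for two views follows (the sign convention and the `diversity_penalty` helper are assumptions for illustration, not the paper's exact loss):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def diversity_penalty(logits_a, logits_b):
    """Negative cross-entropy between the two views' output distributions.
    Minimising it (alongside each view's supervised loss) rewards views
    that disagree, keeping them complementary for co-training."""
    p, q = softmax(logits_a), softmax(logits_b)
    cross_entropy = -(p * np.log(q + 1e-12)).sum(axis=1).mean()
    return -cross_entropy

# Two-sample, two-class toy logits.
agree = np.array([[5.0, 0.0], [0.0, 5.0]])
disagree = np.array([[0.0, 5.0], [5.0, 0.0]])
```

Under this sketch, a pair of views that always agree incurs a higher (worse) penalty than a pair that disagrees, which is one way the outputs could be pushed apart.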
49

Choudhary, Nurendra, Charu C. Aggarwal, Karthik Subbian, and Chandan K. Reddy. "Self-supervised Short-text Modeling through Auxiliary Context Generation." ACM Transactions on Intelligent Systems and Technology 13, no. 3 (June 30, 2022): 1–21. http://dx.doi.org/10.1145/3511712.

Abstract:
Short text is ambiguous and often relies predominantly on the domain and context at hand in order to attain semantic relevance. Existing classification models perform poorly on short text due to data sparsity and inadequate context. Auxiliary context, which can often provide sufficient background regarding the domain, is typically available in several application scenarios. While some of the existing works aim to leverage real-world knowledge to enhance short-text representations, they fail to place appropriate emphasis on the auxiliary context. Such models do not harness the full potential of the available context in auxiliary sources. To address this challenge, we reformulate short-text classification as a dual channel self-supervised learning problem (that leverages auxiliary context) with a generation network and a corresponding prediction model. We propose a self-supervised framework, Pseudo-Auxiliary Context generation network for Short-text Modeling (PACS), to comprehensively leverage auxiliary context, and it is jointly learned with a prediction network in an end-to-end manner. Our PACS model consists of two sub-networks: a Context Generation Network (CGN) that models the auxiliary context's distribution and a Prediction Network (PN) to map the short-text features and auxiliary context distribution to the final class label. Our experimental results on diverse datasets demonstrate that PACS outperforms formidable state-of-the-art baselines. We also demonstrate the performance of our model on cold-start scenarios (where contextual information is non-existent) during prediction. Furthermore, we perform interpretability and ablation studies to analyze various representational features captured by our model and the individual contribution of its modules to the overall performance of PACS, respectively.
50

Zhang, Ming, Xin Gu, Ji Qi, Zhenshi Zhang, Hemeng Yang, Jun Xu, Chengli Peng, and Haifeng Li. "CDEST: Class Distinguishability-Enhanced Self-Training Method for Adopting Pre-Trained Models to Downstream Remote Sensing Image Semantic Segmentation." Remote Sensing 16, no. 7 (April 6, 2024): 1293. http://dx.doi.org/10.3390/rs16071293.

Abstract:
The self-supervised learning (SSL) technique, driven by massive unlabeled data, is expected to be a promising solution for semantic segmentation of remote sensing images (RSIs) with limited labeled data, revolutionizing transfer learning. Traditional ‘local-to-local’ transfer from small, local datasets to another target dataset plays an ever-shrinking role due to RSIs’ diverse distribution shifts. Instead, SSL promotes a ‘global-to-local’ transfer paradigm, in which generalized models pre-trained on arbitrarily large unlabeled datasets are fine-tuned to the target dataset to overcome data distribution shifts. However, the SSL pre-trained models may contain both useful and useless features for the downstream semantic segmentation task, due to the gap between the SSL tasks and the downstream task. To adapt such pre-trained models to semantic segmentation tasks, traditional supervised fine-tuning methods that use only a small number of labeled samples may drop out useful features due to overfitting. The main reason behind this is that supervised fine-tuning aims to map a few training samples from the high-dimensional, sparse image space to the low-dimensional, compact semantic space defined by the downstream labels, resulting in a degradation of the distinguishability. To address the above issues, we propose a class distinguishability-enhanced self-training (CDEST) method to support global-to-local transfer. First, the self-training module in CDEST introduces a semi-supervised learning mechanism to fully utilize the large amount of unlabeled data in the downstream task to increase the size and diversity of the training data, thus alleviating the problem of biased overfitting of the model. Second, the supervised and semi-supervised contrastive learning modules of CDEST can explicitly enhance the class distinguishability of features, helping to preserve the useful features learned from pre-training while adapting to downstream tasks. 
We evaluate the proposed CDEST method on four RSI semantic segmentation datasets, and our method achieves optimal experimental results on all four datasets compared to supervised fine-tuning as well as three semi-supervised fine-tuning methods.
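The self-training module described above follows the familiar pseudo-labeling pattern: predict on unlabeled data and keep only confident predictions as extra training labels. A minimal sketch of one round (the confidence threshold and the toy `model_predict` stand-in are illustrative assumptions):

```python
import numpy as np

def pseudo_label_round(model_predict, unlabeled, threshold=0.9):
    """One self-training round: retain only unlabeled samples whose
    top predicted class probability exceeds a confidence threshold,
    labeling them with the argmax class."""
    probs = model_predict(unlabeled)      # (N, n_classes) probabilities
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = conf >= threshold
    return unlabeled[keep], labels[keep]

# Toy usage with a hypothetical fixed "model".
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
fake_probs = np.array([[0.95, 0.05], [0.6, 0.4], [0.2, 0.8],
                       [0.5, 0.5], [0.99, 0.01], [0.1, 0.9]])
Xp, yp = pseudo_label_round(lambda _: fake_probs, X)
```

The retained pairs are appended to the labeled set and the model is fine-tuned again, growing the effective training data as CDEST does before its contrastive modules sharpen class distinguishability.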
