Academic literature on the topic 'Self-Supervised models'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Self-Supervised models.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Self-Supervised models":

1

Anton, Jonah, Liam Castelli, Mun Fai Chan, Mathilde Outters, Wan Hee Tang, Venus Cheung, Pancham Shukla, Rahee Walambe, and Ketan Kotecha. "How Well Do Self-Supervised Models Transfer to Medical Imaging?" Journal of Imaging 8, no. 12 (December 1, 2022): 320. http://dx.doi.org/10.3390/jimaging8120320.

Abstract:
Self-supervised learning approaches have seen success transferring between similar medical imaging datasets; however, there has been no large-scale attempt to compare the transferability of self-supervised models against each other on medical images. In this study, we compare the generalisability of seven self-supervised models, two of which were trained in-domain, against supervised baselines across nine different medical datasets. We find that ImageNet-pretrained self-supervised models are more generalisable than their supervised counterparts, scoring up to 10% better on medical classification tasks. The two in-domain pretrained models outperformed other models by over 20% on in-domain tasks; however, they suffered a significant loss of accuracy on all other tasks. Our investigation of the feature representations suggests that this trend may be due to the models learning to focus too heavily on specific areas.
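For readers who want to see what such a transfer comparison involves, here is a minimal sketch of the linear-probe protocol, assuming a supervised torchvision ResNet-50 as a stand-in for the self-supervised checkpoints compared in the paper; the five-class head and the training step are illustrative placeholders, not the study's actual setup.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()            # keep the 2048-d pooled features
for p in backbone.parameters():
    p.requires_grad = False            # frozen encoder: probe, not fine-tune

num_classes = 5                        # e.g. a 5-class medical task (placeholder)
head = nn.Linear(2048, num_classes)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def probe_step(images, labels):
    with torch.no_grad():
        feats = backbone(images)       # features from the pretrained encoder
    loss = loss_fn(head(feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```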
2

Gatopoulos, Ioannis, and Jakub M. Tomczak. "Self-Supervised Variational Auto-Encoders." Entropy 23, no. 6 (June 14, 2021): 747. http://dx.doi.org/10.3390/e23060747.

Abstract:
Density estimation, compression, and data generation are crucial tasks in artificial intelligence. Variational Auto-Encoders (VAEs) constitute a single framework to achieve these goals. Here, we present a novel class of generative models, called self-supervised Variational Auto-Encoder (selfVAE), which utilizes deterministic and discrete transformations of data. This class of models allows both conditional and unconditional sampling while simplifying the objective function. First, we use a single self-supervised transformation as a latent variable, where the transformation is either downscaling or edge detection. Next, we consider a hierarchical architecture, i.e., multiple transformations, and we show its benefits compared to the VAE. The flexibility of selfVAE in data reconstruction finds a particularly interesting use case in data compression tasks, where we can trade off memory for better data quality and vice versa. We present the performance of our approach on three benchmark image datasets (CIFAR-10, Imagenette64, and CelebA).
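As a small illustration of the deterministic transformation the selfVAE conditions on, the sketch below implements downscaling (edge detection would be the analogous alternative); the pooling factor is an arbitrary choice, and the surrounding VAE is not reproduced here.

```python
import torch.nn.functional as F

def downscale(x, factor=2):
    # x: (B, C, H, W) image batch; y = d(x) is deterministic and parameter-free,
    # so it can serve as an auxiliary "latent" that costs nothing to compute.
    return F.avg_pool2d(x, kernel_size=factor)
```

Generation then proceeds coarse-to-fine: sample the transformed variable y first, then sample x conditioned on y.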
3

Zhang, Ronghua, Yuanyuan Wang, Fangyuan Liu, Changzheng Liu, Yaping Song, and Baohua Yu. "S2NMF: Information Self-Enhancement Self-Supervised Nonnegative Matrix Factorization for Recommendation." Wireless Communications and Mobile Computing 2022 (August 30, 2022): 1–10. http://dx.doi.org/10.1155/2022/4748858.

Abstract:
Nonnegative matrix factorization (NMF), which aims to make all elements of the factorization nonnegative while achieving nonlinear dimensionality reduction, is an effective method for solving recommendation problems. However, most existing models learn recommendations under the supervised learning paradigm, the recommendation performance of NMF models relies heavily on initialization, and the user-item interaction information is often very sparse. In many cases, supervised information about the data is difficult to obtain, rendering a large number of existing supervised models inapplicable. To address this problem, we propose an information self-supervised NMF model for recommendation. Specifically, the model builds on the matrix factorization idea and introduces a self-supervised learning mechanism into NMF to enhance the information available in sparse data, yielding an easily extensible self-supervised NMF model. Furthermore, we propose a corresponding gradient descent optimization algorithm and analyse its complexity. Extensive experimental results show that the proposed S2NMF achieves better performance.
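For context, the factorization core that S2NMF builds on fits in a few lines; the sketch below shows plain NMF with the classic multiplicative updates, assuming the Frobenius objective, and it does not reproduce the paper's self-supervision or recommendation specifics.

```python
import numpy as np

def nmf(R, k, iters=200, eps=1e-9):
    # Factor a nonnegative matrix R (users x items) as R ~ W @ H.
    m, n = R.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ R) / (W.T @ W @ H + eps)   # multiplicative updates keep
        W *= (R @ H.T) / (W @ H @ H.T + eps)   # all factor entries nonnegative
    return W, H
```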
4

Dang, Thanh-Vu, JinYoung Kim, Gwang-Hyun Yu, Ji Yong Kim, Young Hwan Park, and ChilWoo Lee. "Korean Text to Gloss: Self-Supervised Learning approach." Korean Institute of Smart Media 12, no. 1 (February 28, 2023): 32–46. http://dx.doi.org/10.30693/smj.2023.12.1.32.

Abstract:
Natural Language Processing (NLP) has grown tremendously in recent years. Typically, bilingual and multilingual translation models have been deployed widely in machine translation and have gained vast attention from the research community. By contrast, few studies have focused on translating between spoken and sign languages, especially non-English languages. Prior works on Sign Language Translation (SLT) have shown that a mid-level sign gloss representation enhances translation performance. Therefore, this study presents a new large-scale Korean sign language dataset, the Museum-Commentary Korean Sign Gloss (MCKSG) dataset, including 3828 pairs of Korean sentences and their corresponding sign glosses used in museum-commentary contexts. In addition, we propose a translation framework based on self-supervised learning, where the pretext task is a text-to-text mapping from a Korean sentence to its back-translated versions; the pre-trained network is then fine-tuned on the MCKSG dataset. Using self-supervised learning helps to overcome the shortage of sign language data. In experiments, our proposed model outperforms a baseline BERT model by 6.22%.
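To make the pretext task concrete, here is a hedged sketch of how a back-translation pair could be built; `translate` is a hypothetical stand-in for any machine translation system, not an API from the paper.

```python
def make_pretext_pair(sentence_ko, translate):
    # `translate` is a hypothetical callable: translate(text, src=..., tgt=...)
    pivot = translate(sentence_ko, src="ko", tgt="en")   # Korean -> pivot language
    round_trip = translate(pivot, src="en", tgt="ko")    # pivot -> Korean
    return round_trip, sentence_ko  # model learns: noisy back-translation -> original
```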
5

Risojević, V., and V. Stojnić. "Do We Still Need ImageNet Pre-Training in Remote Sensing Scene Classification?" International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B3-2022 (May 31, 2022): 1399–406. http://dx.doi.org/10.5194/isprs-archives-xliii-b3-2022-1399-2022.

Abstract:
Due to the scarcity of labeled data, using supervised models pre-trained on ImageNet is a de facto standard in remote sensing scene classification. Recently, the availability of larger high-resolution remote sensing (HRRS) image datasets and progress in self-supervised learning have raised the questions of whether supervised ImageNet pre-training is still necessary for remote sensing scene classification, and whether supervised pre-training on HRRS image datasets or self-supervised pre-training on ImageNet would achieve better results on target remote sensing scene classification tasks. To answer these questions, in this paper we both train models from scratch and fine-tune supervised and self-supervised ImageNet models on several HRRS image datasets. We also evaluate the transferability of the learned representations to HRRS scene classification tasks and show that self-supervised pre-training outperforms supervised pre-training, while the performance of HRRS pre-training is similar to or slightly lower than that of self-supervised pre-training. Finally, we propose using an ImageNet pre-trained model combined with a second round of pre-training using in-domain HRRS images, i.e., domain-adaptive pre-training. The experimental results show that domain-adaptive pre-training yields models that achieve state-of-the-art results on HRRS scene classification benchmarks. The source code and pre-trained models are available at https://github.com/risojevicv/RSSC-transfer.
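The recipe reads as a three-stage pipeline; the sketch below spells it out, with `continue_ssl_pretraining` and `finetune` as hypothetical placeholders for any self-supervised method and a standard supervised trainer, and the datasets as empty stand-ins.

```python
from torchvision import models

def continue_ssl_pretraining(model, unlabeled_images):
    ...  # hypothetical: e.g. contrastive training on in-domain HRRS images
    return model

def finetune(model, labeled_dataset):
    ...  # hypothetical: standard supervised fine-tuning on the target task
    return model

unlabeled_hrrs_images = []   # placeholder for the unlabeled in-domain set
labeled_scene_dataset = []   # placeholder for the labeled target set

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)  # round 1: ImageNet
model = continue_ssl_pretraining(model, unlabeled_hrrs_images)          # round 2: in-domain SSL
model = finetune(model, labeled_scene_dataset)                          # target task
```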
6

Imran, Abdullah-Al-Zubaer, Chao Huang, Hui Tang, Wei Fan, Yuan Xiao, Dingjun Hao, Zhen Qian, and Demetri Terzopoulos. "Self-Supervised, Semi-Supervised, Multi-Context Learning for the Combined Classification and Segmentation of Medical Images (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (April 3, 2020): 13815–16. http://dx.doi.org/10.1609/aaai.v34i10.7179.

Abstract:
To tackle the problem of limited annotated data, semi-supervised learning is attracting attention as an alternative to fully supervised models. Moreover, optimizing a multiple-task model to learn “multiple contexts” can provide better generalizability compared to single-task models. We propose a novel semi-supervised multiple-task model leveraging self-supervision and adversarial training—namely, self-supervised, semi-supervised, multi-context learning (S4MCL)—and apply it to two crucial medical imaging tasks, classification and segmentation. Our experiments on spine X-rays reveal that the S4MCL model significantly outperforms semi-supervised single-task, semi-supervised multi-context, and fully-supervised single-task models, even with a 50% reduction of classification and segmentation labels.
7

Zhou, Meng, Zechen Li, and Pengtao Xie. "Self-supervised Regularization for Text Classification." Transactions of the Association for Computational Linguistics 9 (2021): 641–56. http://dx.doi.org/10.1162/tacl_a_00389.

Abstract:
Text classification is a widely studied problem and has broad applications. In many real-world problems, the number of texts available for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously; the SSL task is defined purely on input texts without using any human-provided labels. Training a model with an SSL task prevents the model from overfitting to a limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.
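The core of SSL-Reg is simply a weighted sum of the two losses computed on the same batch; a minimal sketch, assuming a masked-language-model auxiliary task and an illustrative weight lambda_ssl (not the paper's tuned value):

```python
import torch.nn.functional as F

def ssl_reg_loss(cls_logits, labels, mlm_logits, masked_token_ids, lambda_ssl=0.1):
    # cls_logits: (B, num_classes); labels: (B,) gold classes
    # mlm_logits: (M, vocab); masked_token_ids: (M,) true ids at masked positions
    supervised = F.cross_entropy(cls_logits, labels)                 # classification task
    self_supervised = F.cross_entropy(mlm_logits, masked_token_ids)  # SSL task, no human labels
    return supervised + lambda_ssl * self_supervised
```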
8

Gong, Yuan, Cheng-I. Lai, Yu-An Chung, and James Glass. "SSAST: Self-Supervised Audio Spectrogram Transformer." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10699–709. http://dx.doi.org/10.1609/aaai.v36i10.21315.

Abstract:
Recently, neural networks based purely on self-attention, such as the Vision Transformer (ViT), have been shown to outperform deep learning models constructed with convolutional neural networks (CNNs) on various vision tasks, thus extending the success of Transformers, which were originally developed for language processing, to the vision domain. A recent study showed that a similar methodology can also be applied to the audio domain. Specifically, the Audio Spectrogram Transformer (AST) achieves state-of-the-art results on various audio classification benchmarks. However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST. This paper focuses on audio and speech classification, and aims to reduce the need for large amounts of labeled data for the AST by leveraging self-supervised learning using unlabeled data. Specifically, we propose to pretrain the AST model with joint discriminative and generative masked spectrogram patch modeling (MSPM) using unlabeled audio from AudioSet and Librispeech. We evaluate our pretrained models on both audio and speech classification tasks including audio event classification, keyword spotting, emotion recognition, and speaker identification. The proposed self-supervised framework significantly boosts AST performance on all tasks, with an average improvement of 60.9%, leading to similar or even better results than a supervised pretrained AST. To the best of our knowledge, it is the first patch-based self-supervised learning framework in the audio and speech domain, and also the first self-supervised learning framework for AST.
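Only the masking step of MSPM is sketched below, under the assumption of pre-flattened spectrogram patches; the joint discriminative and generative heads that score and reconstruct the hidden patches are not shown.

```python
import torch

def mask_patches(spec_patches, mask_ratio=0.4):
    # spec_patches: (B, N, D) flattened spectrogram patches per clip
    B, N, D = spec_patches.shape
    n_mask = int(N * mask_ratio)
    idx = torch.rand(B, N).argsort(dim=1)[:, :n_mask]   # random patch subset per clip
    masked = spec_patches.clone()
    masked.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, D), 0.0)  # hide the patches
    return masked, idx   # the model must recover the patches at idx
```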
9

Chen, Xuehao, Jin Zhou, Yuehui Chen, Shiyuan Han, Yingxu Wang, Tao Du, Cheng Yang, and Bowen Liu. "Self-Supervised Clustering Models Based on BYOL Network Structure." Electronics 12, no. 23 (November 21, 2023): 4723. http://dx.doi.org/10.3390/electronics12234723.

Abstract:
Contrastive-based clustering models usually rely on a large number of negative pairs to capture uniform representations, which requires a large batch size and high computational complexity. In contrast, some self-supervised methods perform non-contrastive learning to capture discriminative representations only with positive pairs, but suffer from the collapse of clustering. To solve these issues, a novel end-to-end self-supervised clustering model is proposed in this paper. The basic self-supervised learning network is first modified, followed by the incorporation of a Softmax layer to obtain cluster assignments as data representation. Then, adversarial learning on the cluster assignments is integrated into the methods to further enhance discrimination across different clusters and mitigate the collapse between clusters. To further encourage clustering-oriented guidance, a new cluster-level discrimination is assembled to promote clustering performance by measuring the self-correlation between the learned cluster assignments. Experimental results on real-world datasets exhibit better performance of the proposed model compared with the existing deep clustering methods.
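The cluster-level discrimination can be pictured as a correlation constraint on the softmax assignments of two augmented views; the sketch below is a Barlow-Twins-style stand-in for that idea, not the paper's exact formulation.

```python
import torch

def cluster_correlation_loss(p1, p2):
    # p1, p2: (B, K) softmax cluster assignments for two views of the same batch
    p1 = (p1 - p1.mean(0)) / (p1.std(0) + 1e-6)
    p2 = (p2 - p2.mean(0)) / (p2.std(0) + 1e-6)
    c = (p1.T @ p2) / p1.size(0)     # (K, K) cross-correlation between clusters
    eye = torch.eye(c.size(0))
    return ((c - eye) ** 2).sum()    # identity target: each cluster agrees with
                                     # itself and decorrelates from the others
```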
10

Luo, Dezhao, Chang Liu, Yu Zhou, Dongbao Yang, Can Ma, Qixiang Ye, and Weiping Wang. "Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11701–8. http://dx.doi.org/10.1609/aaai.v34i07.6840.

Abstract:
We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatio-temporal representations. VCP first generates “blanks” by withholding video clips and then creates “options” by applying spatio-temporal operations on the withheld clips. Finally, it fills the blanks with “options” and learns representations by predicting the categories of operations applied on the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatio-temporal representation models (3D-CNNs) and apply such models on action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform the state-of-the-art self-supervised models with significant margins.
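A toy version of the pretext makes the idea concrete: apply one of a small set of operations to a withheld clip and ask the model which one was used. The operations below are simplified stand-ins for the paper's spatio-temporal operations.

```python
import random
import torch

OPS = {
    0: lambda clip: clip,                               # identity
    1: lambda clip: torch.flip(clip, dims=[1]),         # temporal reversal
    2: lambda clip: torch.rot90(clip, 1, dims=[2, 3]),  # 90-degree spatial rotation
}

def make_cloze_example(clip):
    # clip: (C, T, H, W) video tensor; the pretext label is the operation id
    op_id = random.choice(list(OPS))
    return OPS[op_id](clip), op_id
```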

Dissertations / Theses on the topic "Self-Supervised models":

1

Rossi, Alex. "Self-supervised information retrieval: a novel approach based on Deep Metric Learning and Neural Language Models." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Abstract:
Most existing open-source search engines utilize keyword- or tf-idf-based techniques to find documents and web pages relevant to an input query. Although these methods, with the help of PageRank or knowledge graphs, have proved effective in some cases, they often fail to retrieve relevant instances for more complicated queries that require semantic understanding. In this thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of the Gruppo Maggioli company. Semantic search, or search with meaning, can refer to an understanding of the query instead of simply finding word matches and, in general, represents knowledge in a way suitable for retrieval. We investigate a new self-supervised strategy to handle training on unlabeled data, based on the creation of pairs of 'artificial' queries and their respective positive passages. We claim that by removing the reliance on labeled data, we can exploit the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.
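A minimal sketch of the training recipe, assuming a naive first-sentence heuristic in place of the thesis's query-generation step and an in-batch contrastive loss over a bi-encoder's embeddings:

```python
import torch
import torch.nn.functional as F

def pseudo_query(passage: str) -> str:
    # crude stand-in for the 'artificial' query generation step
    return passage.split(".")[0]

def in_batch_contrastive_loss(q_emb, p_emb, tau=0.05):
    # q_emb, p_emb: (B, D) L2-normalized query/passage embeddings;
    # the positive passage for each query sits on the diagonal.
    sims = q_emb @ p_emb.T / tau
    targets = torch.arange(sims.size(0))
    return F.cross_entropy(sims, targets)
```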
2

Douzon, Thibault. "Language models for document understanding." Electronic Thesis or Diss., Lyon, INSA, 2023. http://www.theses.fr/2023ISAL0075.

Abstract:
Every day, companies worldwide receive and process enormous volumes of documents, at considerable cost. In an effort to reduce the cost of processing each document, the largest companies have resorted to document automation technologies. In an ideal world, a document can be automatically processed without any human intervention: its content is read, and information is extracted and forwarded to the relevant service. The state-of-the-art techniques have quickly evolved in the last decades, from rule-based algorithms to statistical models. This thesis focuses on machine learning models for document information extraction. Recent advances in model architecture for natural language processing have shown the importance of the attention mechanism. Transformers have revolutionized the field by generalizing the use of attention and by pushing self-supervised pre-training to the next level. In the first part, we confirm that transformers with appropriate pre-training are able to perform document understanding tasks with high performance. We show that, when used as token classifiers for information extraction, transformers learn the task exceptionally efficiently compared to recurrent networks, needing only a small proportion of the training data to reach close to maximum performance. This highlights the importance of self-supervised pre-training for subsequent fine-tuning. In the second part, we design specialized pre-training tasks to better prepare the model for specific data distributions such as business documents. By acknowledging the specificities of business documents, such as their table structure and their over-representation of numeric figures, we are able to target specific skills useful for the model in its future tasks. We show that these new tasks improve the model's downstream performance, even with small models, allowing us to reach the performance of significantly bigger models without any additional cost during fine-tuning or inference. Finally, in the last part, we address one drawback of the transformer architecture, namely its computational cost on long sequences. We show that efficient architectures derived from the classic transformer require fewer resources and perform better on long sequences. However, due to how they approximate the attention computation, efficient models suffer from a small but significant performance drop on short sequences compared to classical architectures. This motivates using different models depending on the input length and enables concatenating multimodal inputs into a single sequence.
3

Lin, Lyu. "Transformer-based Model for Molecular Property Prediction with Self-Supervised Transfer Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-284682.

Abstract:
Molecular property prediction has a vast range of applications in the chemical industry. A powerful molecular property prediction model can promote experiments and production processes. The idea behind this degree project lies in the use of transfer learning to predict molecular properties. The project is divided into two parts. The first part is to build and pre-train the model. The model, which is constructed with pure attention-based Transformer layers, is pre-trained through a Masked Edge Recovery task with large-scale unlabeled data. Then, the performance of this pre-trained model is tested with different molecular property prediction tasks, finally verifying the effectiveness of transfer learning. The results show that after self-supervised pre-training, the model exhibits excellent generalization capability. It can be fine-tuned in a short period and performs well in downstream tasks. The effectiveness of transfer learning is reflected in the experiments as well: the pre-trained model not only shortens the task-specific training time but also obtains better performance and avoids overfitting caused by too little training data for molecular property prediction.
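To illustrate the Masked Edge Recovery pretext, here is a toy version on an adjacency-list view of a molecule; the representation and mask token are illustrative, not the thesis's actual encoding.

```python
import random

def mask_random_edge(edges):
    # edges: list of (atom_i, atom_j, bond_type) tuples for one molecule
    k = random.randrange(len(edges))
    i, j, bond = edges[k]
    corrupted = list(edges)
    corrupted[k] = (i, j, "[MASK]")   # hide the bond type
    return corrupted, (k, bond)       # model must recover `bond` at position k
```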
4

Ter-Hovhannisyan, Vardges. "Unsupervised and semi-supervised training methods for eukaryotic gene prediction." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26645.

Abstract:
Thesis (Ph.D.)--Biology, Georgia Institute of Technology, 2009.
Committee Chair: Mark Borodovsky; Committee Member: Jung H. Choi; Committee Member: King Jordan; Committee Member: Leonid Bunimovich; Committee Member: Yury Chernoff. Part of the SMARTech Electronic Thesis and Dissertation Collection.
5

Pelloin, Valentin. "La compréhension de la parole dans les systèmes de dialogues humain-machine à l'heure des modèles pré-entraînés." Electronic Thesis or Diss., Le Mans, 2024. http://www.theses.fr/2024LEMA1002.

Abstract:
In this thesis, spoken language understanding (SLU) is studied in the application context of telephone dialogues with defined goals (hotel room booking, for example). Historically, SLU was performed through a cascade of systems: a first system would transcribe the speech into words, and a natural language understanding system would link those words to a semantic annotation. The development of deep neural methods has led to the emergence of end-to-end architectures, where the understanding task is performed by a single system, applied directly to the speech signal to extract the semantic annotation. Recently, so-called self-supervised learning (SSL) pre-trained models have brought new advances in natural language processing (NLP). Learned in a generic way on very large datasets, they can then be adapted for other applications. To date, the best SLU results have been obtained with pipeline systems incorporating SSL models. However, neither the pipeline nor the end-to-end architecture is perfect. In this thesis, we study these architectures and propose hybrid versions that attempt to combine the advantages of each. After developing a state-of-the-art end-to-end SLU model, we evaluated different hybridization strategies. The advances made by SSL models during the course of this thesis led us to integrate them into our hybrid architecture.
6

Cavallucci, Martina. "Speech Recognition per l'italiano: Sviluppo e Sperimentazione di Soluzioni Neurali con Language Model." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Abstract:
E-mail and messaging services have significantly changed human communication, but speech is still the most important method of communication between human beings. Automatic speech recognition (ASR) is therefore particularly relevant because it provides a transcription of spoken language that can be processed by automated systems. With smart speakers such as Google Home, Alexa, or Siri, ASR is already an integral part of many households and is used to play music, answer questions, or control other smart devices such as a home automation system. However, ASR can also be found in many other systems, such as dictation systems, speech translators, or voice user interfaces. More and more companies understand its potential, especially for improving business processes. This thesis therefore experiments with neural models for transcribing webinars created by Maggioli, the host company where the internship took place, thus obtaining transcriptions useful for information retrieval and management. To this end, models based on the recent Transformers were used, and thanks to self-supervised learning, which learns from unlabeled data, it was possible to obtain good results on datasets with Italian audio and transcriptions, for which few resources are still available compared to English.
7

Ammari, Ahmad N. "Transforming user data into user value by novel mining techniques for extraction of web content, structure and usage patterns. The Development and Evaluation of New Web Mining Methods that enhance Information Retrieval and improve the Understanding of User's Web Behavior in Websites and Social Blogs." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/5269.

Abstract:
The rapid growth of the World Wide Web in the last decade has made it the largest publicly accessible data source in the world and one of the most significant and influential information revolutions of modern times. The influence of the Web has touched almost every aspect of human life and activity, causing paradigm shifts and transformational changes in business, governance, and education. Moreover, the rapid evolution of Web 2.0 and the Social Web in the past few years, such as social blogs and friendship networking sites, has dramatically transformed the Web from a raw environment for information consumption to a dynamic and rich platform for information production and sharing worldwide. However, this growth and transformation of the Web has resulted in an uncontrollable explosion and abundance of textual content, creating a serious challenge for any user to find and retrieve the relevant information they truly seek on the Web. Finding a relevant Web page within a website easily and efficiently has become very difficult. This has created many challenges for researchers to develop new mining techniques that improve the user experience on the Web, as well as for organizations to understand the true informational interests and needs of their customers, in order to improve their targeted services by providing the products, services, and information that truly match the requirements of every online customer. With these challenges in mind, Web mining aims to extract hidden patterns and discover useful knowledge from Web page contents, Web hyperlinks, and Web usage logs. Based on the primary kinds of Web data used in the mining process, Web mining tasks can be categorized into three main types: Web content mining, which extracts knowledge from Web page contents using text mining techniques; Web structure mining, which extracts patterns from the hyperlinks that represent the structure of the website; and Web usage mining, which mines users' Web navigational patterns from Web server logs that record the Web page accesses made by every user, representing the interactional activities between users and Web pages in a website. The main goal of this thesis is to contribute toward addressing the challenges resulting from the information explosion and overload on the Web, by proposing and developing novel Web mining-based approaches. Toward this goal, the thesis presents, analyzes, and evaluates three major contributions: first, an integrated Web structure and usage mining approach that recommends a collection of hyperlinks to be placed on a website's homepage for its visitors; second, an integrated Web content and usage mining approach to improve the understanding of users' Web behavior and discover user group interests in a website; and third, a supervised classification model based on recent Social Web concepts, such as tag clouds, to improve the retrieval of relevant articles and posts from Web social blogs.
8

Martínez, Brito Izacar Jesús. "Quantitative structure fate relationships for multimedia environmental analysis." Doctoral thesis, Universitat Rovira i Virgili, 2010. http://hdl.handle.net/10803/8590.

Abstract:
Key physicochemical properties for a wide spectrum of chemical pollutants are unknown. This thesis analyses the prospect of assessing the environmental distribution of chemicals directly from supervised learning algorithms using molecular descriptors, rather than from multimedia environmental models (MEMs) using several physicochemical properties estimated from QSARs. Dimensionless compartmental mass ratios of 468 validation chemicals were compared, in logarithmic units, between: a) SimpleBox 3, a Level III MEM, propagating random property values within statistical distributions of widely recommended QSARs; and, b) Support Vector Regressions (SVRs), acting as Quantitative Structure-Fate Relationships (QSFRs), linking mass ratios to molecular weight and constituent counts (atoms, bonds, functional groups and rings) for training chemicals. Best predictions were obtained for test and validation chemicals optimally found to be within the domain of applicability of the QSFRs, evidenced by low MAE and high q2 values (in air, MAE≤0.54 and q2≥0.92; in water, MAE≤0.27 and q2≥0.92).
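As a pocket illustration of the QSFR setup, the sketch below fits a support vector regression from simple constituent counts to a log mass ratio; the descriptor rows and target values are invented stand-ins, not data from the thesis.

```python
import numpy as np
from sklearn.svm import SVR

# per molecule: molecular weight, atom count, bond count, functional groups, rings
X = np.array([[180.2, 12, 12, 1, 1],
              [ 78.1,  6,  6, 0, 1],
              [ 46.1,  3,  2, 1, 0]])
y = np.array([-2.1, -0.7, -1.4])   # illustrative log10 compartmental mass ratios

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(model.predict(X[:1]))        # predicted log mass ratio for the first molecule
```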
9

"Robots that Anticipate Pain: Anticipating Physical Perturbations from Visual Cues through Deep Predictive Models." Master's thesis, 2017. http://hdl.handle.net/2286/R.I.44032.

Abstract:
To ensure system integrity, robots need to proactively avoid any unwanted physical perturbation that may cause damage to the underlying hardware. In this thesis work, we investigate a machine learning approach that allows robots to anticipate impending physical perturbations from perceptual cues. In contrast to other approaches that require knowledge about sources of perturbation to be encoded before deployment, our method is based on experiential learning. Robots learn to associate visual cues with subsequent physical perturbations and contacts. In turn, these extracted visual cues are then used to predict potential future perturbations acting on the robot. To this end, we introduce a novel deep network architecture which combines multiple sub-networks for dealing with robot dynamics and perceptual input from the environment. We present a self-supervised approach for training the system that does not require any labeling of training data. Extensive experiments in a human-robot interaction task show that a robot can learn to predict physical contact by a human interaction partner without any prior information or labeling. Furthermore, the network is able to successfully predict physical contact from either depth stream input or traditional video input, or using both modalities as input.
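The self-supervision boils down to letting the robot's own sensors produce the labels; a minimal sketch, assuming thresholded force readings mark contacts and each frame is labeled by whether a contact follows within a short horizon (threshold and horizon are illustrative).

```python
def make_training_pairs(frames, force_readings, horizon=5, threshold=2.0):
    # frames[t]: camera image at step t; force_readings[t]: scalar force signal
    pairs = []
    for t in range(len(frames) - horizon):
        upcoming = force_readings[t + 1 : t + 1 + horizon]
        label = int(max(upcoming) > threshold)   # 1 = perturbation is imminent
        pairs.append((frames[t], label))         # no human annotation needed
    return pairs
```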

Books on the topic "Self-Supervised models":

1

Munro, Paul. Self-supervised learning of concepts by single units and "weakly local" representations. Pittsburgh, PA: School of Library and Information Science, University of Pittsburgh, 1988.

2

Sawarkar, Kunal, and Dheeraj Arremsetty. Deep Learning with PyTorch Lightning: Build and Train High-Performance Artificial Intelligence and Self-Supervised Models Using Python. Packt Publishing, Limited, 2021.


Book chapters on the topic "Self-Supervised models":

1

Kordík, Pavel, and Jan Černý. "Self-organization of Supervised Models." In Studies in Computational Intelligence, 179–223. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-20980-2_6.

2

Korkmaz, Yilmaz, Tolga Cukur, and Vishal M. Patel. "Self-supervised MRI Reconstruction with Unrolled Diffusion Models." In Lecture Notes in Computer Science, 491–501. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-43999-5_47.

3

Pantazi, Xanthoula Eirini, Dimitrios Moshou, Abdul Mounem Mouazen, Boyan Kuang, and Thomas Alexandridis. "Application of Supervised Self Organising Models for Wheat Yield Prediction." In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, 556–65. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-662-44654-6_55.

4

Su, Jingyu, Chuanhao Li, Chenchen Jing, and Yuwei Wu. "A Self-supervised Strategy for the Robustness of VQA Models." In IFIP Advances in Information and Communication Technology, 290–98. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-03948-5_23.

5

Vasylechko, Serge, Onur Afacan, and Sila Kurugol. "Self Supervised Denoising Diffusion Probabilistic Models for Abdominal DW-MRI." In Computational Diffusion MRI, 80–91. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-47292-3_8.

6

Wagner, Fabian, Mareike Thies, Laura Pfaff, Oliver Aust, Sabrina Pechmann, Daniela Weidner, Noah Maul, et al. "Abstract: Self-supervised CT Dual Domain Denoising using Low-parameter Models." In Bildverarbeitung für die Medizin 2024, 159. Wiesbaden: Springer Fachmedien Wiesbaden, 2024. http://dx.doi.org/10.1007/978-3-658-44037-4_48.

7

Lin, Yankai, Ning Ding, Zhiyuan Liu, and Maosong Sun. "Pre-trained Models for Representation Learning." In Representation Learning for Natural Language Processing, 127–67. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-1600-9_5.

Abstract:
Pre-training followed by fine-tuning has recently become a new paradigm in natural language processing, learning better representations of words, sentences, and documents in a self-supervised manner. Pre-trained models not only unify the semantic representations of multiple tasks, multiple languages, and multiple modalities, but also exhibit emergent high-level capabilities approaching those of human beings. In this chapter, we introduce pre-trained models for representation learning, from pre-training tasks to adaptation approaches for specific tasks. After that, we discuss several advanced topics toward better pre-trained representations, including better model architectures, multilingual and multi-task learning, efficient representations, and chain-of-thought reasoning.
8

Gao, Yuting, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li, and Chunhua Shen. "DisCo: Remedying Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning." In Lecture Notes in Computer Science, 237–53. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19809-0_14.

9

Lúčny, Andrej, Kristína Malinovská, and Igor Farkaš. "Robot at the Mirror: Learning to Imitate via Associating Self-supervised Models." In Artificial Neural Networks and Machine Learning – ICANN 2023, 471–82. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-44207-0_39.

10

Fessant, F., P. Aknin, L. Oukhellou, and S. Midenet. "Comparison of Supervised Self-Organizing Maps Using Euclidian or Mahalanobis Distance in Classification Context." In Connectionist Models of Neurons, Learning Processes, and Artificial Intelligence, 637–44. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-45720-8_76.


Conference papers on the topic "Self-Supervised models":

1

Fini, Enrico, Victor G. Turrisi Da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, and Julien Mairal. "Self-Supervised Models are Continual Learners." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00940.

2

Pu, Jie, Yuguang Yang, Ruirui Li, Oguz Elibol, and Jasha Droppo. "Scaling Effect of Self-Supervised Speech Models." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-1935.

3

Strgar, Luke, and David Harwath. "Phoneme Segmentation Using Self-Supervised Speech Models." In 2022 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2023. http://dx.doi.org/10.1109/slt54892.2023.10022827.

4

Ericsson, Linus, Henry Gouk, and Timothy M. Hospedales. "How Well Do Self-Supervised Models Transfer?" In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021. http://dx.doi.org/10.1109/cvpr46437.2021.00537.

5

Cho, Gyusang, and Chan-Hyun Youn. "Supervised vs. Self-supervised Pre-trained models for Hand Pose Estimation." In 2022 13th International Conference on Information and Communication Technology Convergence (ICTC). IEEE, 2022. http://dx.doi.org/10.1109/ictc55196.2022.9953011.

6

Kurumbudel, Prashanth Ram. "Deep Self-Supervised Learning Models for Automotive Systems." In Symposium on International Automotive Technology. 400 Commonwealth Drive, Warrendale, PA, United States: SAE International, 2021. http://dx.doi.org/10.4271/2021-26-0129.

7

Rosenberg, C., M. Hebert, and H. Schneiderman. "Semi-Supervised Self-Training of Object Detection Models." In 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05). IEEE, 2005. http://dx.doi.org/10.1109/acvmot.2005.107.

8

Tseng, Wei-Cheng, Wei-Tsung Kao, and Hung-yi Lee. "Membership Inference Attacks Against Self-supervised Speech Models." In Interspeech 2022. ISCA: ISCA, 2022. http://dx.doi.org/10.21437/interspeech.2022-11245.

9

Wei, Fangyin, Rohan Chabra, Lingni Ma, Christoph Lassner, Michael Zollhoefer, Szymon Rusinkiewicz, Chris Sweeney, Richard Newcombe, and Mira Slavcheva. "Self-supervised Neural Articulated Shape and Appearance Models." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.01536.

10

Xu, Puyang, Damianos Karakos, and Sanjeev Khudanpur. "Self-supervised discriminative training of statistical language models." In 2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). IEEE, 2009. http://dx.doi.org/10.1109/asru.2009.5373401.

