
Dissertations / Theses on the topic 'Deep supervised learning'



Consult the top 50 dissertations / theses for your research on the topic 'Deep supervised learning.'


You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Tran, Khanh-Hung. "Semi-supervised dictionary learning and Semi-supervised deep neural network." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPASP014.

Full text
Abstract:
Since the 2010s, machine learning (ML) has been one of the topics attracting the most attention from researchers. Many ML models have demonstrated their ability to produce excellent results in various fields such as computer vision, natural language processing and robotics. However, most of these models use supervised learning, which requires massive annotation. The objective of this thesis is therefore to study and propose semi-supervised approaches that have several advantages over supervised learning. Instead of directly applying a semi-supervised classifier to the original representation of the data, we use models that integrate a representation learning stage before the classification stage, to better adapt to the non-linearity of the data. First, we revisit the tools that allow us to build our semi-supervised models. We present two types of model that include representation learning in their architecture, dictionary learning and neural networks, together with the optimization methods for each; in the case of neural networks, we also describe the problem of adversarial examples. We then present techniques that often accompany semi-supervised learning, such as manifold learning and pseudo-labeling. Second, we work on dictionary learning. We outline three general steps to build a semi-supervised model from a supervised one, and propose a semi-supervised model for classification, typically in the case of a small number of training samples (both labelled and unlabelled). On the one hand, we preserve the data structure from the original space in the sparse code space (manifold learning), which acts as a regularization on the sparse codes. On the other hand, we integrate a semi-supervised classifier in the sparse code space.
In addition, we perform sparse coding for test samples while also taking the preservation of the data structure into account. This method improves accuracy over existing methods. Third, we work on neural networks. We propose an approach called "manifold attack" that reinforces manifold learning. It is inspired by adversarial learning: virtual points are found that disrupt the manifold-learning cost function (by maximizing it) while the model parameters are fixed; the model parameters are then updated by minimizing this cost function while the virtual points are fixed. We also provide criteria to limit the space to which the virtual points belong, and a method to initialize them. This approach improves not only accuracy but also robustness against adversarial examples. Finally, we analyze the similarities and differences, as well as the advantages and disadvantages, of dictionary learning and neural networks, and propose some perspectives for both. For semi-supervised dictionary learning, we propose techniques inspired by neural networks; for neural networks, we propose to integrate manifold attack into generative models.
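The alternating max/min procedure behind "manifold attack" can be sketched numerically. The following toy example uses a linear model, synthetic data, and illustrative hyperparameters (learning rates, the epsilon-ball radius, the 0.1 weight on the manifold term): none of these values come from the thesis, which works with deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labelled sample (x, y) and a small linear model W: stand-ins only.
x = rng.normal(size=3)
y = np.array([1.0, -1.0])
W = rng.normal(scale=0.1, size=(2, 3))

def total_loss(W, v):
    sup = np.sum((W @ x - y) ** 2)          # supervised term
    man = np.sum((W @ (v - x)) ** 2)        # manifold smoothness term
    return sup + 0.1 * man

eps, lr_v, lr_w = 0.5, 0.1, 0.05
v = x + rng.normal(scale=0.01, size=3)      # virtual point initialised near x
losses = []
for step in range(100):
    # (1) attack: gradient ascent on v to maximise the manifold term, W fixed
    for _ in range(3):
        v = v + lr_v * (2 * W.T @ (W @ (v - x)))
        # criterion limiting the space: project v back into an eps-ball of x
        d = v - x
        n = np.linalg.norm(d)
        if n > eps:
            v = x + eps * d / n
    # (2) defend: gradient descent on W with the virtual point v fixed
    grad_W = 2 * np.outer(W @ x - y, x) + 0.2 * np.outer(W @ (v - x), v - x)
    W = W - lr_w * grad_W
    losses.append(total_loss(W, v))
```

The projection step mirrors the criteria mentioned in the abstract for limiting the space the virtual points may occupy.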
APA, Harvard, Vancouver, ISO, and other styles
2

Roychowdhury, Soumali. "Supervised and Semi-Supervised Learning in Vision using Deep Neural Networks." Thesis, IMT Alti Studi Lucca, 2019. http://e-theses.imtlucca.it/273/1/Roychowdhury_phdthesis.pdf.

Full text
Abstract:
Deep learning has been a huge success in vision tasks such as classification, object detection and segmentation, learning end-to-end from raw pixels. This thesis aims to solve several of these vision problems using deep neural network architectures in different ways. The first and core part of the thesis focuses on a learning framework that extends previous work on Semantic Based Regularization (SBR) to integrate prior knowledge into deep learners. Deep neural networks are empirical learners and therefore depend heavily on labeled examples, whereas knowledge-based learners are not very efficient at solving complex vision problems. SBR is therefore designed as a semi-supervised framework that tightly integrates empirical learners with any available background knowledge, gaining the advantages of learning from both perception and reasoning. The framework is learner-agnostic: any learning machinery can be used. Earlier works on SBR used kernel machines or shallow networks as learners, and the problem formulation and the concept of multi-task logic functions are borrowed from them; for the first time, however, this work integrates logic constraints with deep neural networks. The thesis defines a novel back-propagation schema for optimizing deep neural networks in SBR and uses several heuristics to integrate convex and concave logic constraints into the deep learners. It also presents extensive experimental evaluations on multiple image classification datasets, showing how integrating prior knowledge into deep learners can boost the accuracy of several neural architectures over their individual counterparts. SBR is also applied to a video classification problem: automatically annotating surgical and non-surgical tools in videos of cataract surgery.
This framework achieves high accuracy compared to human annotators and to the state-of-the-art DResSys, by enforcing temporal consistency among consecutive video frames through prior knowledge in deep neural networks via collective classification at inference time. DResSys is an ensemble of deep convolutional neural networks and a Markov Random Field based framework (CNN-MRF); SBR replaces the MRF graph with logical constraints enforcing regularization in the temporal domain. SBR and DResSys, the two deep learning based frameworks discussed in this thesis, can thus distill prior knowledge into deep neural networks and become useful tools for decision support during intraoperative cataract surgery, report generation, and surgical training. The first part of the thesis therefore designs frameworks that exploit the wealth of domain knowledge and integrate it with deep convolutional neural networks to solve many real-world vision problems with industrial applications. Many businesses today possess huge databases of visual data that are difficult to manage and exploit. Without an effective method to make sense of all the visual data, it may end up uncategorized and useless; if a visual database contains no metadata about its images or videos, categorizing it is a huge hassle. Classifying images and videos with useful domain information through unified frameworks like SBR is a key solution. The second part of the thesis focuses on another vision problem, image segmentation, and is more application-specific, although it can still be viewed as combining universal, basic domain-knowledge techniques with deep learning models.
It designs two deep learning based frameworks and makes a head-to-head comparison of the two approaches in terms of speed, efficiency and cost. The frameworks are built for automatic segmentation and classification of contaminants for cleanliness analysis in the automobile, aerospace and manufacturing industries. They are designed to meet the foremost industry requirement of an end-to-end solution that is cheap, reliable, fast and accurate compared to the traditional techniques presently used in contaminant analysis and quality control. When integrated with simple optical acquisition systems, these end-to-end solutions will help replace the expensive, slow systems presently on the market.
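SBR translates first-order logic rules into differentiable penalties on network outputs via fuzzy-logic t-norms. As a minimal sketch of the idea, the following turns a hypothetical rule "surgical_tool(x) implies instrument(x)" into a Łukasiewicz-style penalty on predicted probabilities; the predicate names and probability values are illustrative, and the exact t-norm used in the thesis may differ.

```python
import numpy as np

def implies_penalty(p_a, p_b):
    """Penalty for the rule A(x) -> B(x) under the Lukasiewicz implication
    min(1, 1 - p_a + p_b): the penalty 1 - implication simplifies to
    max(0, p_a - p_b), which is zero whenever the rule is satisfied."""
    return np.maximum(0.0, p_a - p_b)

# Hypothetical sigmoid outputs of a network for a batch of three frames.
p_tool = np.array([0.9, 0.2, 0.7])
p_instrument = np.array([0.95, 0.1, 0.3])

# Added to the supervised loss, this term nudges the network toward
# logically consistent predictions, even on unlabeled frames.
penalty = implies_penalty(p_tool, p_instrument).mean()
```

Because the penalty is piecewise linear in the probabilities, it back-propagates through the network like any other loss term, which is what lets SBR train on unlabeled data constrained only by knowledge.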
3

Geiler, Louis. "Deep learning for churn prediction." Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7333.

Full text
Abstract:
The problem of churn prediction has traditionally been a field of study for marketing. However, in the wake of technological advances, more and more data can be collected to analyze customer behavior. This manuscript is built in this frame, with a particular focus on machine learning. We first looked at the supervised learning problem. We demonstrated that logistic regression, random forest and XGBoost taken as an ensemble offer the best results in terms of Area Under the Curve (AUC) among a wide range of traditional machine learning approaches. We also showed that re-sampling approaches are only effective in a local setting, not a global one. Subsequently, we aimed to refine our predictions by relying on customer segmentation: some customers leave a service because of a cost they deem too high, others because of difficulties with customer service. Our approach was enriched with a novel deep neural network architecture that combines autoencoders with the k-means approach. Going further, we focused on self-supervised learning in the tabular domain. More precisely, the proposed architecture was inspired by work on the SimCLR approach, altering the architecture with the Mean Teacher model from semi-supervised learning. We showed through the win matrix the superiority of our approach with respect to the state of the art. Finally, we applied what we built in this manuscript in an industrial setting, that of Brigad. We alleviated the company's churn problem with a random forest optimized through grid search and threshold optimization, and interpreted the results with SHAP (SHapley Additive exPlanations).
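The ensemble of logistic regression, random forest and XGBoost described above can be sketched as a simple soft-voting average of predicted probabilities, scored by AUC. This is a hedged illustration on synthetic, churn-like (imbalanced) data, not the thesis's pipeline; `GradientBoostingClassifier` stands in for XGBoost to stay within scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a churn dataset: class 1 (churners) is the minority.
X, y = make_classification(n_samples=600, n_features=10, weights=[0.8],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),  # XGBoost stand-in
]
# Soft voting: average the churn probabilities of the three models.
probs = np.mean([m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in models],
                axis=0)
auc = roc_auc_score(y_te, probs)
```

Averaging probabilities rather than hard votes preserves each model's confidence, which matters when the final decision threshold is itself optimized, as in the Brigad deployment.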
4

Khan, Umair. "Self-supervised deep learning approaches to speaker recognition." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671496.

Full text
Abstract:
In speaker recognition, i-vectors have been the state-of-the-art unsupervised technique over the last few years, while x-vectors are becoming the state-of-the-art supervised technique. Recent advances in deep learning (DL) approaches to speaker recognition have improved performance but are constrained by the need for labels for the background data. In practice, labeled background data is not easily accessible, especially when large training sets are required. In i-vector based speaker recognition, cosine and Probabilistic Linear Discriminant Analysis (PLDA) are the two basic scoring techniques. Cosine scoring is unsupervised, whereas PLDA parameters are typically trained using speaker-labeled background data, which creates a large performance gap between the two. The question is: how to fill this gap without using speaker labels for the background data? In this thesis, this problem is addressed using DL approaches that avoid, or limit, the use of labeled background data. Three DL based proposals are made. In the first, a Restricted Boltzmann Machine (RBM) vector representation of speech, referred to as the RBM vector, is proposed for speaker clustering and speaker tracking in TV broadcast shows. Experiments on the AGORA database show that in speaker clustering the RBM vectors gain a relative improvement of 12% in terms of Equal Impurity (EI). For the speaker tracking task, RBM vectors are used only in the speaker identification part, where the relative improvement in terms of Equal Error Rate (EER) is 11% and 7% using cosine and PLDA scoring, respectively. In the second proposal, DL approaches are used to increase the discriminative power of i-vectors in speaker verification, employing autoencoders in several ways.
Firstly, an autoencoder is used as pre-training for a Deep Neural Network (DNN) on a large amount of unlabeled background data, after which a DNN classifier is trained using relatively little labeled data. Secondly, an autoencoder is trained to transform i-vectors into a new representation with greater discriminative power, based on nearest neighbor i-vectors chosen in an unsupervised manner. The evaluation was performed on the VoxCeleb-1 database. The results show a relative improvement of 21% in terms of EER over i-vector/PLDA with the first system, and 42% with the second; if the background data is also used in the testing phase, the relative improvement reaches 53%. In the third proposal, a self-supervised end-to-end speaker verification system is trained. The idea is to use impostor samples along with nearest neighbor samples to form client/impostor pairs in an unsupervised manner. The architecture is based on a Convolutional Neural Network (CNN) encoder trained as a siamese network with two branches; another network with three branches is trained with a triplet loss to extract unsupervised speaker embeddings. The experimental results show that both the end-to-end system and the speaker embeddings, despite being unsupervised, perform comparably to the supervised baseline, and that their score combination further improves performance. The proposed approaches have their respective pros and cons. The best result was obtained with the nearest neighbor autoencoder, which has the disadvantage of relying on background i-vectors at test time. The autoencoder pre-training for the DNN is not bound by this factor but is semi-supervised. The third proposal is free from both constraints and performs reasonably well: it is a self-supervised approach and does not require the background i-vectors in the testing phase.
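The unsupervised client/impostor pairing idea above rests on cosine similarity: each embedding's nearest neighbor is assumed to come from the same speaker, while its farthest sample plays the impostor. The sketch below illustrates this on hypothetical synthetic "embeddings" (two Gaussian clusters standing in for two speakers); the dimensions and cluster parameters are invented for the example, not taken from the thesis.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, the unsupervised scoring used with i-vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Hypothetical unlabeled embeddings: two synthetic clusters act as speakers.
emb = np.vstack([rng.normal(loc=+2.0, scale=0.3, size=(5, 8)),
                 rng.normal(loc=-2.0, scale=0.3, size=(5, 8))])

client, impostor = [], []
for i in range(len(emb)):
    others = [j for j in range(len(emb)) if j != i]
    sims = np.array([cosine(emb[i], emb[j]) for j in others])
    client.append(sims.max())    # nearest neighbour: assumed same speaker
    impostor.append(sims.min())  # farthest sample: assumed impostor
```

When the embedding space separates speakers at all, the assumed client scores sit well above the assumed impostor scores, which is what makes the label-free pair selection usable for training.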
5

Zhang, Kun. "Supervised and Self-Supervised Learning for Video Object Segmentation in the Compressed Domain." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/29361.

Full text
Abstract:
Video object segmentation has attracted remarkable attention since it is increasingly critical in real video understanding scenarios. Raw videos are highly redundant, so using a heavy backbone network to extract features from every individual frame can be a waste of time. Moreover, the motion vectors and residuals in compressed videos provide motion information that can be used directly. This thesis therefore discusses semi-supervised video object segmentation methods that work directly on compressed videos. First, we discuss a supervised learning method for semi-supervised video object segmentation on compressed videos. To reduce the model's running time, we use a heavy backbone network only for a few keyframes, and a much more lightweight network to extract features for the other frames. This saves both training and inference time and eliminates redundant information; to the best of our knowledge, the proposed approach is the fastest video object segmentation model to date. Second, we explore a self-supervised learning approach for semi-supervised video object segmentation in the compressed domain. Deep neural networks usually need a massive amount of labeled data to train, yet image and video data can be obtained at almost zero cost. This thesis presents a deep learning-based motion-aware matching approach for semi-supervised video object segmentation using self-supervised learning, together with a new reconstruction loss computed on motion information to improve the model's effectiveness. Experimental results on two public video object segmentation datasets show that the proposed models for the two tasks are efficient and effective, indicating that video object segmentation in the compressed domain is a promising research direction.
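The heavy-backbone-on-keyframes idea can be sketched as a simple scheduling rule over a compressed stream. The GOP size of 12 below is a hypothetical value typical of MPEG-style streams, not a number taken from the thesis.

```python
def encoder_schedule(n_frames, gop=12):
    """Assign the heavy backbone to keyframes (one per GOP) and a
    lightweight network to the remaining frames; non-keyframes can then
    reuse the compressed stream's motion vectors and residuals."""
    return ['heavy' if i % gop == 0 else 'light' for i in range(n_frames)]

plan = encoder_schedule(30, gop=12)
```

With this split, the expensive feature extractor runs on only a small fraction of frames, which is where the training- and inference-time savings come from.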
6

Liu, Dongnan. "Supervised and Unsupervised Deep Learning-based Biomedical Image Segmentation." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/24744.

Full text
Abstract:
Biomedical image analysis plays a crucial role in healthcare, with a wide scope of applications including disease diagnosis, clinical treatment, and prognosis. Among biomedical image analysis techniques, segmentation is an essential step that aims to assign each pixel a category and instance label. At the early stage, segmentation results were obtained via manual annotation, which is time-consuming and error-prone. Over the past few decades, hand-crafted feature based methods have been proposed to segment biomedical images automatically, but they rely heavily on prior knowledge, which limits their ability to generalize across biomedical images. With the recent advance of deep learning, convolutional neural network (CNN) based methods have achieved state-of-the-art performance on various natural and biomedical image segmentation tasks, thanks to their ability to learn contextual and local information from a high-dimensional feature space. Biomedical image segmentation tasks remain particularly challenging, however, due to complicated background components, high variability of object appearance, numerous overlapping objects, and ambiguous object boundaries. It is therefore necessary to establish automated deep learning-based segmentation paradigms capable of processing the complicated semantic and morphological relationships in various biomedical images. In this thesis, we propose novel deep learning-based methods for fully supervised and unsupervised biomedical image segmentation tasks. In the first part of the thesis, we introduce fully supervised deep learning-based segmentation methods for various biomedical image analysis scenarios.
First, we design a panoptic structure paradigm for nuclei instance segmentation in histopathology images and cell instance segmentation in fluorescence microscopy images. Traditional proposal-based and proposal-free instance segmentation methods can leverage only global contextual or only local instance information; our panoptic paradigm integrates both and therefore achieves better performance. Second, we propose a multi-level feature fusion architecture for semantic neuron membrane segmentation in electron microscopy (EM) images. Third, we propose a 3D anisotropic paradigm for brain tumor segmentation in magnetic resonance images, which enlarges the model's receptive field while remaining memory-efficient. Although our fully supervised methods achieve competitive performance on several biomedical image segmentation tasks, they rely heavily on annotated training images, and labeling pixel-level segmentation ground truth for biomedical images is expensive and labor-intensive. Exploring unsupervised segmentation methods that need no annotations is therefore an important topic for biomedical image analysis. In the second part of the thesis, we focus on unsupervised biomedical image segmentation methods. First, we propose a panoptic feature alignment paradigm for unsupervised nuclei instance segmentation in histopathology images and mitochondria instance segmentation in EM images; to the best of our knowledge, this is the first unsupervised deep learning-based method designed for a variety of biomedical image instance segmentation tasks. Second, we design a feature disentanglement architecture for unsupervised object recognition.
In addition to unsupervised instance segmentation for biomedical images, this method also achieves state-of-the-art performance on unsupervised object detection for natural images, which further demonstrates its effectiveness and high generalization ability.
7

Han, Kun. "Supervised Speech Separation And Processing." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1407865723.

Full text
8

Nasrin, Mst Shamima. "Pathological Image Analysis with Supervised and Unsupervised Deep Learning Approaches." University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1620052562772676.

Full text
9

Karlsson, Erik, and Gilbert Nordhammar. "Naive semi-supervised deep learning med sammansättning av pseudo-klassificerare." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-17177.

Full text
Abstract:
A common problem in supervised learning is the lack of labeled training data. Naive semi-supervised deep learning (NSSDL) is a training technique that aims to mitigate this problem by generating pseudo-labeled data and then letting a neural network train on it together with a smaller amount of labeled data. This work investigates whether the technique can be improved through voting. Several neural networks are trained with the proposed technique, with naive semi-supervised deep learning, or with supervised learning, and their accuracy is then evaluated. The results showed almost exclusively degraded performance when voting was used. However, the conditions for voting do not appear to have been particularly favourable, which makes it difficult to draw a firm conclusion about its effects. Although voting yielded no improvements, NSSDL itself proved very effective, and there are several application areas where the technique could be used with good results in the future.
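The pipeline studied here, several pseudo-classifiers vote on labels for unlabeled data, and a final network then trains on the labeled set plus the pseudo-labeled set, can be sketched as follows. This is a hedged illustration on synthetic data with scikit-learn models standing in for the thesis's networks; the dataset sizes and architectures are invented for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_lab, y_lab = X[:40], y[:40]        # small labelled set
X_unlab = X[40:300]                  # unlabelled pool
X_test, y_test = X[300:], y[300:]

# An ensemble of pseudo-classifiers, each trained on the labelled set only.
voters = [LogisticRegression(max_iter=1000)]
voters += [MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=s) for s in range(3)]
for v in voters:
    v.fit(X_lab, y_lab)

# Majority vote produces pseudo-labels (ties fall to class 0 here).
votes = np.stack([v.predict(X_unlab) for v in voters])
pseudo = (votes.mean(axis=0) > 0.5).astype(int)

# The final network trains on labelled + pseudo-labelled data.
final = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
final.fit(np.vstack([X_lab, X_unlab]), np.concatenate([y_lab, pseudo]))
acc = final.score(X_test, y_test)
```

Whether voting helps depends on the voters disagreeing in useful ways, which is the condition the study found hard to meet.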
APA, Harvard, Vancouver, ISO, and other styles
10

Örnberg, Oscar. "Semi-Supervised Methods for Classification of Hyperspectral Images with Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-288726.

Full text
Abstract:
Hyperspectral images (HSI) can reveal more patterns than regular images: the dimensionality is high, with a wider spectrum for each pixel. Few labeled datasets exist while unlabeled data is abundant, which makes semi-supervised learning well suited for HSI classification. Leveraging new research in deep learning and semi-supervised methods, two models called FixMatch and Mean Teacher were adapted to gauge the effectiveness of consistency regularization methods for semi-supervised learning on HSI classification. Traditional machine learning methods such as SVM, Random Forest and XGBoost were compared, together with two semi-supervised machine learning methods, TSVM and QN-S3VM, as baselines. The semi-supervised deep learning models were tested with two networks, a 3D and a 1D CNN. To enable the use of consistency regularization, several new data augmentation methods were adapted to the HSI data; current methods are few and most rely on labeled data, which is not available in this setting. The data augmentation methods presented proved useful and were combined in an automatic augmentation scheme. The accuracy of the baseline and semi-supervised methods showed that the SVM was best in all cases. Neither semi-supervised method showed consistently better performance than its supervised equivalent.
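As a rough illustration of the consistency-regularization idea behind FixMatch (not this thesis's implementation), the unlabelled-data loss for one sample can be sketched as: the model's prediction on a weakly augmented view is turned into a hard pseudo-label, and the strongly augmented view is penalised against it only when the model is confident. The threshold value and the plain-list probability vectors are simplifying assumptions:

```python
import math

def fixmatch_loss(probs_weak, probs_strong, threshold=0.95):
    """FixMatch-style unlabelled loss for one sample. probs_weak and
    probs_strong are the model's class-probability vectors on the weakly
    and strongly augmented views of the same input."""
    confidence = max(probs_weak)
    if confidence < threshold:
        return 0.0  # low-confidence samples are masked out of the loss
    pseudo = probs_weak.index(confidence)  # hard pseudo-label
    return -math.log(probs_strong[pseudo] + 1e-12)  # cross-entropy term
```

The labelled part of the batch keeps an ordinary supervised cross-entropy; only this unlabelled term depends on the augmentation pair.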
APA, Harvard, Vancouver, ISO, and other styles
11

Bonechi, Simone. "Lack of Supervised Data: A Deep Learning Approach in Image Analysis." Doctoral thesis, Università di Siena, 2020. http://hdl.handle.net/11365/1105761.

Full text
Abstract:
A fundamental key point for the recent success of deep learning models is the availability of large sets of annotated data. The scarcity of labeled data is often a significant obstacle for real-world applications, where annotations are inherently difficult and expensive to obtain. This is particularly true for semantic segmentation, which requires pixel-level annotations. Indeed, dealing with a reduced number of fully annotated data is one of the most active research fields in deep learning, and different strategies are commonly employed to cope with the lack of annotations. In this thesis, we consider a weakly supervised approach to semantic segmentation. In particular, we propose a framework that can be used to generate pixel-level annotations exploiting bounding-box supervision. Indeed, bounding-box supervision, even though less accurate, is a valuable alternative, effective in reducing the dataset collection costs. The proposed method is based on a two-stage procedure. Firstly, a deep neural network is trained to distinguish the relevant object from the background inside a given bounding-box. Then, the same network is applied to bounding-boxes extracted from an object detection dataset, and its outputs are employed to generate the weak pixel-level supervision for the original image. The proposed approach has been tested on two different tasks. In particular, the Pascal-VOC dataset has been used to assess the quality of the proposed framework, obtaining results comparable with the state-of-the-art weakly supervised approaches. Then, we have employed the proposed method in scene text segmentation, where pixel-level annotations are very scarce. With our framework, exploiting the bounding-box supervisions of the COCO-Text and MLT datasets, we have generated and released two datasets of real images with weak pixel-level supervisions (COCO_TS and MLT_S).
These supervisions have been used to train deep segmentation networks, and the experiments show that COCO_TS and MLT_S are a valid alternative to the use of synthetic images, which is the standard approach applied for pre-training a scene text segmentation network. Furthermore, when a network is trained in the absence of labeled data, another main issue to be faced is the validation of the model. Therefore, we also propose some confidence measures that can be used to evaluate a trained network in the absence of annotated samples. Such measures are employed in a domain adaptation framework - based on generative adversarial networks (GANs) - which aims at training a model on a source domain with labeled data capable of generalizing to a target unlabeled dataset. Confidence measures proved to be correlated with the real accuracy of the model. The experiments, carried out on two domain adaptation tasks (SVHN to MNIST and CIFAR to STL), show that the proposed measures can be used both to estimate the performance of the trained model and to properly stop the GAN training.
APA, Harvard, Vancouver, ISO, and other styles
12

Rastgoufard, Rastin. "Multi-Label Latent Spaces with Semi-Supervised Deep Generative Models." ScholarWorks@UNO, 2018. https://scholarworks.uno.edu/td/2486.

Full text
Abstract:
Expert labeling, tagging, and assessment are far more costly than the processes of collecting raw data. Generative modeling is a very powerful tool to tackle this real-world problem. It is shown here how these models can be used to allow for semi-supervised learning that performs very well in label-deficient conditions. The foundation for the work in this dissertation is built upon visualizing generative models' latent spaces to gain deeper understanding of data, analyze faults, and propose solutions. A number of novel ideas and approaches are presented to improve single-label classification. This dissertation's main focus is on extending semi-supervised Deep Generative Models for solving the multi-label problem by proposing unique mathematical and programming concepts and organization. In all naive mixtures, using multiple labels is detrimental and causes each label's predictions to be worse than models that utilize only a single label. Examining latent spaces reveals that in many cases, large regions in the models generate meaningless results. Enforcing a priori independence is essential, and only when applied can multi-label models outperform the best single-label models. Finally, a novel learning technique called open-book learning is described that is capable of surpassing the state-of-the-art classification performance of generative models for multi-labeled, semi-supervised data sets.
APA, Harvard, Vancouver, ISO, and other styles
13

Huang, Jiajun. "Learning to Detect Compressed Facial Animation Forgery Data with Contrastive Learning." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/29183.

Full text
Abstract:
Facial forgery generation, which can be used to modify facial attributes, is a critical threat to digital society. Recent deep-neural-network-based forgery generation methods, called Deepfakes, can generate high-quality results that are hard for human eyes to distinguish. Various detection methods and datasets have been proposed for detecting such data. However, recent research pays less attention to facial animation, which is also important on the forgery attack side: it animates face images with actions provided by driving videos. Our experiments show that the existing datasets are not sufficient to develop reliable detection methods for animation data. As a response, we propose a facial animation dataset, called DeepFake MNIST+. It includes 10,000 facial animation videos in 10 different actions. We also provide a baseline detection method and a comprehensive analysis of the method and dataset. Meanwhile, we notice that the data compression process can affect detection performance, so creating a forgery detection model that can handle data compressed at unknown levels is critical. To enhance the performance of such models, we consider the weakly and strongly compressed data as two views of the original data, which should have similar relationships with other samples. We propose a novel anti-compression forgery detection framework that maintains closer relations within data under different compression levels. Specifically, the algorithm measures the pair-wise similarity within data as the relations, and forces the relations of weakly and strongly compressed data close to each other, thus improving the performance for detecting strongly compressed data. To achieve a better strongly-compressed-data relation guided by the less compressed one, we apply video-level contrastive learning to the weakly compressed data. The experiment results show that the proposed algorithm adapts well to multiple compression levels.
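A minimal sketch of the relation-consistency idea described above, where pair-wise similarities within a batch are treated as the "relations" to be matched across compression levels. The cosine similarity and the mean-squared pairing are illustrative assumptions, not the exact loss from the thesis:

```python
def cosine(u, v):
    """Cosine similarity between two plain-list feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def relation_consistency_loss(feats_weak, feats_strong):
    """Mean squared difference between the pair-wise similarity
    structures of the weakly and strongly compressed batches; pulling
    the strong-compression relations toward the weak-compression ones."""
    n, loss, pairs = len(feats_weak), 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            rw = cosine(feats_weak[i], feats_weak[j])
            rs = cosine(feats_strong[i], feats_strong[j])
            loss += (rw - rs) ** 2
            pairs += 1
    return loss / max(pairs, 1)
```

When the two compression levels of a batch produce identical feature geometry the loss is zero, and any divergence in the pair-wise structure is penalised.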
APA, Harvard, Vancouver, ISO, and other styles
14

Feng, Zeyu. "Learning Deep Representations from Unlabelled Data for Visual Recognition." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/26876.

Full text
Abstract:
Self-supervised learning (SSL) aims at extracting, from abundant unlabelled images, transferable semantic features, which benefit various downstream visual tasks by reducing the sample complexity when human-annotated labels are scarce. SSL is promising because it also boosts performance in diverse tasks when combined with the knowledge of existing techniques. Therefore, it is important and meaningful to study how SSL leads to better transferability and to design novel SSL methods. To this end, this thesis proposes several methods to improve SSL and its function in downstream tasks. We begin by investigating the effect of unlabelled training data, and introduce an information-theoretical constraint for SSL from multiple related domains. In contrast to the conventional single dataset, exploiting multiple domains has the benefits of decreasing the built-in bias of each individual domain and allowing knowledge transfer across domains. Thus, the learned representation is more unbiased and transferable. Next, we describe a feature decoupling (FD) framework that incorporates invariance into predicting transformations, one main category of SSL methods, by observing that they often lead to co-variant features unfavourable for transfer. Our model learns a split representation that contains both transformation-related and unrelated parts. FD achieves SOTA results on SSL benchmarks. We also present a multi-task method with a theoretical understanding for contrastive learning, the other main category of SSL, by leveraging the semantic information from synthetic images to facilitate the learning of class-related semantics. Finally, we explore self-supervision in open-set unsupervised classification with the knowledge of a source domain. We propose to enforce consistency under transformation of target data and to discover pseudo-labels from confident predictions. Experimental results outperform SOTA open-set domain adaptation methods.
APA, Harvard, Vancouver, ISO, and other styles
15

Song, Shiping. "Study of Semi-supervised Deep Learning Methods on Human Activity Recognition Tasks." Thesis, KTH, Robotik, perception och lärande, RPL, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-241366.

Full text
Abstract:
This project focuses on semi-supervised human activity recognition (HAR) tasks, in which the inputs are partly labeled time-series data acquired from sensors such as accelerometers, and the outputs are predefined human activities. Most state-of-the-art existing work in the HAR area is supervised, relying on fully labeled datasets. Since the cost of labeling the collected instances increases fast with the growing scale of data, semi-supervised methods are now widely required. This report proposes two semi-supervised methods and investigates how well they perform on a partly labeled dataset, compared to the state-of-the-art supervised method. One of these methods is designed based on the state-of-the-art supervised method, DeepConvLSTM, together with the semi-supervised learning concept of self-training. The other is modified from a semi-supervised deep learning method, an LSTM initialized by a seq2seq autoencoder, which was first introduced for natural language processing. According to the experiments on a published dataset (the Opportunity Activity Recognition dataset), both of these semi-supervised methods perform better than the state-of-the-art supervised methods.
APA, Harvard, Vancouver, ISO, and other styles
16

Meng, Zhaoxin. "A deep learning model for scene recognition." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-36491.

Full text
Abstract:
Scene recognition is a hot research topic in the field of image recognition. It is necessary to focus on scene recognition research because it contributes to the scene understanding topic and can provide important contextual information for object recognition. Traditional approaches for scene recognition still have many shortcomings. In recent years, the deep learning method, which uses convolutional neural networks, has achieved state-of-the-art results in this area. This thesis constructs a model based on multi-layer feature extraction from a CNN and transfer learning for scene recognition tasks. Because scene images often contain multiple objects, there may be more useful local semantic information in the convolutional layers of the network, which may be lost in the fully connected layers. Therefore, this thesis improves the traditional CNN architecture, adopts an existing improvement that enhances the convolutional-layer information, and extracts it using a Fisher Vector. The thesis then introduces the idea of transfer learning and tries to combine the knowledge of two different fields, scenes and objects: we combine the outputs of these two networks to achieve better results. Finally, the method is implemented using Python and PyTorch and applied to two well-known scene datasets, UIUC-Sports and Scene-15. Compared with the traditional CNN AlexNet architecture, we improve the result from 81% to 93% on UIUC-Sports, and from 79% to 91% on Scene-15. This shows that our method performs well on scene recognition tasks.
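The combination of a scene-trained and an object-trained network described above can be illustrated with a simple late-fusion sketch; the weighted-sum rule and the `weight` parameter are assumptions for illustration, since the thesis does not specify its exact combination rule here:

```python
def late_fusion(scene_scores, object_scores, weight=0.5):
    """Combine per-class scores from a scene-trained network and an
    object-trained network by a weighted sum; `weight` controls how much
    the scene branch contributes relative to the object branch."""
    return [weight * s + (1 - weight) * o
            for s, o in zip(scene_scores, object_scores)]
```

With `weight=0.5` the two branches contribute equally; in practice the weight would be tuned on a validation split.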
APA, Harvard, Vancouver, ISO, and other styles
17

Álvarez, Robles Enrique Josué. "Supervised Learning models with ice hockey data." Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-167718.

Full text
Abstract:
Recent technology developments allow measuring data in almost every field, especially increasing the potential for analytics in branches where little analysis had been done before due to complicated data access. The increased interest in sports analytics is closely connected to the better technology now available for visual and physical sensors on the one hand, and to sports as an upcoming economic topic holding potentially large revenues, and therefore investment interest, on the other. With the underlying database, precise strategies and individual performance improvements in professional sports are no longer a question of (coach) experience but can be derived from models with statistical accuracy. This thesis evaluates whether the available data, together with complex and simple supervised machine learning models, can generalize from the training data to unseen situations, by evaluating performance metrics. Data from games of the Linköping ice hockey team for the 2017/2018 season is processed with supervised learning algorithms such as binary logistic regression and neural networks. The result of this first step is to determine passing strategies by considering both attempted-but-failed and successful shots on goal during the game. For that, the original raw dataset was aggregated to game-specific data. After the distinct strategies have been detected, they are classified according to their rate of success.
APA, Harvard, Vancouver, ISO, and other styles
18

Dabiri, Sina. "Semi-Supervised Deep Learning Approach for Transportation Mode Identification Using GPS Trajectory Data." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/86845.

Full text
Abstract:
Identification of travelers' transportation modes is a fundamental step for various problems that arise in the domain of transportation, such as travel demand analysis, transport planning, and traffic management. This thesis aims to identify travelers' transportation modes purely based on their GPS trajectories. First, a segmentation process is developed to partition a user's trip into GPS segments with only one transportation mode. A majority of studies have proposed mode inference models based on hand-crafted features, which might be vulnerable to traffic and environmental conditions. Furthermore, the classification task in almost all models has been performed in a supervised fashion, while a large amount of unlabeled GPS trajectories has remained unused. Accordingly, a deep SEmi-Supervised Convolutional Autoencoder (SECA) architecture is proposed to not only automatically extract relevant features from GPS segments but also exploit useful information in unlabeled data. The SECA integrates a convolutional-deconvolutional autoencoder and a convolutional neural network into a unified framework to concurrently perform supervised and unsupervised learning. The two components are simultaneously trained using both labeled and unlabeled GPS segments, which have already been converted into an efficient representation for the convolutional operation. An optimum schedule for varying the balancing parameters between reconstruction and classification errors is also implemented. The performance of the proposed SECA model, the trip segmentation, the method for converting a raw trajectory into a new representation, the hyperparameter schedule, and the model configuration are evaluated by comparing them to several baselines and alternatives for various amounts of labeled and unlabeled data. The experimental results demonstrate the superiority of the proposed model over the state-of-the-art semi-supervised and supervised methods with respect to metrics such as accuracy and F-measure.
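The balancing schedule between reconstruction and classification errors described above can be illustrated with a toy linear ramp: reconstruction dominates early (when the autoencoder can learn from all segments) and classification dominates late. The linear form is an illustrative assumption; the thesis tunes its own schedule:

```python
def combined_loss(recon_error, class_error, epoch, total_epochs):
    """Weighted sum of the autoencoder's reconstruction error (computed
    on all segments) and the classifier's error (computed on labeled
    segments only), with the weight shifting toward classification as
    training progresses."""
    alpha = 1.0 - epoch / total_epochs  # reconstruction weight decays
    beta = epoch / total_epochs         # classification weight grows
    return alpha * recon_error + beta * class_error
```

At epoch 0 the loss is pure reconstruction; by the final epoch it is pure classification, with a smooth trade-off in between.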
Master of Science
Identifying users' transportation modes (e.g., bike, bus, train, and car) is a key step towards many transportation-related problems including (but not limited to) transport planning, transit demand analysis, auto ownership, and transportation emissions analysis. Traditionally, the information for analyzing travelers' behavior in choosing transport mode(s) was obtained through travel surveys. High cost, low response rate, time-consuming manual data collection, and misreporting are the main demerits of survey-based approaches. With the rapid growth of ubiquitous GPS-enabled devices (e.g., smartphones), a constant stream of users' trajectory data can be recorded. A user's GPS trajectory is a sequence of GPS points, recorded by means of a GPS-enabled device, in which a GPS point contains the information of the device's geographic location at a particular moment. In this research, users' GPS trajectories, rather than traditional resources, are harnessed to predict their transportation mode by means of statistical models. With respect to the statistical models, a wide range of studies have developed travel mode detection models using hand-designed attributes and classical learning techniques. Nonetheless, hand-crafted features cause some main shortcomings, including vulnerability to traffic uncertainties and biased engineering justification in generating effective features. A potential solution to address these issues is leveraging deep learning frameworks that are capable of capturing abstract features from the raw input in an automated fashion. Thus, in this thesis, deep learning architectures are exploited in order to identify transport modes based on only raw GPS tracks. It is worth noting that a significant portion of trajectories in GPS data might not be annotated by a transport mode, and the acquisition of labeled data is a more expensive and labor-intensive task in comparison with collecting unlabeled data.
Thus, utilizing the unlabeled GPS trajectory (i.e., the GPS trajectories that have not been annotated by a transport mode) is a cost-effective approach for improving the prediction quality of the travel mode detection model. Therefore, the unlabeled GPS data are also leveraged by developing a novel deep-learning architecture that is capable of extracting information from both labeled and unlabeled data. The experimental results demonstrate the superiority of the proposed models over the state-of-the-art methods in literature with respect to several performance metrics.
APA, Harvard, Vancouver, ISO, and other styles
19

Varshney, Varun. "Supervised and unsupervised learning for plant and crop row detection in precision agriculture." Thesis, Kansas State University, 2017. http://hdl.handle.net/2097/35463.

Full text
Abstract:
Master of Science
Department of Computing and Information Sciences
William H. Hsu
The goal of this research is to present a comparison between different clustering and segmentation techniques, both supervised and unsupervised, to detect plant and crop rows. Aerial images, taken by an Unmanned Aerial Vehicle (UAV), of a corn field at various stages of growth were acquired in RGB format through the Agronomy Department at the Kansas State University. Several segmentation and clustering approaches were applied to these images, namely K-Means clustering, Excessive Green (ExG) Index algorithm, Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and a deep learning approach based on Fully Convolutional Networks (FCN), to detect the plants present in the images. A Hough Transform (HT) approach was used to detect the orientation of the crop rows and rotate the images so that the rows became parallel to the x-axis. The result of applying different segmentation methods to the images was then used in estimating the location of crop rows in the images by using a template creation method based on Green Pixel Accumulation (GPA) that calculates the intensity profile of green pixels present in the images. Connected component analysis was then applied to find the centroids of the detected plants. Each centroid was associated with a crop row, and centroids lying outside the row templates were discarded as being weeds. A comparison between the various segmentation algorithms based on the Dice similarity index and average run-times is presented at the end of the work.
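The Excess Green index mentioned above has a standard closed form, ExG = 2g' - r' - b' over the normalised chromatic coordinates, and thresholding it yields a vegetation mask. A minimal per-pixel sketch follows; the 0.1 threshold and the nested-list image format are illustrative assumptions:

```python
def excess_green(r, g, b):
    """Excess Green index for one pixel: ExG = 2g' - r' - b', where
    r', g', b' are the chromatic coordinates (channel / channel sum).
    Vegetation pixels score high because they are green-dominant."""
    total = r + g + b
    if total == 0:
        return 0.0
    rn, gn, bn = r / total, g / total, b / total
    return 2 * gn - rn - bn

def plant_mask(image, threshold=0.1):
    """Binary mask over a nested-list RGB image (rows of (r, g, b)
    tuples); True marks pixels classified as plant material."""
    return [[excess_green(*px) > threshold for px in row] for row in image]
```

Connected-component analysis on such a mask then gives the plant centroids that are matched against the crop-row templates.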
APA, Harvard, Vancouver, ISO, and other styles
20

Sahasrabudhe, Mihir. "Unsupervised and weakly supervised deep learning methods for computer vision and medical imaging." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASC010.

Full text
Abstract:
The first two contributions of this thesis (Chapters 2 and 3) are models for unsupervised 2D alignment and for learning 3D object surfaces, called Deforming Autoencoders (DAE) and Lifting Autoencoders (LAE). These models are capable of identifying canonical spaces in order to represent different object properties: for example, the appearance in a canonical space, the deformation associated with this appearance that maps it to the image space, and, for human faces, a 3D model of the face, its facial expression, and the angle of the camera. We further illustrate applications of the models to other domains: alignment of lung MRI images in medical image analysis, and alignment of satellite images in remote sensing imagery. In Chapter 4, we concentrate on a problem in medical image analysis: the diagnosis of lymphocytosis. We propose a convolutional network to encode images of blood smears obtained from a patient, followed by an aggregation operation to gather information from all images in order to represent them in one feature vector, which is used to determine the diagnosis. Our results show that the performance of the proposed models is on par with biologists and can therefore augment their diagnosis.
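The per-patient aggregation step described above (gathering a variable number of per-image feature vectors into one patient-level vector) can be sketched with a simple element-wise mean, which is order-invariant in the number of images; the thesis's exact pooling operation may differ:

```python
def aggregate_patient(features):
    """Aggregate a list of per-image feature vectors (plain lists of
    floats, all the same length) into one patient-level vector by an
    element-wise mean. The result does not depend on image order or,
    up to averaging, on how many images the patient has."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[k] for f in features) / n for k in range(dim)]
```

The aggregated vector is what a downstream classifier would consume to produce the per-patient diagnosis.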
APA, Harvard, Vancouver, ISO, and other styles
21

Rönnberg, Axel. "Semi-Supervised Deep Learning using Consistency-Based Methods for Segmentation of Medical Images." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279579.

Full text
Abstract:
In radiation therapy, a form of cancer treatment, accurately locating the anatomical structures is required in order to limit the impact on healthy cells. The automatic task of delineating these structures and organs is called segmentation, where each pixel in an image is classified and assigned a label. Recently, deep neural networks have proven to be efficient at automatic medical segmentation. However, deep learning requires large amounts of training data. This is a restricting feature, especially in the medical field, due to factors such as patient confidentiality. Nonetheless, the main challenge is not the image data itself but the lack of high-quality annotations. It is thus interesting to investigate methods for semi-supervised learning, where only a subset of the images requires annotations. This raises the question whether these methods are acceptable for organ segmentation, and whether they result in increased performance in comparison to supervised models. One category of semi-supervised methods applies the strategy of encouraging consistency between predictions. Consistency Training and Mean Teacher are two methods in which the network weights are updated in order to minimize the impact of input perturbations such as data augmentations. In addition, the Mean Teacher method trains two models, a Teacher and a Student. The Teacher is updated as an average of consecutive Student models, using Temporal Ensembling. To resolve the question whether semi-supervised learning could be beneficial, the two mentioned techniques are investigated. They are used to train deep neural networks with a U-net architecture to segment the bladder and anorectum in 3D CT images. The results showed promise for Consistency Training and Mean Teacher, with nearly all model configurations producing improved segmentations. Results also showed that the methods reduced performance variance, primarily by limiting poor delineations.
With these results in hand, the use of semi-supervised learning should definitely be considered. However, since the segmentation improvement was not repeated in all experiment configurations, more research needs to be done.
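The Mean Teacher update mentioned above, where the Teacher is an average of consecutive Students, is in practice an exponential moving average of weights (Temporal Ensembling over training steps). A minimal sketch over plain name-to-float weight dicts follows; the decay value is an illustrative assumption:

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Mean Teacher step: each teacher weight becomes an exponential
    moving average of successive student weights. Weights are modeled
    as plain name -> float dicts for illustration; in a real framework
    this runs over the networks' parameter tensors after each step."""
    return {
        name: decay * teacher_weights[name] + (1.0 - decay) * student_weights[name]
        for name in teacher_weights
    }
```

A higher decay makes the Teacher smoother and slower to follow the Student, which is what gives its predictions the stability used as the consistency target.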
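The Mean Teacher scheme evaluated in this thesis pairs a student model with a teacher kept as an exponential moving average of consecutive student models, and adds a consistency cost so that input perturbations change predictions as little as possible. A minimal numpy sketch of that training loop on an invented toy 1-D classifier (all data, model and hyperparameters here are illustrative stand-ins for the U-net segmentation setting):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(params, x):
    """Toy 1-D 'segmentation' model: p(foreground) = sigmoid(w*x + b)."""
    w, b = params
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

# A small labelled pool and a larger unlabelled pool, as in
# semi-supervised segmentation.
x_lab = np.array([-2.0, -1.5, 1.5, 2.0])
y_lab = np.array([0.0, 0.0, 1.0, 1.0])
x_unl = rng.normal(size=200)

student = np.array([0.0, 0.0])
teacher = student.copy()
alpha, lr, sigma = 0.99, 0.5, 0.1   # EMA decay, step size, noise scale

for step in range(200):
    # Supervised term: cross-entropy gradient on the labelled pool.
    p = predict(student, x_lab)
    grad = np.array([np.mean((p - y_lab) * x_lab), np.mean(p - y_lab)])

    # Consistency term: the student on a perturbed input should match
    # the teacher on the clean input (squared-error consistency cost).
    x_pert = x_unl + rng.normal(scale=sigma, size=x_unl.shape)
    ps, pt = predict(student, x_pert), predict(teacher, x_unl)
    dps = ps * (1.0 - ps)                       # sigmoid derivative
    grad += np.array([np.mean(2.0 * (ps - pt) * dps * x_pert),
                      np.mean(2.0 * (ps - pt) * dps)])

    student = student - lr * grad
    # The teacher is an exponential moving average of consecutive
    # student models (Temporal Ensembling of the weights).
    teacher = alpha * teacher + (1.0 - alpha) * student
```

Note that only the supervised term needs labels; the consistency term draws on the unlabelled pool, which is what lets these methods exploit unannotated scans.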
APA, Harvard, Vancouver, ISO, and other styles
22

Torcinovich, Alessandro <1992>. "Using Contextual Information In Weakly Supervised Learning: Toward the integration of contextual and deep learning approaches, to address weakly supervised tasks." Doctoral thesis, Università Ca' Foscari Venezia, 2021. http://hdl.handle.net/10579/20596.

Full text
Abstract:
As the attentive reader will have inferred from the title, this thesis lays some empirical groundwork, together with a number of theoretical considerations, toward the definition of a methodology aimed at improving weakly supervised learning tasks. The methodology generates additional supervision by exploiting the contextual information that emerges when the observations in a dataset are compared under multiple labelling hypotheses. The research material presented revolves mainly around two algorithms. In the first part, the focus is on Graph Transduction Games (GTG), a label propagation algorithm grounded in notions from Game Theory. In particular, this document describes the interactions experimented between GTG and deep feature extractors to address semi-supervised learning, domain adaptation and deep metric learning problems. The second part centres on Relaxation Labeling (ReLab), a family of processes used for label disambiguation, strongly connected to GTG although motivated by a different theoretical background. This document presents some preliminary theoretical concepts and experiments designed to investigate future applications of ReLab in the context of semi-supervised semantic segmentation. The work presented here can be thought of as a starting point for building a theory of contextual weakly supervised learning.
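Graph Transduction Games propagate labels by letting each point play a game against its neighbours; a standard way to reach the game's equilibria is the replicator dynamics on a similarity graph. A small numpy sketch under that assumption (the toy data, kernel and iteration count are invented for illustration):

```python
import numpy as np

def graph_transduction_games(W, P0, labelled, n_iter=100):
    """Replicator-dynamics label propagation: each point's label
    distribution grows in proportion to the 'payoff' received from
    similar points, until a consensus (Nash equilibrium) is reached.

    W        : (n, n) non-negative similarity matrix
    P0       : (n, c) initial label probabilities, rows summing to 1
    labelled : indices whose rows in P0 are trusted one-hot labels
    """
    P = P0.copy()
    for _ in range(n_iter):
        P = P * (W @ P)                      # replicator update
        P /= P.sum(axis=1, keepdims=True)    # project back onto the simplex
        P[labelled] = P0[labelled]           # clamp supervised points
    return P

# Toy example: two 1-D clusters, one labelled point per cluster.
x = np.array([0.0, 0.2, 5.0, 5.2])
W = np.exp(-(x[:, None] - x[None, :]) ** 2)  # RBF similarities
P0 = np.full((4, 2), 0.5)
P0[0] = [1.0, 0.0]                           # class 0 seed
P0[2] = [0.0, 1.0]                           # class 1 seed
P = graph_transduction_games(W, P0, labelled=[0, 2])
```

In the thesis the similarities come from deep feature extractors rather than raw coordinates, but the propagation mechanism is the same.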
APA, Harvard, Vancouver, ISO, and other styles
23

Brunetti, Enrico. "Sperimentazione di Deep Metric Loss per Self-Supervised Information Retrieval Systems su CORD19." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24295/.

Full text
Abstract:
After the first cases of Covid-19 emerged in China in the autumn of 2019, at the beginning of 2020 the entire planet plunged into a global pandemic that upended our lives, with consequences not experienced since the Spanish flu. The enormous quantity of scientific papers continuously published on the coronavirus and related viruses led to the creation of a single dynamic dataset called CORD19, distributed free of charge. The need to retrieve useful information from this mass of data further turned the spotlight on information retrieval systems, capable of quickly and effectively recovering valuable information in response to a user request known as a query. Of particular note was the TREC-COVID Challenge, a competition for the development of an IR system trained and tested on the CORD19 dataset. The main problem is that this large collection of documents is entirely unlabelled, making it impossible to train neural network models directly on it. To circumvent the problem, we devised new self-supervised solutions, to which we applied the state of the art in deep metric learning and NLP. Deep metric learning, which is enjoying enormous success especially in computer vision, trains a model to "pull together" similar images and "push apart" different ones. Since both images and text are represented as vectors of real numbers (embeddings), the same techniques can be used to pull together relevant textual elements (e.g. a query and a paragraph) and push apart non-relevant ones. We therefore trained a SciBERT model with various losses that today represent the state of the art in deep metric learning, in a completely self-supervised manner, directly and exclusively on the CORD19 dataset, and then evaluated it on the formal TREC-COVID set through an IR system, obtaining interesting results.
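The pull-together/push-apart behaviour described here is exactly what a triplet loss encodes. A minimal numpy sketch, with toy 2-D vectors standing in for SciBERT embeddings of a query and two paragraphs (the abstract does not specify which metric losses were used, so this is only one representative example):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the distance gap: the relevant paragraph (positive)
    should end up at least `margin` closer to the query (anchor) than
    the irrelevant one (negative)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D stand-ins for text embeddings.
query    = np.array([1.0, 0.0])
relevant = np.array([0.9, 0.1])
offtopic = np.array([0.0, 1.0])
```

When the relevant paragraph is already closer by more than the margin the loss is zero; swapping the roles yields a positive loss whose gradient pulls the relevant embedding in and pushes the off-topic one away.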
APA, Harvard, Vancouver, ISO, and other styles
24

REPETTO, MARCO. "Black-box supervised learning and empirical assessment: new perspectives in credit risk modeling." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2023. https://hdl.handle.net/10281/402366.

Full text
Abstract:
Recent highly performant Machine Learning algorithms are compelling but opaque, so it is often hard to understand how they arrive at their predictions, giving rise to interpretability issues. Such issues are particularly relevant in supervised learning, where such black-box models are not easily understandable by the stakeholders involved. A growing body of work focuses on making Machine Learning, particularly Deep Learning models, more interpretable. The currently proposed approaches rely on post-hoc interpretation, using methods such as saliency mapping and partial dependencies. Despite the advances that have been made, interpretability is still an active area of research, and there is no silver-bullet solution. Moreover, in high-stakes decision-making, post-hoc interpretability may be sub-optimal. An example is the field of enterprise credit risk modeling. In such fields, classification models discriminate between good and bad borrowers. As a result, lenders can use these models to deny loan requests. Loan denial can be especially harmful when the borrower cannot appeal or have the decision explained and grounded in fundamentals. Therefore, in such cases, it is crucial to understand why these models produce a given output and steer the learning process toward predictions based on fundamentals. This dissertation focuses on the concept of Interpretable Machine Learning, with particular attention to the context of credit risk modeling. In particular, the dissertation revolves around three topics: model-agnostic interpretability, post-hoc interpretation in credit risk, and interpretability-driven learning. More specifically, the first chapter is a guided introduction to the model-agnostic techniques shaping today’s landscape of Machine Learning and their implementations. The second chapter focuses on an empirical analysis of the credit risk of Italian Small and Medium Enterprises. It proposes an analytical pipeline in which post-hoc interpretability plays a crucial role in finding the relevant underpinnings that drive a firm into bankruptcy. The third and last paper proposes a novel multicriteria knowledge injection methodology. The methodology is based on double backpropagation and can improve model performance, especially in the case of scarce data. The essential advantage of such a methodology is that it allows the decision maker to impose their prior knowledge at the beginning of the learning process, making predictions that align with the fundamentals.
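Double backpropagation injects prior knowledge by penalising the gradient of the model with respect to its inputs. For a plain logistic model the input gradient is p(1-p)·w, so a sign prior on input gradients collapses to a sign constraint on the weights, which gives this minimal sketch; the "leverage" feature, data and hyperparameters are all invented, and this is not the dissertation's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented toy credit data: feature 0 plays the role of "leverage", and
# the injected prior says higher leverage must not lower default risk.
X = rng.normal(size=(64, 2))
y = (X[:, 0] - 0.3 * X[:, 1] + 0.2 * rng.normal(size=64) > 0).astype(float)

sign_prior = np.array([1.0, 0.0])     # require dp/dx0 >= 0; no prior on x1
w, b = np.zeros(2), 0.0
lr, lam = 0.1, 5.0                    # step size and prior strength

for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad_w = X.T @ (p - y) / len(y)   # cross-entropy gradient
    # Input gradient of a logistic model is p(1-p)*w, so penalising
    # relu(-sign_prior * w) enforces the prior; in a deep network this
    # penalty is what double backpropagation differentiates a second time.
    grad_w += lam * np.where(sign_prior * w < 0.0, -sign_prior, 0.0)
    w -= lr * grad_w
    b -= lr * np.mean(p - y)
```

The penalty only fires when the model's input gradient violates the stated prior, so unconstrained features are fitted as usual.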
APA, Harvard, Vancouver, ISO, and other styles
25

Ramesh, Rohit. "Abnormality detection with deep learning." Thesis, Queensland University of Technology, 2018. https://eprints.qut.edu.au/118542/1/Rohit_Ramesh_Thesis.pdf.

Full text
Abstract:
This thesis is a step forward in developing the scientific basis for abnormality detection of individuals in crowded environments by utilizing a deep learning method. Such applications for monitoring human behavior in crowds are useful for public safety and security purposes.
APA, Harvard, Vancouver, ISO, and other styles
26

Kondamari, Pramod Sai, and Anudeep Itha. "A Deep Learning Application for Traffic Sign Recognition." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21890.

Full text
Abstract:
Background: Traffic Sign Recognition (TSR) is particularly useful for novice drivers and self-driving cars. Driver Assistance Systems (DAS) involve automatic traffic sign recognition. Efficient classification of the traffic signs is required in DAS and unmanned vehicles for safe navigation. Convolutional Neural Networks (CNN) are known for establishing promising results in the field of image classification, which inspired us to employ this technique in our thesis. Computer vision is a process that is used to understand images and retrieve data from them. OpenCV is a Python library used to detect traffic sign images in real time. Objectives: This study deals with an experiment to build a CNN model which can classify traffic signs in real time effectively using OpenCV. The model is built with low computational cost. The study also includes an experiment where various combinations of parameters are tuned to improve the model’s performance. Methods: The experimentation method involves building a CNN model based on a modified LeNet architecture with four convolutional layers, two max-pooling layers and two dense layers. The model is trained and tested with the German Traffic Sign Recognition Benchmark (GTSRB) dataset. Parameter tuning with different combinations of learning rate and epochs is done to improve the model’s performance. Later this model is used to classify the images introduced to the camera in real time. Results: The graphs depicting the accuracy and loss of the model before and after parameter tuning are presented. An experiment is done to classify the traffic sign image introduced to the camera by using the CNN model. High probability scores are achieved during the process, which are presented. Conclusions: The results show that the proposed model achieved 95% model accuracy with an optimum number of epochs, i.e., 30, and the default optimum value of learning rate, i.e., 0.001. High probabilities, i.e., above 75%, were achieved when the model was tested using new real-time data.
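The abstract specifies four convolutional layers, two max-pooling layers and two dense layers, but not the kernel sizes or input resolution. A short sketch of the feature-map arithmetic under one plausible reading (32×32 inputs, 5×5 then 3×3 kernels, 64 filters in the last convolution — all assumptions, not the thesis' published configuration):

```python
def conv(size, kernel):   # valid convolution, stride 1
    return size - kernel + 1

def pool(size):           # 2x2 max-pooling, stride 2
    return size // 2

# Assumed layout: conv5 - conv5 - pool - conv3 - conv3 - pool,
# one way to arrange "four convolutional and two max-pooling layers".
s = 32                    # GTSRB images resized to 32x32 (assumption)
s = conv(s, 5)            # 28
s = conv(s, 5)            # 24
s = pool(s)               # 12
s = conv(s, 3)            # 10
s = conv(s, 3)            # 8
s = pool(s)               # 4
flat = s * s * 64         # flattened input to the two dense layers
```

Tracking the sizes this way makes the "low computational cost" claim concrete: the dense layers only see a 4×4 spatial map rather than the full image.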
APA, Harvard, Vancouver, ISO, and other styles
27

Baier, Stephan [Verfasser], and Volker [Akademischer Betreuer] Tresp. "Learning representations for supervised information fusion using tensor decompositions and deep learning methods / Stephan Baier ; Betreuer: Volker Tresp." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2019. http://d-nb.info/1185979220/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Novello, Paul. "Combining supervised deep learning and scientific computing : some contributions and application to computational fluid dynamics." Thesis, Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAX005.

Full text
Abstract:
Recent innovations in mathematics, computer science, and engineering have enabled more and more sophisticated numerical simulations. However, some simulations remain computationally unaffordable, even for the most powerful supercomputers. Lately, machine learning has proven its ability to improve the state of the art in many fields, notably computer vision, language understanding, and robotics. This thesis sits within the high-stakes emerging field of Scientific Machine Learning, which studies the application of machine learning to scientific computing. More specifically, we consider the use of deep learning to accelerate numerical simulations. We focus on approximating some components of Partial Differential Equation (PDE) based simulation software by a neural network. This idea boils down to constructing a data set, selecting and training a neural network, and embedding it into the original code, resulting in a hybrid numerical simulation. Although this approach may seem trivial at first glance, the context of numerical simulations comes with several challenges. Since we aim at accelerating codes, the first challenge is to find a trade-off between neural networks’ accuracy and execution time. The second challenge stems from the data-driven process of the training, and more specifically, its lack of mathematical guarantees. Hence, we have to ensure that the hybrid simulation software still yields reliable predictions. To tackle these challenges, we thoroughly study each step of the deep learning methodology while considering the aforementioned constraints. By doing so, we emphasize interplays between numerical simulations and machine learning that can benefit each of these fields. We identify the main steps of the deep learning methodology as the construction of the training data set, the choice of the hyperparameters of the neural network, and its training. For the first step, we leverage the ability to sample training data with the original software to characterize a more efficient training distribution based on the local variation of the function to approximate. We generalize this approach to general machine learning problems by deriving a data weighting methodology called Variance Based Sample Weighting. For the second step, we introduce the use of sensitivity analysis, an approach widely used in scientific computing, to tackle neural network hyperparameter optimization. This approach is based on qualitatively assessing the effect of hyperparameters on the performances of a neural network using the Hilbert-Schmidt Independence Criterion. We adapt it to the hyperparameter optimization context and build an interpretable methodology that yields competitive and cost-effective networks. For the third step, we formally define an analogy between the stochastic resolution of PDEs and the optimization process at play when training a neural network. This analogy leads to a PDE-based framework for training neural networks that opens up many possibilities for improving existing optimization algorithms. Finally, we apply these contributions to a computational fluid dynamics simulation coupled with a multi-species chemical equilibrium code. We demonstrate that we can achieve an acceleration factor of 21 with controlled or no degradation from the initial prediction.
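The idea behind weighting training data by the local variation of the function to approximate can be sketched in one dimension with a finite-difference estimate of |df/dx|; this toy is only an illustration of the principle, not the thesis' actual Variance Based Sample Weighting estimator:

```python
import numpy as np

def local_variation_weights(x, f_x):
    """Minimal 1-D sketch: sample weights proportional to a
    finite-difference estimate of the local variation of the target."""
    order = np.argsort(x)
    slope = np.abs(np.gradient(f_x[order], x[order]))  # local |df/dx|
    w = np.empty_like(slope)
    w[order] = slope                # map back to the original ordering
    return w / w.mean()             # normalise to mean 1

x = np.linspace(-2.0, 2.0, 201)
y = np.tanh(10.0 * x)               # sharp transition near x = 0
w = local_variation_weights(x, y)   # largest weights where y varies fastest
```

Points in the steep region around the transition receive far larger weights than points on the flat plateaus, which is the behaviour the training distribution is meant to favour.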
APA, Harvard, Vancouver, ISO, and other styles
29

Banville, Hubert. "Enabling real-world EEG applications with deep learning." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG005.

Full text
Abstract:
Our understanding of the brain has improved considerably in the last decades, thanks to groundbreaking advances in the field of neuroimaging. Now, with the invention and wider availability of personal wearable neuroimaging devices, such as low-cost mobile EEG, we have entered an era in which neuroimaging is no longer constrained to traditional research labs or clinics. "Real-world'' EEG comes with its own set of challenges, though, ranging from a scarcity of labelled data to unpredictable signal quality and limited spatial resolution. In this thesis, we draw on the field of deep learning to help transform this century-old brain imaging modality from a purely clinical- and research-focused tool, to a practical technology that can benefit individuals in their day-to-day life. First, we study how unlabelled EEG data can be utilized to gain insights and improve performance on common clinical learning tasks using self-supervised learning. We present three such self-supervised approaches that rely on the temporal structure of the data itself, rather than onerously collected labels, to learn clinically-relevant representations. Through experiments on large-scale datasets of sleep and neurological screening recordings, we demonstrate the significance of the learned representations, and show how unlabelled data can help boost performance in a semi-supervised scenario. Next, we explore ways to ensure neural networks are robust to the strong sources of noise often found in out-of-the-lab EEG recordings. Specifically, we present Dynamic Spatial Filtering, an attention mechanism module that allows a network to dynamically focus its processing on the most informative EEG channels while de-emphasizing any corrupted ones. 
Experiments on large-scale datasets and real-world data demonstrate that, on sparse EEG, the proposed attention block handles strong corruption better than an automated noise handling approach, and that the predicted attention maps can be interpreted to inspect the functioning of the neural network. Finally, we investigate how weak labels can be used to develop a biomarker of neurophysiological health from real-world EEG. We translate the brain age framework, originally developed using lab and clinic-based magnetic resonance imaging, to real-world EEG data. Using recordings from more than a thousand individuals performing a focused attention exercise or sleeping overnight, we show not only that age can be predicted from wearable EEG, but also that age predictions encode information contained in well-known brain health biomarkers, but not in chronological age. Overall, this thesis brings us a step closer to harnessing EEG for neurophysiological monitoring outside of traditional research and clinical contexts, and opens the door to new and more flexible applications of this technology
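The self-supervised approaches described here build labels from the temporal structure of the EEG itself. One published pretext task of this family, relative positioning, labels a pair of windows positive when they are close in time and negative when they are far apart; a sketch of that label construction (window counts and thresholds are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_positioning_pairs(n_windows, tau_pos, tau_neg, n_pairs):
    """Build pretext labels from temporal structure alone: a pair of
    EEG windows is 'positive' (1) if close in time, 'negative' (0) if
    far apart; ambiguous gaps in between are skipped."""
    pairs, labels = [], []
    while len(pairs) < n_pairs:
        i, j = rng.integers(n_windows, size=2)
        gap = abs(int(i) - int(j))
        if gap <= tau_pos:
            pairs.append((i, j)); labels.append(1)
        elif gap >= tau_neg:
            pairs.append((i, j)); labels.append(0)
    return np.array(pairs), np.array(labels)

pairs, labels = relative_positioning_pairs(
    n_windows=1000, tau_pos=5, tau_neg=50, n_pairs=200)
```

A network trained to predict these free labels from the window contents must encode slowly varying physiological state, which is why the learned representations transfer to sleep staging and pathology screening.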
APA, Harvard, Vancouver, ISO, and other styles
30

GONZALEZ, JONAS PIERRE GUSTAVO. "Self-supervised solutions for developmental learning with the humanoid robot iCub." Doctoral thesis, Università degli studi di Genova, 2021. http://hdl.handle.net/11567/1047609.

Full text
Abstract:
For a long time, robots were assigned to repetitive tasks, such as work on industrial assembly lines, where social skills played a secondary role. However, as next-generation robots are designed to interact and collaborate with humans, it becomes ever more important to endow them with social competencies. Humans use several explicit and implicit cues, like gaze, facial expression, and gestures, to communicate. Social interaction during infancy plays a crucial role in understanding and acquiring these skills. Already at birth, babies show social skills, and they continue to learn and develop them throughout childhood. As robots will interact more frequently with us, with the goal of becoming social companions helping us in different tasks, they should also learn these abilities. Deep learning algorithms have reached state-of-the-art results in different tasks, from object recognition and detection to speech recognition. They have proven to be powerful tools for addressing complex problems, making them valid candidates for teaching robots social skills. However, most of these achievements were reached using a supervised learning approach, where large annotated datasets are available. This requires human supervision in both collecting and annotating the data, which in robotic applications can be problematic. Indeed, those networks, when applied to robots, can suffer from a drop in performance and need to be fine-tuned. Consequently, the annotation process has to be repeated to deal with the inherent dynamicity of robots, which is time-consuming and limits the autonomy of robots in their learning. Robots, being embodied, have access to a continuous stream of data thanks to their different sensors. Thus, instead of relying on human supervision to annotate their data, they should learn in a self-supervised way, similarly to babies in their early development. Advancements in neuroscience and psychology give some insights into how babies learn and develop social skills.
For example, attentional mechanisms and multi-modal experience play an important role in guiding learning in the earliest years of life. Moreover, the development of social skills follows specific stages. For example, very early babies learn to detect faces, as faces are an important vector of information for further developing joint-attention mechanisms or emotion recognition. The works presented in this thesis investigate how to integrate Deep Learning and human early developmental strategies to enable robots to learn autonomously from their sensory experience. More specifically, we designed computational architectures tested on the iCub humanoid robot in ecological and natural interaction experiments where participants' behaviours were not fully controlled. Indeed, in the past, several studies have investigated self-supervised learning approaches to develop autonomous robots, but usually the interaction was strongly constrained, often making it not representative of real-world scenarios. The novelty of this thesis also lies in the integration of the facilitation mechanisms used by babies, such as attention and cross-modal learning, with Deep Learning to propose self-supervised frameworks. To test our framework, we focused on perception abilities that are important for developing social skills, such as face detection, voice localisation and people identification. Our results demonstrated the effectiveness of the approach: using our proposed framework, iCub could collect different datasets without the need for human annotations. Those datasets were used to train Deep Learning networks for face and object detection, sound localisation and person recognition, to make the robot generalize from its experience. While the performances do not match state-of-the-art networks, these are promising results: they are a proof of concept that developmentally inspired mechanisms can guide robot learning in a proactive way.
The different architectures proposed in this thesis represent a novel contribution to the development of robots capable of autonomously and efficiently learn from their sensory experience in ecological interactions. To conclude, our approach is a step forward toward autonomous robots that can learn directly from their experience in a self-supervised way.
APA, Harvard, Vancouver, ISO, and other styles
31

Chen, Zhiang. "Deep-learning Approaches to Object Recognition from 3D Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=case1496303868914492.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Yu, Lu. "Semantic representation: from color to deep embeddings." Doctoral thesis, Universitat Autònoma de Barcelona, 2019. http://hdl.handle.net/10803/669458.

Full text
Abstract:
One of the fundamental problems of computer vision is to represent images with compact, semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings. In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations of arbitrary length with high discriminative power. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exist. In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks while maintaining similar performance. We propose to distill the metrics from a teacher network to a student network. Two new losses are introduced to model the communication from a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points are communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross-quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied.
We observe that some drift occurs during the training of new tasks for metric learning. A method is introduced to estimate the semantic drift based on the drift experienced by data of the current task during its training. Given this estimate, previous tasks can be compensated for the drift, thereby improving their performance. Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting than classification networks when learning new tasks.
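The two distillation losses summarized in this abstract can be sketched in a few lines of NumPy. The mean-squared-error form and the function names here are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def absolute_teacher_loss(student_emb, teacher_emb):
    """Absolute teacher: the student tries to reproduce the
    teacher's embeddings directly (sketched as an MSE)."""
    return np.mean((student_emb - teacher_emb) ** 2)

def relative_teacher_loss(student_emb, teacher_emb):
    """Relative teacher: only the pairwise distances between data
    points are communicated from teacher to student."""
    def pdist(x):
        diff = x[:, None, :] - x[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))
    return np.mean((pdist(student_emb) - pdist(teacher_emb)) ** 2)

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))          # 8 points, 16-dim embeddings
student = teacher + 0.1 * rng.normal(size=(8, 16))
print(absolute_teacher_loss(student, teacher),
      relative_teacher_loss(student, teacher))
```

With an absolute teacher the student matches embeddings coordinate by coordinate; with a relative teacher only the distance matrix is matched, which leaves the student free to choose its own embedding space.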
APA, Harvard, Vancouver, ISO, and other styles
33

Chen, Mickaël. "Learning with weak supervision using deep generative networks." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS024.

Full text
Abstract:
Many successes of deep learning rely on the availability of massive annotated datasets that can be exploited by supervised algorithms. Obtaining those labels at a large scale, however, can be difficult, or even impossible, in many situations. Designing methods that are less dependent on annotations is therefore a major research topic, and many semi-supervised and weakly supervised methods have been proposed. Meanwhile, the recent introduction of deep generative networks provided deep learning methods with the ability to manipulate complex distributions, allowing for breakthroughs in tasks such as image editing and domain adaptation. In this thesis, we explore how these new tools can be useful to further alleviate the need for annotations. Firstly, we tackle the task of performing stochastic predictions. It consists in designing systems for structured prediction that take into account the variability in possible outputs. We propose, in this context, two models. The first one performs predictions on multi-view data with missing views, and the second one predicts possible futures of a video sequence. Then, we study adversarial methods to learn a factorized latent space, in a setting with two explanatory factors where only one of them is annotated. We propose models that aim to uncover semantically consistent latent representations for those factors. One model is applied to the conditional generation of motion capture data, and another one to multi-view data. Finally, we focus on the task of image segmentation, which is of crucial importance in computer vision. Building on previously explored ideas, we propose a model for object segmentation that is entirely unsupervised.
APA, Harvard, Vancouver, ISO, and other styles
34

Hellström, Erik. "Feature learning with deep neural networks for keystroke biometrics : A study of supervised pre-training and autoencoders." Thesis, Luleå tekniska universitet, Datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-67206.

Full text
Abstract:
Computer security is becoming an increasingly important topic in today's society, with ever-increasing connectivity between devices and services. Stolen passwords have the potential to cause severe damage to companies and individuals alike, leading to the requirement that the security system must be able to detect and prevent fraudulent login. Keystroke biometrics is the study of typing behavior in order to identify the typist, using features extracted during typing. The features traditionally used in keystroke biometrics are linear combinations of the timestamps of the keystrokes. This work focuses on feature learning methods and is based on the Carnegie Mellon keystroke data set. The aim is to investigate whether other feature extraction methods can enable improved classification of users. Two methods are employed to extract latent features in the data: pre-training of an artificial neural network classifier, and an autoencoder. Several tests are devised to test the impact of pre-training and compare the results with those of a similar network without pre-training. The effect of feature extraction with an autoencoder is investigated by training a classifier on the autoencoder features in combination with the conventional features. Using pre-training, I find that the classification accuracy does not improve when using an adaptive learning rate optimizer. However, when a stochastic gradient descent optimizer is used, the accuracy improves by about 8%. Used in conjunction with the conventional features, the features extracted with an autoencoder improve the accuracy of the classifier by about 2%. However, a classifier based on the autoencoder features alone is not better than a classifier based on conventional features.
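As a rough illustration of the feature pipeline described in this abstract, the sketch below derives conventional keystroke timing features from press/release timestamps and concatenates them with autoencoder-learned latent features. The specific feature definitions are generic assumptions, not the exact ones used in the thesis:

```python
import numpy as np

def timing_features(press, release):
    """Conventional keystroke features: per-key hold times and
    down-down latencies, computed from raw timestamps."""
    press, release = np.asarray(press), np.asarray(release)
    hold = release - press          # how long each key was held
    dd = np.diff(press)             # latency between key-down events
    return np.concatenate([hold, dd])

def combine(conventional, latent):
    """The classifier in the study is trained on the concatenation
    of conventional features and autoencoder bottleneck features."""
    return np.concatenate([conventional, latent])

conv = timing_features([0.0, 0.25, 0.61], [0.09, 0.33, 0.70])
latent = np.zeros(4)  # stand-in for a 4-dim autoencoder bottleneck
print(combine(conv, latent).shape)  # (9,)
```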
APA, Harvard, Vancouver, ISO, and other styles
35

Rossi, Alex. "Self-supervised information retrieval: a novel approach based on Deep Metric Learning and Neural Language Models." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
Most existing open-source search engines utilize keyword- or tf-idf-based techniques to find documents and web pages relevant to an input query. Although these methods, with the help of a page rank or knowledge graphs, have proved effective in some cases, they often fail to retrieve relevant instances for more complicated queries that require semantic understanding. In this thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of the Gruppo Maggioli company. Semantic search, or search with meaning, refers to an understanding of the query instead of simply finding word matches and, in general, it represents knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy to handle training on unlabeled data, based on the creation of pairs of 'artificial' queries and the respective positive passages. We claim that by removing the reliance on labeled data, we may use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.
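The pair-creation strategy described in this abstract can be sketched as follows. The naive first-words query generator is a stand-in assumption for whatever query-generation model is actually used:

```python
def make_training_pairs(passages, make_query):
    """Self-supervised pair creation: each unlabeled passage is paired
    with an 'artificial' query derived from it, giving (query, positive
    passage) pairs with no human labeling. `make_query` stands in for a
    real query generator (e.g., a seq2seq model)."""
    return [(make_query(p), p) for p in passages]

# illustrative generator: take the first four words of the passage
naive_query = lambda p: " ".join(p.split()[:4])

pairs = make_training_pairs(
    ["semantic search returns documents by meaning rather than keywords"],
    naive_query,
)
print(pairs[0][0])  # "semantic search returns documents"
```

Such pairs can then feed a contrastive training objective for a transformer retriever, treating the other passages in a batch as negatives.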
APA, Harvard, Vancouver, ISO, and other styles
36

Granström, Daria, and Johan Abrahamsson. "Loan Default Prediction using Supervised Machine Learning Algorithms." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252312.

Full text
Abstract:
It is essential for a bank to estimate the credit risk it carries and the magnitude of its exposure in case of non-performing customers. This kind of risk has been estimated with statistical methods for decades and, given recent developments in the field of machine learning, there has been interest in investigating whether machine learning techniques can quantify the risk better. The aim of this thesis is to examine which method from a chosen set of machine learning techniques exhibits the best performance in default prediction with regard to chosen model evaluation parameters. The investigated techniques were Logistic Regression, Random Forest, Decision Tree, AdaBoost, XGBoost, Artificial Neural Network and Support Vector Machine. An oversampling technique called SMOTE was implemented in order to treat the imbalance between classes of the response variable. The results showed that XGBoost without SMOTE obtained the best result with respect to the chosen model evaluation metric.
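The SMOTE oversampling step mentioned in this abstract synthesizes minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. A minimal NumPy sketch of the idea (not a production implementation):

```python
import numpy as np

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Minimal SMOTE sketch: each synthetic point lies on the segment
    between a random minority point and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]           # exclude the point itself
        j = rng.choice(nbrs)
        lam = rng.random()                      # interpolation factor
        out.append(X[i] + lam * (X[j] - X[i]))  # point on the segment
    return np.array(out)

minority = [[0, 0], [1, 0], [0, 1]]  # toy defaulted-customer features
synth = smote_sample(minority)
print(synth.shape)  # (4, 2)
```

The synthetic points are then added to the training set before fitting the classifier, balancing the default and non-default classes.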
APA, Harvard, Vancouver, ISO, and other styles
37

Berlin, Daniel. "Multi-class Supervised Classification Techniques for High-dimensional Data: Applications to Vehicle Maintenance at Scania." Thesis, KTH, Matematisk statistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209257.

Full text
Abstract:
In vehicle repairs, locating the cause of an error can often be more time-consuming than the repair itself. Hence, a systematic way to accurately predict a fault-causing part would constitute a valuable tool, especially for errors that are difficult to diagnose. This thesis explores the predictive ability of Diagnostic Trouble Codes (DTCs), produced by the electronic system on Scania vehicles, as indicators of fault-causing parts. The statistical analysis is based on about 18800 observations of vehicles where both DTCs and replaced parts could be identified during the period March 2016 - March 2017. Two different approaches to forming classes are evaluated. Many classes had only a few observations and, to give the classifiers a fair chance, observations were omitted from classes based on their frequency in the data. After processing, the resulting data comprised 1547 observations of 4168 features, demonstrating very high dimensionality and making it impossible to apply standard methods of large-sample statistical inference. Two procedures of supervised statistical learning that are able to cope with high dimensionality and multiple classes, Support Vector Machines (SVM) and Neural Networks (NN), are exploited and evaluated. The analysis showed that on data with 1547 observations of 4168 features (unique DTCs) and 7 classes, SVM yielded an average prediction accuracy of 79.4% compared to 75.4% using NN. The conclusion of the analysis is that DTCs hold potential to be used as indicators of fault-causing parts in a predictive model, but in order to increase prediction accuracy the training data needs improvement. Scope for future research to improve and expand the model, along with practical suggestions for exploiting supervised classifiers at Scania, is provided. Keywords: Statistical learning, Machine learning, Neural networks, Deep learning, Supervised learning, High dimensionality
APA, Harvard, Vancouver, ISO, and other styles
38

Viebke, André. "Accelerated Deep Learning using Intel Xeon Phi." Thesis, Linnéuniversitetet, Institutionen för datavetenskap (DV), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-45491.

Full text
Abstract:
Deep learning, a sub-topic of machine learning inspired by biology, has attracted wide attention in industry and the research community recently. State-of-the-art applications in the areas of computer vision and speech recognition (among others) are built using deep learning algorithms. In contrast to traditional algorithms, where the developer fully instructs the application what to do, deep learning algorithms instead learn from experience when performing a task. However, learning requires training, which is a high computational challenge. High Performance Computing can help ease the burden through parallelization, thereby reducing the training time; this is essential to fully utilize the algorithms in practice. Numerous works targeting GPUs have investigated ways to speed up the training, while less attention has been paid to the Intel Xeon Phi coprocessor. In this thesis we present a parallelized implementation of a Convolutional Neural Network (CNN), a deep learning architecture, and our proposed parallelization scheme, CHAOS. Additionally, a theoretical analysis and a performance model discuss the algorithm in detail and allow for predictions if even more threads become available in the future. The algorithm is evaluated on an Intel Xeon Phi 7120p, a Xeon E5-2695v2 2.4 GHz and a Core i5 661 3.33 GHz using various architectures and thread counts on the MNIST dataset. Findings show a 103.5x, 99.9x and 100.4x speed-up for the large, medium and small architecture respectively for 244 threads compared to 1 thread on the coprocessor, and a 10.9x - 14.1x (large to small) speed-up compared to the sequential version running on the Xeon E5. We managed to decrease training time from 7 days on the Core i5 and 31 hours on the Xeon E5 to 3 hours on the Intel Xeon Phi when training our large network for 15 epochs.
APA, Harvard, Vancouver, ISO, and other styles
39

Chandra, Nagasai. "Node Classification on Relational Graphs using Deep-RGCNs." DigitalCommons@CalPoly, 2021. https://digitalcommons.calpoly.edu/theses/2265.

Full text
Abstract:
Knowledge Graphs are fascinating concepts in machine learning as they can hold usefully structured information in the form of entities and their relations. Despite the valuable applications of such graphs, most knowledge bases remain incomplete. This missing information harms downstream applications such as information retrieval and opens a window for research in statistical relational learning tasks such as node classification and link prediction. This work proposes a deep learning framework based on existing relational convolutional (R-GCN) layers to learn on highly multi-relational data characteristic of realistic knowledge graphs for node property classification tasks. We propose a deep and improved variant, Deep-RGCNs, with dense and residual skip connections between layers. These skip connections are known to be very successful with popular deep CNN-architectures such as ResNet and DenseNet. In our experiments, we investigate and compare the performance of Deep-RGCN with different baselines on multi-relational graph benchmark datasets, AIFB and MUTAG, and show how the deep architecture boosts the performance in the task of node property classification. We also study the training performance of Deep-RGCNs (with N layers) and discuss the gradient vanishing and over-smoothing problems common to deeper GCN architectures.
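A dense NumPy sketch of the stacked R-GCN layers with the residual skip connections this abstract describes. Per-relation dense adjacency matrices, the simple degree normalization, and keeping the feature dimension constant across layers are simplifying assumptions for illustration:

```python
import numpy as np

def rgcn_layer(H, adjs, Ws, W_self):
    """One R-GCN layer sketch: a self-loop transform plus a summed,
    degree-normalized message per relation type."""
    out = H @ W_self
    for A, W in zip(adjs, Ws):
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
        out += (A / deg) @ H @ W        # normalized per-relation message
    return np.maximum(out, 0.0)         # ReLU

def deep_rgcn(H, adjs, layers):
    """Deep-RGCN sketch: stacked R-GCN layers where each layer's
    output is added to its input (residual skip connection)."""
    for Ws, W_self in layers:
        H = rgcn_layer(H, adjs, Ws, W_self) + H  # residual skip
    return H

rng = np.random.default_rng(1)
n, d, r = 5, 4, 2                       # 5 nodes, 4 features, 2 relations
H = rng.normal(size=(n, d))
adjs = [rng.integers(0, 2, size=(n, n)).astype(float) for _ in range(r)]
layers = [([rng.normal(size=(d, d)) for _ in range(r)],
           rng.normal(size=(d, d))) for _ in range(3)]
print(deep_rgcn(H, adjs, layers).shape)  # (5, 4)
```

The residual connection is what lets the stack grow deep without the vanishing-gradient and over-smoothing problems the thesis discusses.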
APA, Harvard, Vancouver, ISO, and other styles
40

Baradel, Fabien. "Structured deep learning for video analysis." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI045.

Full text
Abstract:
With the massive increase of video content on the Internet and beyond, the automatic understanding of visual content could impact many different application fields such as robotics, health care, content search and filtering. The goal of this thesis is to provide methodological contributions in Computer Vision and Machine Learning for automatic content understanding from videos. We emphasize two problems, namely fine-grained human action recognition and visual reasoning from object-level interactions. In the first part of this manuscript, we tackle the problem of fine-grained human action recognition. We introduce two different attention mechanisms trained on the visual content from articulated human pose. The first method is able to automatically draw attention to important pre-selected points of the video, conditioned on learned features extracted from the articulated human pose. We show that such a mechanism improves performance on the final task and provides a good way to visualize the most discriminative parts of the visual content. The second method goes beyond pose-based human action recognition. We develop a method able to automatically identify unstructured feature clouds of interest in the video using contextual information. Furthermore, we introduce a learned distributed system for aggregating the features in a recurrent manner and taking decisions in a distributed way. We demonstrate that we can achieve better performance than obtained previously, without using articulated pose information at test time. In the second part of this thesis, we investigate video representations from an object-level perspective. Given a set of detected persons and objects in the scene, we develop a method which learns to infer the important object interactions through space and time using video-level annotation only. This allows us to identify important objects and object interactions for a given action, as well as potential dataset bias.
Finally, in a third part, we go beyond the task of classification and supervised learning from visual content by tackling causality in interactions, in particular the problem of counterfactual learning. We introduce a new benchmark, namely CoPhy, where, after watching a video, the task is to predict the outcome after modifying the initial stage of the video. We develop a method based on object-level interactions able to infer object properties without supervision, as well as future object locations after the intervention.
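The first attention mechanism described above, drawing attention over video feature points conditioned on a pose-derived query, can be sketched generically as softmax-weighted pooling. This is a generic formulation for illustration, not the thesis's exact mechanism:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_pool(features, query):
    """Score each video feature point against a query vector derived
    from the articulated pose, then aggregate the points weighted by
    their attention scores."""
    scores = softmax(features @ query)   # one weight per feature point
    return scores @ features, scores

rng = np.random.default_rng(2)
features = rng.normal(size=(5, 4))       # 5 feature points, 4-dim each
query = rng.normal(size=4)               # pose-conditioned query
pooled, scores = attention_pool(features, query)
print(pooled.shape)  # (4,)
```

Visualizing `scores` over the video frames is what gives the interpretable "most discriminative parts" maps mentioned in the abstract.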
APA, Harvard, Vancouver, ISO, and other styles
41

Zhao, Yan. "Deep learning methods for reverberant and noisy speech enhancement." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1593462119759348.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Dhyani, Dushyanta Dhyani. "Boosting Supervised Neural Relation Extraction with Distant Supervision." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524095334803486.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Ma, Yufeng. "Going Deeper with Images and Natural Language." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/99993.

Full text
Abstract:
One aim in the area of artificial intelligence (AI) is to develop a smart agent with high intelligence that is able to perceive and understand the complex visual environment around us. More ambitiously, it should be able to interact with us about its surroundings in natural languages. Thanks to the progress made in deep learning, we've seen huge breakthroughs towards this goal over the last few years. The developments have been extremely rapid in visual recognition, in which machines now can categorize images into multiple classes, and detect various objects within an image, with an ability that is competitive with or even surpasses that of humans. Meanwhile, we also have witnessed similar strides in natural language processing (NLP). It is quite often for us to see that now computers are able to almost perfectly do text classification, machine translation, etc. However, despite much inspiring progress, most of the achievements made are still within one domain, not handling inter-domain situations. The interaction between the visual and textual areas is still quite limited, although there has been progress in image captioning, visual question answering, etc. In this dissertation, we design models and algorithms that enable us to build in-depth connections between images and natural languages, which help us to better understand their inner structures. In particular, first we study how to make machines generate image descriptions that are indistinguishable from ones expressed by humans, which as a result also achieved better quantitative evaluation performance. Second, we devise a novel algorithm for measuring review congruence, which takes an image and review text as input and quantifies the relevance of each sentence to the image. The whole model is trained without any supervised ground truth labels. Finally, we propose a brand new AI task called Image Aspect Mining, to detect visual aspects in images and identify aspect level rating within the review context. 
On the theoretical side, this research contributes to multiple research areas in Computer Vision (CV), Natural Language Processing (NLP), the interactions between CV and NLP, and Deep Learning. Regarding impact, these techniques will benefit related users such as the visually impaired, customers reading reviews, merchants, and AI researchers in general.
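As a hedged illustration of what the sentence-to-image relevance scoring described in this abstract might look like in its simplest form (the dissertation's actual unsupervised model is not specified here, and the embedding vectors are hypothetical), each review sentence can be scored against the image by cosine similarity:

```python
import numpy as np

def relevance_scores(image_vec, sentence_vecs):
    """Score each review sentence against an image by cosine similarity of
    (hypothetical) embedding vectors: one simple way to quantify
    sentence-image relevance without ground-truth labels."""
    image_vec = np.asarray(image_vec, dtype=float)
    S = np.asarray(sentence_vecs, dtype=float)
    return S @ image_vec / (np.linalg.norm(S, axis=1) * np.linalg.norm(image_vec))
```

In practice the embeddings would come from trained image and text encoders; this sketch only shows the scoring step.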
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
44

Chen, Jitong. "On Generalization of Supervised Speech Separation." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492038295603502.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Tovedal, Sofiea. "On The Effectiveness of Multi-Task Learning: An evaluation of Multi-Task Learning techniques in deep learning models." Thesis, Umeå universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-172257.

Full text
Abstract:
Multi-Task Learning is today an interesting and promising field which many mention as a must for achieving the next level of advancement within machine learning. However, in reality, Multi-Task Learning is used much more rarely in real-world implementations than its more popular cousin, Transfer Learning. The question is why that is, and whether Multi-Task Learning outperforms its Single-Task counterparts. In this thesis, different Multi-Task Learning architectures were utilized in order to build a model that can label real technical issues within two categories. The model faces a challenging imbalanced data set with many labels to choose from and short texts to base its predictions on. Can task-sharing be the answer to these problems? This thesis investigated three Multi-Task Learning architectures and compared their performance to a Single-Task model. An authentic data set and two labeling tasks were used in training the models with the method of supervised learning. The four model architectures (Single-Task, Multi-Task, Cross-Stitched, and Shared-Private) first went through a hyperparameter tuning process using one of two layer options, LSTM and GRU. They were then boosted by auxiliary tasks and finally evaluated against each other.
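The shared-layer idea behind the architectures compared in this thesis can be illustrated with a minimal hard-parameter-sharing sketch in NumPy. All layer sizes, label counts, and names are hypothetical, and the thesis's actual models use LSTM/GRU layers rather than this toy dense network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 16-dim text features, 8-dim shared layer, two label sets.
W_shared = rng.normal(size=(16, 8))    # parameters shared by both tasks
W_task_a = rng.normal(size=(8, 5))     # head for labeling task A (5 labels)
W_task_b = rng.normal(size=(8, 3))     # head for labeling task B (3 labels)

def forward(x):
    """Hard parameter sharing: one shared representation, two task heads."""
    h = np.tanh(x @ W_shared)          # shared representation
    return h @ W_task_a, h @ W_task_b  # per-task logits

logits_a, logits_b = forward(rng.normal(size=(4, 16)))
```

Cross-stitch and shared-private variants differ mainly in how much of the representation the tasks share; this sketch shows only the fully shared case.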
APA, Harvard, Vancouver, ISO, and other styles
46

Arvidsson, Simon, and Marcus Gullstrand. "Predicting forest strata from point clouds using geometric deep learning." Thesis, Jönköping University, JTH, Avdelningen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-54155.

Full text
Abstract:
Introduction: Number of strata (NoS) is an informative descriptor of forest structure and is therefore useful in forest management. Collection of NoS, as well as other forest properties, is performed by fieldworkers and could benefit from automation. Objectives: This study investigates automated prediction of NoS from airborne laser-scanned point clouds over Swedish forest plots. Methods: A previously suggested approach using vertical gap probability is compared through experimentation against the geometric neural network PointNet++ configured for ordinal prediction. For both approaches, the mean accuracy is measured on three datasets: coniferous forest, deciduous forest, and a combination of all forests. Results: PointNet++ displayed better performance for two out of three datasets, attaining a top mean accuracy of 46.2%. However, only the coniferous subset displayed a statistically significant superiority for PointNet++. Conclusion: This study demonstrates the potential of geometric neural networks for data mining of forest properties. The results show that impediments in the data may need to be addressed for further improvements.
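The gap-based baseline is not spelled out in the abstract; one hedged sketch of how strata might be counted from point heights (bin size and gap threshold are invented for illustration, not taken from the thesis) is to count runs of occupied height bins separated by sufficiently large vertical gaps:

```python
import numpy as np

def count_strata(point_heights, bin_size=1.0, gap_threshold=2):
    """Hypothetical baseline: number of strata = number of runs of occupied
    height bins separated by at least `gap_threshold` empty bins."""
    bins = np.zeros(int(np.ceil(max(point_heights) / bin_size)) + 1, dtype=bool)
    for h in point_heights:
        bins[int(h // bin_size)] = True
    strata, gap = 0, gap_threshold  # start "in a gap" so the first run counts
    for occupied in bins:
        if occupied:
            if gap >= gap_threshold:
                strata += 1
            gap = 0
        else:
            gap += 1
    return strata
```

For example, points clustered near the ground and again around 10 m, with nothing in between, would count as two strata under these assumed parameters.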
APA, Harvard, Vancouver, ISO, and other styles
47

Hrabovszki, Dávid. "Classification of brain tumors in weakly annotated histopathology images with deep learning." Thesis, Linköpings universitet, Statistik och maskininlärning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177271.

Full text
Abstract:
Brain and nervous system tumors were responsible for around 250,000 deaths in 2020 worldwide. Correctly identifying different tumors is very important, because treatment options largely depend on the diagnosis. This is an expert task, but recently machine learning, and especially deep learning models, have shown huge potential in tumor classification problems and can provide fast and reliable support for pathologists in the decision-making process. This thesis investigates classification of two brain tumors, glioblastoma multiforme and lower grade glioma, in high-resolution H&E-stained histology images using deep learning. The dataset is publicly available from TCGA, and 220 whole slide images were used in this study. Ground truth labels were only available at the whole-slide level, but due to the large size of the slides, they could not be processed directly by convolutional neural networks. Therefore, patches were extracted from the whole slide images in two sizes and fed into separate networks for training. Preprocessing steps ensured that irrelevant information about the background was excluded and that the images were stain normalized. The patch-level predictions were then combined to slide level, and the classification performance was measured on a test set. Experiments were conducted on the usefulness of pre-trained CNN models and data augmentation techniques, and the best method was selected after statistical comparisons. Following the patch-level training, five slide aggregation approaches were studied and compared to build a whole-slide classifier model. Best performance was achieved when using small patches (336 x 336 pixels), a pre-trained CNN model without frozen layers, and mirroring data augmentation. The majority voting slide aggregation method resulted in the best whole-slide classifier, with 91.7% test accuracy and 100% sensitivity. In many comparisons, however, statistical significance could not be shown because of the relatively small size of the test set.
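The majority-voting aggregation step described above, combining patch-level predictions into a single slide-level label, can be sketched directly; the class names and the tie-breaking behaviour (first-seen wins) are illustrative assumptions:

```python
from collections import Counter

def slide_label(patch_predictions):
    """Majority voting: the slide gets the class predicted for the most
    patches. Ties break by first-seen order, an arbitrary choice here."""
    return Counter(patch_predictions).most_common(1)[0][0]
```

So a slide whose patches are mostly classified as one tumor type inherits that label regardless of a minority of dissenting patches.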
APA, Harvard, Vancouver, ISO, and other styles
48

Evangelisti, Davide. "RL-UniBOt: Applicazione di tecniche di Reinforcement Learning al gioco Rocket League." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Find full text
Abstract:
The application of artificial intelligence in video games to create intelligent bots has, in most cases reported in the literature, been limited to games of restricted complexity. This thesis proposes the study and use of several Deep Learning techniques for training artificial intelligences capable of playing Rocket League. Starting from a detailed analysis of the game and exploiting state-of-the-art Deep Learning algorithms, multiple solutions are presented, designed specifically for training bots able to understand the environment and play one-versus-one matches. Comparisons and considerations of the techniques used are also provided, in order to highlight the strengths and weaknesses of the proposed solutions.
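The thesis applies deep reinforcement learning; purely as a hedged illustration of the underlying value-update idea, here is one tabular Q-learning step. The state and action names are hypothetical, and the actual work replaces the table with a neural network:

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) towards the bootstrapped
    target r + gamma * max_a' Q(s', a'). Deep RL swaps the dict for a net."""
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    q.setdefault(state, {}).setdefault(action, 0.0)
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q
```

Repeated over many simulated matches, updates like this shape the bot's estimate of which action is best in each game situation.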
APA, Harvard, Vancouver, ISO, and other styles
49

Noman, Md Kislu. "Deep learning-based seagrass detection and classification from underwater digital images." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2023. https://ro.ecu.edu.au/theses/2648.

Full text
Abstract:
Deep learning is the most popular branch of machine learning and has achieved great success in many real-life applications. Deep learning algorithms, in particular Convolutional Neural Networks (CNNs), have rapidly become a method of choice for analysing seagrass image data. Deep learning-based seagrass classification and detection are very challenging due to the limited labelled data, intraclass similarities between species, lighting conditions, and complex shapes and structures in the underwater environment, which make underwater images different from objects in large-scale datasets. The light propagating through water is attenuated and scattered selectively, causing severe effects on the quality of underwater images. Besides low contrast, colour distortion and bright specks affect the quality of underwater images. In this thesis, we focus on the problem of single- to multi-species seagrass classification and detection from underwater digital images. We investigated the existing seagrass classification and detection models and systematically attempted to improve their performance by developing different models on several seagrass datasets. CNNs are a class of artificial neural networks commonly used in deep learning architectures for image recognition, object localization, or mapping tasks. CNN-based models are gaining popularity in seagrass identification and mapping due to their automatic feature extraction ability and higher performance compared with traditional machine learning techniques. Making a deep learning-based model accessible to all domain users (not only computer vision experts or engineers) is also a challenging task, because CNN development requires architectural engineering and hyperparameter tuning. This thesis investigates the effective development of CNNs on multi-species seagrass datasets to minimise the requirement of architectural engineering and manual hyperparameter tuning for CNN models. 
This thesis develops a novel metaheuristic algorithm called the Opposition-based Flow Direction Algorithm (OFDA) by leveraging the power of the opposition-based learning technique within the Flow Direction Algorithm to tune and automate the development of CNNs. The proposed deep neuroevolutionary algorithm (OFDA-CNN) outperformed eight other popular optimisation-based neuroevolutionary algorithms on a newly developed multi-species seagrass dataset. The OFDA-CNN algorithm also outperformed the state-of-the-art multi-species seagrass classification performances on publicly available seagrass datasets. This thesis also proposes another novel metaheuristic algorithm called Boosted Atomic Orbital Search (BAOS) to optimise the architecture and tune the hyperparameters of a CNN. The proposed BAOS algorithm improved the search capability of the original Atomic Orbital Search algorithm by incorporating the Lévy flight technique. The optimised deep neuroevolutionary (BAOS-CNN) algorithm achieved the highest accuracy among seven popular optimisation-based CNNs. The BAOS-CNN algorithm also outperformed the state-of-the-art multi-species seagrass classification performances. This thesis also proposes a two-stage semi-supervised framework for leveraging huge amounts of unlabelled seagrass data. We propose an EfficientNet-B5-based semi-supervised framework that leverages a large collection of unlabelled seagrass data with the guidance of a small, labelled seagrass dataset. We introduced a multi-species seagrass classifier based on EfficientNet-B5 that outperformed the state-of-the-art multi-species seagrass classification performances. This thesis also developed a multi-species dataset two and a half times larger than the largest publicly available 'DeepSeagrass' dataset. To evaluate the performance of all the proposed models, we trained and tested them on the newly developed and some publicly available challenging seagrass datasets. 
Our rigorous experiments demonstrated that our models are capable of producing state-of-the-art seagrass classification and detection performance in both single- and multi-species scenarios.
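The opposition-based learning technique leveraged by OFDA can be illustrated in isolation. This sketch shows only the generic keep-the-better-of-candidate-and-opposite step for a minimisation problem, not the thesis's actual OFDA or BAOS algorithms, and the bounds and fitness function are placeholders:

```python
import numpy as np

def opposition_step(population, fitness, lower, upper):
    """Opposition-based learning: for each candidate x in [lower, upper],
    also evaluate its opposite lower + upper - x, and keep whichever of the
    pair has the better (lower) fitness."""
    population = np.asarray(population, dtype=float)
    opposites = lower + upper - population
    keep = fitness(population) <= fitness(opposites)  # minimisation
    return np.where(keep, population, opposites)
```

A candidate far from the optimum on one side of the search space can thus be replaced in a single step by its mirror image on the other side, which is what speeds up convergence of the host metaheuristic.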
APA, Harvard, Vancouver, ISO, and other styles
50

Jacobzon, Gustaf. "Multi-site Organ Detection in CT Images using Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279290.

Full text
Abstract:
When optimizing a controlled dose in radiotherapy, high-resolution spatial information about healthy organs in close proximity to the malignant cells is necessary in order to mitigate dispersion into these organs-at-risk. This information can be provided by deep volumetric segmentation networks, such as 3D U-Net. However, due to memory limitations in modern graphics processing units, it is not feasible to train a volumetric segmentation network on full image volumes, and subsampling the volume gives too coarse a segmentation. An alternative is to sample a region of interest from the image volume and train an organ-specific network. This approach requires knowledge of which region in the image volume should be sampled, which can be provided by a 3D object detection network. Typically, the detection network will also be region-specific, albeit for a larger region such as the thorax, and requires human assistance in choosing the appropriate network for a certain region of the body. Instead, we propose a multi-site object detection network based on YOLOv3, trained on 43 different organs, which may operate on arbitrarily chosen axial patches in the body. Our model identifies the organs present (whole or truncated) in the image volume and may automatically sample a region from the input and feed it to the appropriate volumetric segmentation network. We train our model on four small (as few as 20 images) site-specific datasets in a weakly-supervised manner in order to handle the partially unlabeled nature of site-specific datasets. Our model is able to generate organ-specific regions of interest that enclose 92% of the organs present in the test set.
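The reported enclosure metric, where a predicted region of interest counts as correct only if it fully contains the organ's ground-truth bounding box, can be sketched as a simple containment check; the (x_min, y_min, z_min, x_max, y_max, z_max) box format is an assumption for illustration:

```python
def encloses(roi, box):
    """True if the predicted region of interest fully contains the
    ground-truth box. Both are (x_min, y_min, z_min, x_max, y_max, z_max)
    tuples; the ROI must extend at least as far as the box on every axis."""
    return all(roi[i] <= box[i] for i in range(3)) and \
           all(roi[i] >= box[i] for i in range(3, 6))
```

Averaging this check over all organs in a test set yields an enclosure rate like the 92% the thesis reports.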
APA, Harvard, Vancouver, ISO, and other styles