Dissertations / Theses on the topic 'Multitask learning'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 28 dissertations / theses for your research on the topic 'Multitask learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Patel, Vatsa Sanjay. "Masked Face Analysis via Multitask Deep Learning." University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1619637677725646.

Full text
2

Romera-Paredes, B. "Multitask and transfer learning for multi-aspect data." Thesis, University College London (University of London), 2014. http://discovery.ucl.ac.uk/1457869/.

Full text
Abstract:
Supervised learning aims to learn functional relationships between inputs and outputs. Multitask learning tackles supervised learning tasks by performing them simultaneously to exploit commonalities between them. In this thesis, we focus on the problem of eliminating negative transfer in order to achieve better performance in multitask learning. We start by considering a general scenario in which the relationship between tasks is unknown. We then narrow our analysis to the case where data are characterised by a combination of underlying aspects, e.g., a dataset of images of faces, where each face is determined by a person's facial structure, the emotion being expressed, and the lighting conditions. In machine learning there have been numerous efforts based on multilinear models to decouple these aspects but these have primarily used techniques from the field of unsupervised learning. In this thesis we take inspiration from these approaches and hypothesize that supervised learning methods can also benefit from exploiting these aspects. The contributions of this thesis are as follows:
1. A multitask learning and transfer learning method that avoids negative transfer when there is no prescribed information about the relationships between tasks.
2. A multitask learning approach that takes advantage of a lack of overlapping features between known groups of tasks associated with different aspects.
3. A framework which extends multitask learning using multilinear algebra, with the aim of learning tasks associated with a combination of elements from different aspects.
4. A novel convex relaxation approach that can be applied both to the suggested framework and more generally to any tensor recovery problem.
Through theoretical validation and experiments on both synthetic and real-world datasets, we show that the proposed approaches allow fast and reliable inferences. Furthermore, when performing learning tasks on an aspect of interest, accounting for secondary aspects leads to significantly more accurate results than using traditional approaches.
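To make the shared-structure idea concrete, here is a minimal numpy sketch, assuming a squared loss and a nuclear-norm penalty solved by proximal gradient descent; it illustrates the flavour of low-rank multitask regression rather than the thesis's actual estimator, and every name and constant in it is invented.

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def multitask_lowrank(Xs, ys, lam=0.1, lr=0.01, iters=500):
    """Jointly fit one linear model per task, encouraging a low-rank weight matrix."""
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))                                   # one weight column per task
    for _ in range(iters):
        G = np.zeros_like(W)
        for t, (X, y) in enumerate(zip(Xs, ys)):
            G[:, t] = X.T @ (X @ W[:, t] - y) / len(y)     # squared-loss gradient
        W = svt(W - lr * G, lr * lam)                      # proximal step on the penalty
    return W

rng = np.random.default_rng(0)
W_true = rng.normal(size=(5, 1)) @ rng.normal(size=(1, 3))  # three rank-1-related tasks
Xs = [rng.normal(size=(40, 5)) for _ in range(3)]
ys = [X @ W_true[:, t] + 0.1 * rng.normal(size=40) for t, X in enumerate(Xs)]
print(np.round(np.linalg.svd(multitask_lowrank(Xs, ys), compute_uv=False), 3))
```

The trailing singular values shrink toward zero, which is the sense in which the penalty couples the tasks; the convex relaxation in contribution 4 plays the analogous role for tensors.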
3

Settipalli, Venkata Sai Sukesh, and Naga Manendra Kumar Dasireddy. "Reducing Unintended bias in Text Classification using Multitask learning." Thesis, Blekinge Tekniska Högskola, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21174.

Full text
4

Yu, Qingtian. "Deep Learning-Enabled Multitask System for Exercise Recognition and Counting." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42686.

Full text
Abstract:
Exercise is a prevailing topic in modern society as more people pursue a healthy lifestyle. Physical activities provide unimaginable benefits to human well-being from the inside out. The fields of 2D human pose estimation, action recognition and repetitive counting have developed rapidly in the past several years. However, few works have combined them into a whole system to assist people in evaluating body poses, recognizing exercises and counting repetitive actions. Existing methods estimate pose positions first and utilize human joint locations in the other two tasks. In this thesis, we propose a multitask system covering all three domains. Unlike the methodology used in the literature, heatmaps, which are byproducts of 2D human pose estimation models, are adopted for exercise recognition and counting. Recent heatmap processing methods have proven effective in extracting dynamic body pose information. Inspired by this, we propose a new deep-learning multitask model for exercise recognition and repetition counting, and apply these approaches to the multitask setting for the first time. To meet the needs of the multitask model, we create a new dataset, Rep-Penn, with action, counting and speed labels. A two-stage training strategy is applied in the training process. Our multitask system can estimate human pose, identify physical activities and count repeated motions. We achieved 95.69% accuracy in exercise recognition on the Rep-Penn dataset. The multitask model also performed well in repetitive counting, with a 0.004 Mean Absolute Error (MAE) and 0.997 Off-By-One (OBO) accuracy on Rep-Penn. Compared with existing frameworks, our method obtained state-of-the-art results.
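As a rough sketch of the architecture pattern described here, one shared encoder over joint heatmaps feeding a recognition head and a counting head, the following PyTorch snippet may help; the layer sizes, the 17-joint assumption and both head designs are illustrative guesses, not the thesis's model.

```python
import torch
import torch.nn as nn

class HeatmapMultitask(nn.Module):
    """Toy shared-trunk model: heatmaps in, (exercise logits, repetition count) out."""
    def __init__(self, joints=17, n_exercises=6):
        super().__init__()
        self.encoder = nn.Sequential(                  # shared trunk over heatmaps
            nn.Conv2d(joints, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128), nn.ReLU(),
        )
        self.recognize = nn.Linear(128, n_exercises)   # task 1: which exercise
        self.count = nn.Linear(128, 1)                 # task 2: how many repetitions

    def forward(self, heatmaps):
        z = self.encoder(heatmaps)
        return self.recognize(z), self.count(z)

model = HeatmapMultitask()
heatmaps = torch.randn(4, 17, 64, 64)                  # a batch of joint heatmaps
logits, reps = model(heatmaps)
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1, 2, 3])) \
     + nn.functional.mse_loss(reps.squeeze(1), torch.tensor([5.0, 8.0, 3.0, 10.0]))
loss.backward()                                        # one backward pass trains both heads
```

The point of reusing heatmaps is that the pose model's intermediate output already encodes body configuration, so the downstream heads need no separate feature extraction.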
5

Nina, Oliver A. "A Multitask Learning Encoder-N-Decoder Framework for Movie and Video Description." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1531996548147165.

Full text
6

Lin, Yu-Kai, Hsinchun Chen, Randall A. Brown, Shu-Hsing Li, and Hung-Jen Yang. "Healthcare Predictive Analytics for Risk Profiling in Chronic Care: A Bayesian Multitask Learning Approach." SOC INFORM MANAGE-MIS RES CENT, 2017. http://hdl.handle.net/10150/625248.

Full text
Abstract:
Clinical intelligence about a patient's risk of future adverse health events can support clinical decision making in personalized and preventive care. Healthcare predictive analytics using electronic health records offers a promising direction to address the challenging tasks of risk profiling. Patients with chronic diseases often face risks of not just one, but an array of adverse health events. However, existing risk models typically focus on one specific event and do not predict multiple outcomes. To attain enhanced risk profiling, we adopt the design science paradigm and propose a principled approach called Bayesian multitask learning (BMTL). Considering the model development for an event as a single task, our BMTL approach is to coordinate a set of baseline models (one for each event) and communicate training information across the models. The BMTL approach allows healthcare providers to achieve multifaceted risk profiling and model an arbitrary number of events simultaneously. Our experimental evaluations demonstrate that the BMTL approach attains an improved predictive performance when compared with the alternatives that model multiple events separately. We also find that, in most cases, the BMTL approach significantly outperforms existing multitask learning techniques. More importantly, our analysis shows that the BMTL approach can create significant potential impacts on clinical practice in reducing the failures and delays in preventive interventions. We discuss several implications of this study for health IT, big data and predictive analytics, and design science research.
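As a much simplified, non-Bayesian sketch of this coordination idea (not the paper's BMTL), the snippet below alternates between fitting one logistic model per adverse event, each shrunk toward a shared mean weight vector, and updating that mean; all names and constants are illustrative.

```python
import numpy as np

def fit_shrunk_logistic(X, y, mu, lam=1.0, lr=0.1, iters=300):
    """Logistic regression with an L2 pull toward the shared mean mu."""
    w = mu.copy()
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (p - y) / len(y) + lam * (w - mu))
    return w

def multitask_events(Xs, ys, rounds=5):
    """One model per event; the shared mean carries information across events."""
    mu = np.zeros(Xs[0].shape[1])
    for _ in range(rounds):
        ws = [fit_shrunk_logistic(X, y, mu) for X, y in zip(Xs, ys)]
        mu = np.mean(ws, axis=0)           # update the shared prior mean
    return ws, mu

rng = np.random.default_rng(1)
Xs = [rng.normal(size=(50, 4)) for _ in range(3)]        # three adverse events
ys = [(X @ np.array([1.0, -1.0, 0.5, 0.0]) > 0).astype(float) for X in Xs]
ws, mu = multitask_events(Xs, ys)
```

Events with little data borrow strength from the shared mean, which is the intuition behind modelling an arbitrary number of events simultaneously.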
7

VALSECCHI, CECILE. "Advancing the prediction of Nuclear Receptor modulators through machine learning methods." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2022. http://hdl.handle.net/10281/356289.

Full text
Abstract:
Nuclear receptors are transcription factors involved in processes critical to human health and are a relevant target for toxicological risk assessment and the drug discovery process. Computational models can be a useful tool (i) to prioritize chemicals that can mimic natural hormones and thus be endocrine disruptors and (ii) to identify possible new leads for drug discovery. Therefore, the main goal of this project is to study potential interactions between chemicals and nuclear receptors, with the dual purpose of developing in silico tools to search for new modulators and of identifying possible endocrine-disrupting chemicals. After creating an exhaustive collection of nuclear receptor modulators, we applied machine learning methods to fill the data gap and prioritize modulators by building predictive models. In particular, the modeling strategies included multitask machine learning algorithms to investigate the complex relationships between chemicals and multiple nuclear receptors.
8

Zylich, Brian Matthew. "Training Noise-Robust Spoken Phrase Detectors with Scarce and Private Data: An Application to Classroom Observation Videos." Digital WPI, 2019. https://digitalcommons.wpi.edu/etd-theses/1289.

Full text
Abstract:
We explore how to automatically detect specific phrases in audio from noisy, multi-speaker videos using deep neural networks. Specifically, we focus on classroom observation videos that contain a few adult teachers and several small children (< 5 years old). At any point in these videos, multiple people may be talking, shouting, crying, or singing simultaneously. Our goal is to recognize polite speech phrases such as "Good job", "Thank you", "Please", and "You're welcome", as the occurrence of such speech is one of the behavioral markers used in classroom observation coding via the Classroom Assessment Scoring System (CLASS) protocol. Commercial speech recognition services such as Google Cloud Speech are impractical because of data privacy concerns. Therefore, we train and test our own custom models using a combination of publicly available classroom videos from YouTube, as well as a private dataset of real classroom observation videos collected by our colleagues at the University of Virginia. We also crowdsource an additional 1152 recordings of polite speech phrases to augment our training dataset. Our contributions are the following: (1) we design a crowdsourcing task for efficiently labeling speech events in classroom videos, (2) we develop a neural network-based architecture for speech recognition, robust to noise and overlapping speech, and (3) we explore methods to synthesize new and authentic audio data, both to increase the training set size and reduce the class imbalance. Finally, using our trained polite speech detector, (4) we investigate the relationship between polite speech and CLASS scores and enable teachers to visualize their use of polite language.
9

Bao, Guoqing. "End-to-End Machine Learning Models for Multimodal Medical Data Analysis." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/28153.

Full text
Abstract:
The pathogenesis of infectious and severe diseases including COVID-19, metabolic disorders, and cancer can be highly complicated because it involves abnormalities at the genetic, metabolic, anatomical as well as functional levels. The deteriorative changes can be quantitatively monitored through biochemical markers, genome-wide assays, and different imaging modalities including radiographic and pathological data. Multimodal medical data, involving three common and essential diagnostic disciplines, i.e., pathology, radiography, and genomics, are increasingly utilized to unravel the complexity of these diseases. High-throughput and deep features can be extracted from different types of medical data to characterize diseases in various quantitative aspects, e.g., compactness and flatness of tumors, and heterogeneity of tissues. State-of-the-art deep learning methods including convolutional neural networks (CNNs) and Transformers have achieved impressive results in analyses of natural image, text, and voice data in an intrinsic and latent manner. However, there are many obstacles and challenges when applying existing machine learning models that were initially tuned on natural image and language data to clinical practice, such as the shortage of labeled data, distribution and domain discrepancy, data heterogeneity and imbalance, etc. Moreover, those methods are not designed to harness multimodal data under a unified and end-to-end learning paradigm, making them rely heavily on expert involvement and more prone to intra- and inter-observer variability. To address those limitations, in this thesis, we present novel end-to-end machine learning methods to learn fused feature representations from multimodal medical data, and perform quantitative analyses to identify significant higher-level features from raw medical data that explain the characteristics and outcomes of infectious and severe diseases.
• Starting from gold-standard pathology images, we propose a bifocal weakly-supervised method which is able to complementarily and simultaneously capture two types of discriminative regions from both shorter and longer image tiles under a small amount of sparsely labeled data, to improve recognition and cross-modality analyses of complex morphological and immunohistochemical structures in entire and adjacent multimodal histological slides.
• Then, expanding our research to data collected non-invasively, we present an end-to-end multitask learning model for automated and simultaneous diagnosis and severity assessment of infectious disease which obviates the need for expert involvement; Shift3D and a random-weighted multitask loss function are two novel algorithmic components proposed to learn shift-invariant and shareable representations from fused radiographic imaging and high-throughput numerical data, accelerating model convergence, improving joint learning performance, and resisting the influence of intra- and inter-observer variability.
• Next, we further involve time-dimension data and develop a machine learning-based method to locate representative imaging features that tackles the side effects of non-invasive diagnostics, i.e., radiation; this low-radiation and non-invasive solution can be used for progression analysis of metabolic disorders over time and for evaluation of surgery-induced weight loss effects.
• Lastly, we investigate genomic data, given that genetic disorders can lead to diverse diseases, and build a machine learning pipeline for processing genomic data and analyzing disease prognosis by incorporating statistical power, biological rationale, and machine learning algorithms as a unified prognostic feature extractor.
We carried out rigorous and extensive experiments on two large public datasets and two private cohorts covering various forms of medical data, e.g., biochemical markers, genomic profiles, radiomic features, and radiological and pathological imaging data. The experiments demonstrated that our proposed machine learning approaches achieve better performance than corresponding state-of-the-art methods and subsequently improve the diagnostic and/or prognostic workflows of infectious and severe diseases including COVID-19, metabolic disorders, and cancer.
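The second bullet names a "random-weighted" multitask loss. One plausible reading, sketched below in PyTorch, draws per-batch task weights at random and normalises them so that no task dominates training; the thesis's exact formulation may differ, so treat this as an assumption-laden illustration.

```python
import torch

def random_weighted_loss(task_losses):
    """task_losses: list of scalar tensors, one per task; returns a convex
    combination whose weights are resampled on every call (i.e., every batch)."""
    w = torch.rand(len(task_losses))
    w = w / w.sum()
    return sum(wi * li for wi, li in zip(w, task_losses))

seg_loss = torch.tensor(0.7, requires_grad=True)   # stand-ins for real task losses
cls_loss = torch.tensor(1.3, requires_grad=True)
random_weighted_loss([seg_loss, cls_loss]).backward()
```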
10

Widmer, Christian [Verfasser], Klaus-Robert [Akademischer Betreuer] Müller, Gunnar [Akademischer Betreuer] Rätsch, and Klaus [Akademischer Betreuer] Obermayer. "Regularization-based multitask learning with applications in computational biology / Christian Widmer. Gutachter: Klaus-Robert Müller ; Gunnar Rätsch ; Klaus Obermayer. Betreuer: Klaus-Robert Müller ; Gunnar Rätsch." Berlin : Technische Universität Berlin, 2014. http://d-nb.info/1068856017/34.

Full text
11

Johnson, Travis Steele. "Integrative approaches to single cell RNA sequencing analysis." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1586960661272666.

Full text
12

Carbonera Luvizon, Diogo. "Apprentissage automatique pour la reconnaissance d'action humaine et l'estimation de pose à partir de l'information 3D." Thesis, Cergy-Pontoise, 2019. http://www.theses.fr/2019CERG1015.

Full text
Abstract:
3D human action recognition is a challenging task due to the complexity of human movements and to the variety of poses and actions performed by distinct subjects. Recent technologies based on depth sensors can provide 3D human skeletons at low computational cost, which is useful information for action recognition. However, such low-cost sensors are restricted to controlled environments and frequently output noisy data. Meanwhile, convolutional neural networks (CNN) have shown significant improvements on both action recognition and 3D human pose estimation from RGB images. Despite being closely related problems, the two tasks are frequently handled separately in the literature. In this work, we analyze the problem of 3D human action recognition in two scenarios: first, we explore spatial and temporal features from human skeletons, which are aggregated by a shallow metric learning approach. In the second scenario, we show not only that precise 3D poses are beneficial to action recognition, but also that both tasks can be efficiently performed by a single deep neural network while still achieving state-of-the-art results. Additionally, we demonstrate that end-to-end optimization using poses as an intermediate constraint leads to significantly higher accuracy on the action task than separate learning. Finally, we propose a new scalable architecture for real-time 3D pose estimation and action recognition simultaneously, which offers a range of performance vs speed trade-offs with a single multimodal and multitask training procedure.
13

Bellón, Molina Víctor. "Prédiction personalisée des effets secondaires indésirables de médicaments." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEM023/document.

Full text
Abstract:
Adverse drug reaction (ADR) is a serious concern with important health and economic repercussions. Between 1.9% and 2.3% of hospitalized patients suffer from ADR, and the annual cost of ADR has been estimated at 400 million euros in Germany alone. Furthermore, ADRs can cause the withdrawal of a drug from the market, which can cause up to millions of dollars of losses to the pharmaceutical industry. Multiple studies suggest that genetic factors may play a role in the response of patients to their treatment. This covers not only the response in terms of the intended main effect, but also in terms of potential side effects. The complexity of predicting drug response suggests that machine learning could bring new tools and techniques for understanding ADR. In this doctoral thesis, we study different problems related to drug response prediction, based on the genetic characteristics of patients. We frame them through multitask machine learning frameworks, which combine all data available for related problems in order to solve them at the same time. We propose a novel model for multitask linear prediction that uses task descriptors to select relevant features and make predictions with better performance than state-of-the-art algorithms. Finally, we study strategies for increasing the stability of the selected features, in order to improve interpretability for biological applications.
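A minimal sketch of the task-descriptor idea, under the assumption that each task's weight vector is a linear function of its descriptor, w_t = A d_t, with an L1 penalty on A performing the shared feature selection; this is illustrative numpy, not the thesis's estimator.

```python
import numpy as np

def fit_descriptor_mtl(Xs, ys, D, lam=0.01, lr=0.05, iters=800):
    """D[t] is task t's descriptor; learn A so that task weights are A @ D[t]."""
    A = np.zeros((Xs[0].shape[1], D.shape[1]))
    for _ in range(iters):
        G = np.zeros_like(A)
        for t, (X, y) in enumerate(zip(Xs, ys)):
            resid_grad = X.T @ (X @ (A @ D[t]) - y) / len(y)
            G += np.outer(resid_grad, D[t])          # chain rule through w_t = A d_t
        A -= lr * G
        A = np.sign(A) * np.maximum(np.abs(A) - lr * lam, 0.0)  # L1 proximal step
    return A

rng = np.random.default_rng(2)
D = rng.normal(size=(4, 2))                          # four tasks, two descriptors each
A_true = rng.normal(size=(6, 2))
Xs = [rng.normal(size=(30, 6)) for _ in range(4)]
ys = [X @ (A_true @ D[t]) for t, X in enumerate(Xs)]
A_hat = fit_descriptor_mtl(Xs, ys, D)
```

A row of A driven entirely to zero drops that feature for every task at once, which is one way descriptors can stabilise feature selection.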
14

Coavoux, Maximin. "Discontinuous constituency parsing of morphologically rich languages." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCC032.

Full text
Abstract:
Syntactic parsing consists in assigning syntactic trees to sentences in natural language. Syntactic parsing of non-configurational languages, or languages with a rich inflectional morphology, raises specific problems. These languages suffer more from lexical data sparsity and exhibit word order variation phenomena more frequently. For these languages, exploiting information about the internal structure of word forms is crucial for accurate parsing. This dissertation investigates transition-based methods for robust discontinuous constituency parsing. First of all, we propose a multitask learning neural architecture that performs joint parsing and morphological analysis. Then, we introduce a new transition system that is able to predict discontinuous constituency trees, i.e. syntactic structures that can be seen as derivations of mildly context-sensitive grammars, such as LCFRS. Finally, we investigate the question of lexicalization in syntactic parsing. Some syntactic parsers are based on the hypothesis that constituents are organized around a lexical head and that modelling bilexical dependencies is essential to solve ambiguities. We introduce an unlexicalized transition system for discontinuous constituency parsing and a scoring model based on constituent boundaries. The resulting parser is simpler than lexicalized parsers and achieves better results in both discontinuous and projective constituency parsing.
15

Caye, Daudt Rodrigo. "Convolutional neural networks for change analysis in earth observation images with noisy labels and domain shifts." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT033.

Full text
Abstract:
The analysis of satellite and aerial Earth observation images allows us to obtain precise information over large areas. A multitemporal analysis of such images is necessary to understand the evolution of such areas. In this thesis, convolutional neural networks are used to detect and understand changes using remote sensing images from various sources in supervised and weakly supervised settings. Siamese architectures are used to compare coregistered image pairs and to identify changed pixels. The proposed method is then extended into a multitask network architecture that is used to detect changes and perform land cover mapping simultaneously, which permits a semantic understanding of the detected changes. Then, classification filtering and a novel guided anisotropic diffusion algorithm are used to reduce the effect of biased label noise, which is a concern for automatically generated large-scale datasets. Weakly supervised learning is also achieved to perform pixel-level change detection using only image-level supervision through the usage of class activation maps and a novel spatial attention layer. Finally, a domain adaptation method based on adversarial training is proposed, which succeeds in projecting images from different domains into a common latent space where a given task can be performed. This method is tested not only for domain adaptation for change detection, but also for image classification and semantic segmentation, which proves its versatility.
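A toy PyTorch sketch of the siamese comparison described here: two coregistered images pass through one shared encoder, the difference of their features feeds a per-pixel change head, and an auxiliary head maps land cover. Channel counts and class numbers are invented.

```python
import torch
import torch.nn as nn

class SiameseChange(nn.Module):
    def __init__(self, n_cover_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(                 # shared weights for both dates
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.change_head = nn.Conv2d(32, 2, 1)        # change / no-change per pixel
        self.cover_head = nn.Conv2d(32, n_cover_classes, 1)

    def forward(self, img_t0, img_t1):
        f0, f1 = self.encoder(img_t0), self.encoder(img_t1)
        change = self.change_head(torch.abs(f0 - f1))  # compare the two branches
        cover = self.cover_head(f1)                    # map the more recent image
        return change, cover

model = SiameseChange()
t0, t1 = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
change_logits, cover_logits = model(t0, t1)
print(change_logits.shape, cover_logits.shape)         # (1,2,64,64) (1,5,64,64)
```

Predicting land cover alongside change is what gives the detected changes their semantic labels (e.g., forest to built-up).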
16

Cherifa-Luron, Ményssa. "Prédiction des épisodes d'hypotension à partir de données longitudinales à haute fréquence recueillies auprès de patients en soins intensifs." Electronic Thesis or Diss., Université Paris Cité, 2021. https://wo.app.u-paris.fr/cgi-bin/WebObjects/TheseWeb.woa/wa/show?t=8076&f=67992.

Full text
Abstract:
The digital revolution in healthcare, reflected both in the centralization of and access to extensive medical databases and in considerable advances in artificial intelligence (AI), has created new opportunities for data science applied to medicine. Putting the patient at the heart of the health care system, these new technologies promise more personalized medicine capable of identifying predictive factors and individual prognoses earlier. This thesis work is entirely in line with the concept of personalized medicine. More precisely, it is an example of the development and concrete application of medical AI to predict hypotension and, more broadly, states of shock, frequent pathologies affecting more than one-third of patients hospitalized in intensive care. Indeed, shock, defined as a failure of the circulatory system leading to an inadequacy between the supply of oxygen and peripheral tissue needs, is considered a diagnostic and therapeutic emergency. Anticipating hypotension, one of its main symptoms, can therefore be extremely useful to make better therapeutic decisions and, in some cases, to prevent the onset of organ failure from the outset by appropriately adjusting the therapy. In addition, the ability to predict future deterioration can be beneficial in assisting the proactive assignment of care teams within hospital departments. The first part of this thesis focused on using and applying a machine learning-based ensemble algorithm, the Super Learner (SL), to predict the occurrence of a hypotensive episode 10 minutes or more in advance in patients hospitalized in the ICU. This work demonstrated that physiological signals can be integrated into predictive models when dealing with massive data, without requiring complex pre-processing methods, and that the SL was far superior to each of the algorithms included in its library, as evidenced by lower errors and good sensitivity and specificity values during its internal and external evaluation. Then, to mimic the way clinicians are trained to jointly analyze the evolution of mean arterial pressure (MAP) and heart rate (HR), given their close physiological interdependence, we developed a deep learning model, the Physiological Deep Learner (PDL), to predict MAP and HR simultaneously. We highlighted that a multitask algorithm outperformed the prediction performance of independent single-task algorithms. Indeed, compared to a more traditional approach, our PDL achieved better performance, exhibiting a better calibration profile and fewer errors. In addition, the PDL was able to predict with high accuracy the occurrence or non-occurrence of a hypotensive episode up to 60 minutes in advance.
17

Tafforeau, Jérémie. "Modèle joint pour le traitement automatique de la langue : perspectives au travers des réseaux de neurones." Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0430/document.

Full text
Abstract:
NLP research has identified different levels of linguistic analysis. This has led to a hierarchical division of the various tasks performed in order to analyze a text statement. The traditional approach considers task-specific models which are subsequently arranged in cascade within processing chains (pipelines). This approach has a number of limitations: the empirical selection of model features, the accumulation of errors along the pipeline and the lack of robustness to domain changes. These limitations lead to particularly high performance losses in the case of non-canonical language with limited data available, such as transcriptions of conversations over the phone. Disfluencies and speech-specific syntactic schemes, as well as transcription errors from automatic speech recognition systems, lead to a significant drop in performance. It is therefore necessary to develop robust and flexible systems. We intend to perform syntactic and semantic analysis using a deep neural network multitask model, while taking into account variations of domain and/or language register within the data.
18

Donini, Michele. "Exploiting the structure of feature spaces in kernel learning." Doctoral thesis, Università degli studi di Padova, 2016. http://hdl.handle.net/11577/3424320.

Full text
Abstract:
The problem of learning the optimal representation for a specific task has recently become an important and non-trivial topic in the machine learning community. In this field, deep architectures are the current gold standard among machine learning algorithms, generating models with several levels of abstraction that discover very complicated structures in large datasets. Kernels and Deep Neural Networks (DNNs) are the principal methods to handle the representation problem in a deep manner. A DNN uses the famous back-propagation algorithm, improving the state-of-the-art performance in several different real-world applications, e.g. speech recognition, object detection and signal processing. Nevertheless, DNN algorithms have some drawbacks, inherited from standard neural networks, since they are theoretically not well understood. The main problems are: the complex structure of the solution, the unclear decoupling between the representation learning phase and the model generation, long training times, and convergence to a sub-optimal solution (because of local minima and vanishing gradients). For these reasons, in this thesis, we propose new ideas to obtain an optimal representation by exploiting kernel theory. Kernel methods have an elegant framework that decouples learning algorithms from data representations. On the other hand, kernels also have some weaknesses: for example, they do not scale, and they generally bring a shallow representation. In this thesis, we propose new theory and algorithms to fill this gap and make kernel learning able to generate deeper representations and be more scalable. Considering this scenario, we propose a different point of view regarding the Multiple Kernel Learning (MKL) framework, starting from the idea of a deeper kernel. An algorithm able to combine thousands of weak kernels with low computational and memory complexity is proposed. This procedure, called EasyMKL, outperforms state-of-the-art methods by combining fragmented information in order to create an optimal kernel for the given task. Pursuing the idea of creating an optimal family of weak kernels, we define a new measure for evaluating kernel expressiveness, called spectral complexity. Exploiting this measure, we are able to generate families of kernels with a hierarchical structure of the features by defining a new property concerning the monotonicity of the spectral complexity. We prove the quality of these weak families of kernels by developing a new methodology for Multiple Kernel Learning (MKL). First, we create an optimal family of weak kernels by using the monotonically spectral-complex property; then we combine the optimal family of kernels by exploiting EasyMKL, obtaining a new kernel that is specific for the task; finally, we generate the model by using a kernel machine. Moreover, we highlight the connection among distance metric learning, feature learning and kernel learning by proposing a method to learn the optimal family of weak kernels for an MKL algorithm in the different context in which the combination rule is the element-wise product of kernel matrices. This algorithm is able to generate the best parameters for an anisotropic RBF kernel; therefore, a connection naturally appears among feature weighting, combinations of kernels and metric learning. Finally, the importance of the representation is also taken into account in three real-world tasks, where we tackle different issues such as noisy data, real-time applications and big data.
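EasyMKL solves a specific max-margin optimisation, but the core idea of fusing many weak kernels into one task-specific kernel admits a simpler stand-in: the well-known centred kernel alignment heuristic, sketched below in numpy; this is not EasyMKL itself.

```python
import numpy as np

def center(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H                                  # remove the mean in feature space

def combine_kernels(Ks, y):
    """Weight each weak kernel by its centred alignment with the label kernel."""
    yy = np.outer(y, y)                               # ideal target kernel
    scores = []
    for K in Ks:
        Kc = center(K)
        a = np.sum(Kc * yy) / (np.linalg.norm(Kc) * np.linalg.norm(yy))
        scores.append(max(a, 0.0))                    # drop negatively aligned kernels
    w = np.array(scores)
    w = w / w.sum() if w.sum() > 0 else np.full(len(Ks), 1.0 / len(Ks))
    return sum(wi * Ki for wi, Ki in zip(w, Ks)), w

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 5))
y = np.sign(X[:, 0])                                  # only feature 0 is informative
Ks = [np.outer(X[:, j], X[:, j]) for j in range(5)]   # five weak rank-1 kernels
K, w = combine_kernels(Ks, y)
print(np.round(w, 2))                                 # most weight lands on kernel 0
```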
19

Kovac, Krunoslav. "Multitask learning for Bayesian neural networks." 2005. http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=370149&T=F.

Full text
20

Tsai, Yao-Chang (蔡耀樟). "Applications of Multitask Learning to Human Activity Recognition." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/5h9828.

Full text
Abstract:
Master's thesis, National Taipei University of Technology, Department of Electrical Engineering, academic year 107 (2018–2019).
In previous studies, most works treat human activity recognition as a single classification problem: they train a common classifier for all subjects without considering personal activity patterns. Since each person has his or her own activity pattern, a common classifier is not suitable for all subjects. Because of this, some researchers work on personalized human activity recognition, which trains a personalized model for each subject. However, personalized human activity recognition brings a new problem: data sparseness. Sometimes the amount of data is inadequate for training a good classifier. Although other researchers have proposed hybrid models which can train on both the current subject and other subjects, there are still few methods that enable a model to train efficiently with two subjects. To solve this problem, this thesis proposes a method applying multitask learning to personalized human activity recognition. The method uses a multitask neural network to train on two subjects efficiently, dividing them into a main subject and an auxiliary subject. The main purpose is to improve the accuracy for the main subject by training together with the auxiliary subject. With a suitable auxiliary subject, the main subject can gain up to 11.38% in accuracy.
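An illustrative PyTorch sketch of the main/auxiliary split, assuming a shared feature extractor over fixed-length sensor windows with a separate classification head per subject; feature sizes, the activity count and the auxiliary loss weight are all invented.

```python
import torch
import torch.nn as nn

class TwoSubjectHAR(nn.Module):
    """Shared trunk, one head per subject; the auxiliary head regularises the trunk."""
    def __init__(self, n_features=64, n_activities=6):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.heads = nn.ModuleDict({
            "main": nn.Linear(32, n_activities),
            "aux": nn.Linear(32, n_activities),
        })

    def forward(self, x, subject):
        return self.heads[subject](self.shared(x))

model = TwoSubjectHAR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_main, y_main = torch.randn(8, 64), torch.randint(0, 6, (8,))
x_aux, y_aux = torch.randn(8, 64), torch.randint(0, 6, (8,))
loss = nn.functional.cross_entropy(model(x_main, "main"), y_main) \
     + 0.5 * nn.functional.cross_entropy(model(x_aux, "aux"), y_aux)
opt.zero_grad()
loss.backward()
opt.step()
```

Only the shared trunk receives gradients from both subjects, which is how a well-chosen auxiliary subject can lift the main subject's accuracy.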
21

Koyejo, Oluwasanmi Oluseye. "Constrained relative entropy minimization with applications to multitask learning." 2013. http://hdl.handle.net/2152/20793.

Full text
Abstract:
This dissertation addresses probabilistic inference via relative entropy minimization subject to expectation constraints. A canonical representation of the solution is determined without the requirement for convexity of the constraint set, and is given by members of an exponential family. The use of conjugate priors for relative entropy minimization is proposed, and a class of conjugate prior distributions is introduced. An alternative representation of the solution is provided as members of the prior family when the prior distribution is conjugate. It is shown that the solutions can be found by direct optimization with respect to members of such parametric families. Constrained Bayesian inference is recovered as a special case with a specific choice of constraints induced by observed data. The framework is applied to the development of novel probabilistic models for multitask learning subject to constraints determined by domain expertise. First, a model is developed for multitask learning that jointly learns a low rank weight matrix and the prior covariance structure between different tasks. The multitask learning approach is extended to a class of nonparametric statistical models for transposable data, incorporating side information such as graphs that describe inter-row and inter-column similarity. The resulting model combines a matrix-variate Gaussian process prior with inference subject to nuclear norm expectation constraints. In addition, a novel nonparametric model is proposed for multitask bipartite ranking. The proposed model combines a hierarchical matrix-variate Gaussian process prior with inference subject to ordering constraints and nuclear norm constraints, and is applied to disease gene prioritization. In many of these applications, the solution is found to be unique. Experimental results show substantial performance improvements as compared to strong baseline models.
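For readers unfamiliar with the canonical representation referred to here, the standard result, written in our own notation (which may differ from the dissertation's), is that minimising relative entropy to a prior p subject to expectation constraints tilts the prior exponentially:

```latex
% Minimise KL(q || p) subject to E_q[\phi(x)] = b.
\[
  q^\star(x)
  = \frac{p(x)\,\exp\!\big(\lambda^{\top}\phi(x)\big)}
         {\int p(x')\,\exp\!\big(\lambda^{\top}\phi(x')\big)\,dx'},
\]
% where the multipliers \lambda are chosen so that E_{q^\star}[\phi] = b.
% With constraints induced by observed data, Bayes' rule falls out as a
% special case, matching the abstract's remark about constrained Bayesian
% inference.
```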
22

Martí i Rabadán, Miquel. "Multitask Deep Learning models for real-time deployment in embedded systems." Thesis, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-216673.

Full text
Abstract:
Multitask Learning (MTL) was conceived as an approach to improve the generalization ability of machine learning models. When applied to neural networks, multitask models take advantage of sharing resources to reduce the total inference time, memory footprint and model size. We propose MTL as a way to speed up deep learning models for applications in which multiple tasks need to be solved simultaneously, which is particularly useful in embedded, real-time systems such as the ones found in autonomous cars or UAVs. In order to study this approach, we apply MTL to a Computer Vision problem in which both Object Detection and Semantic Segmentation tasks are solved, based on the Single Shot Multibox Detector and Fully Convolutional Networks with skip connections respectively, using a ResNet-50 as the base network. We train multitask models for two different datasets: Pascal VOC, which is used to validate the decisions made, and a combination of datasets with aerial-view images captured from UAVs. Finally, we analyse the challenges that appear during the training of multitask networks and try to overcome them. However, these hinder the capacity of our multitask models to reach the performance of the best single-task models trained without the limitations imposed by applying MTL. Nevertheless, multitask networks benefit from sharing resources and are 1.6x faster, lighter and use less memory compared to deploying the single-task models in parallel, which turns essential when running them on a Jetson TX1 SoC, as the parallel approach does not fit into memory. We conclude that MTL has the potential to give superior performance as far as the object detection and semantic segmentation tasks are concerned, in exchange for a more complex training process that requires overcoming challenges not present in the training of single-task models.
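The deployment benefit quoted above comes from computing the backbone once; a toy PyTorch sketch (invented layer sizes, not the thesis's SSD/FCN heads) makes the sharing explicit:

```python
import torch
import torch.nn as nn

trunk = nn.Sequential(                     # stand-in for the shared ResNet-50
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
)
detect_head = nn.Conv2d(64, 4 + 1, 1)      # toy box regressor + objectness score
segment_head = nn.Conv2d(64, 21, 1)        # toy per-pixel class scores

x = torch.randn(1, 3, 128, 128)
features = trunk(x)                        # computed once, reused by both heads
boxes, masks = detect_head(features), segment_head(features)
print(boxes.shape, masks.shape)
```

Two single-task networks would run the trunk twice (and hold two copies of its weights), which is exactly what breaks the memory budget on a device like the Jetson TX1.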
23

Hsieh, Yeu-Chen (謝雨辰). "Forecasting Solar Power Production by Heterogeneous Data Streams and Multitask Learning." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/373zyg.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
105 (ROC year, i.e., 2016/17)
In recent years, solar energy has become a significant field of research across the globe because of the growing demand for renewable energy and its promising potential in sustainability. With the increasing integration of photovoltaic (PV) systems into the electrical grid, reliable prediction of the expected production output of PV systems is gaining importance as a basis for management and operation strategies. However, the power production of PV systems is highly variable due to its dependence on solar irradiance, meteorological conditions and other external factors. Most existing studies cannot predict solar power production at multiple future time points, and they rarely analyze the influence of the individual factors on production, instead simply feeding in all features without considering their properties. This work therefore provides a holistic comparison of the factors (e.g., solar irradiance, meteorology) affecting solar power production, and proposes a multimodal, end-to-end neural network model that predicts solar power production at multiple future time points simultaneously via multitask learning. With multitask learning on heterogeneous (and multimodal) data, the proposed method achieves the lowest error rates (11.83% for 5-minute-ahead prediction) compared to the state of the art.
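As a rough illustration of the multitask formulation, each forecast horizon can be treated as one output head on top of a shared encoder. The sketch below is ours, with made-up layer sizes and horizons; the actual thesis model is multimodal and more elaborate:

import torch
import torch.nn as nn

class MultiHorizonForecaster(nn.Module):
    # Shared encoder with one regression head per forecast horizon,
    # so the horizons are learned jointly as tasks.
    def __init__(self, n_features=16, n_horizons=3):  # e.g. 5/15/30 min ahead
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(64, 1) for _ in range(n_horizons)])

    def forward(self, x):
        shared = self.encoder(x)  # representation shared across all horizons
        return torch.cat([head(shared) for head in self.heads], dim=1)

model = MultiHorizonForecaster()
x = torch.randn(8, 16)  # weather + irradiance feature vectors
y = torch.randn(8, 3)   # measured production at each horizon
loss = nn.functional.mse_loss(model(x), y)  # joint loss over all horizons

Because every head backpropagates into the same encoder, the horizons regularize one another, which is the multitask effect the abstract exploits.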
APA, Harvard, Vancouver, ISO, and other styles
25

"Novel Deep Learning Models for Medical Imaging Analysis." Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.55510.

Full text
Abstract:
Deep learning is a sub-field of machine learning in which models are developed to imitate the workings of the human brain in processing data and creating patterns for decision making. This dissertation is focused on developing deep learning models for medical imaging analysis across different modalities and tasks, including detection, segmentation and classification. Imaging modalities including digital mammography (DM), magnetic resonance imaging (MRI), positron emission tomography (PET) and computed tomography (CT) are studied in the dissertation for various medical applications. The first phase of the research develops a novel shallow-deep convolutional neural network (SD-CNN) model for improved breast cancer diagnosis. This model takes one type of medical image as input and synthesizes different modalities as additional feature sources; both the original image and the synthetic image are used for feature generation. The proposed architecture is validated in the application of breast cancer diagnosis and shown to outperform competing models. Motivated by the success of the first phase, the second phase focuses on improving medical image synthesis with an advanced deep learning architecture. A new architecture named deep residual inception encoder-decoder network (RIED-Net) is proposed. RIED-Net has the advantages of preserving pixel-level information and cross-modality feature transfer. Its applicability is validated in breast cancer diagnosis and Alzheimer's disease (AD) staging. Recognizing that medical imaging research often involves multiple inter-related tasks, namely detection, segmentation and classification, the third phase of the research develops a multi-task deep learning model. Specifically, a feature transfer enabled multi-task deep learning model (FT-MTL-Net) is proposed to transfer high-resolution features from the segmentation task to the low-resolution feature-based classification task. The application of FT-MTL-Net to breast cancer detection, segmentation and classification using DM images is studied. As a continuing effort to explore transfer learning in deep models for medical applications, the last phase develops a deep learning model that transfers both features and knowledge from a pre-training age-prediction task to the new domain of predicting conversion from mild cognitive impairment (MCI) to AD. It is validated in the application of predicting MCI patients' conversion to AD with 3D MRI images.
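As a toy illustration of the feature-transfer idea behind FT-MTL-Net, one shared encoder can serve a pixel-level segmentation head while the image-level classifier reuses pooled versions of the same features; the layer sizes, pooling choice and names below are our assumptions, not the dissertation's architecture:

import torch
import torch.nn as nn

class SegClsSketch(nn.Module):
    # One shared encoder; the segmentation head keeps spatial resolution,
    # while the classifier reuses (pooled) the same features, a crude
    # stand-in for the high-to-low resolution feature transfer described above.
    def __init__(self, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(32, 1, 1)       # pixel-wise lesion mask logits
        self.cls_head = nn.Linear(32, n_classes)  # image-level diagnosis

    def forward(self, x):
        f = self.encoder(x)
        return self.seg_head(f), self.cls_head(f.mean(dim=(2, 3)))

mask_logits, cls_logits = SegClsSketch()(torch.randn(4, 1, 128, 128))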
Doctoral Dissertation, Industrial Engineering, 2019
APA, Harvard, Vancouver, ISO, and other styles
26

Reid, Mark Darren (Computer Science & Engineering, Faculty of Engineering, UNSW). "DEFT guessing: using inductive transfer to improve rule evaluation from limited data." 2007. http://handle.unsw.edu.au/1959.4/40513.

Full text
Abstract:
Algorithms that learn sets of rules describing a concept from its examples have been widely studied in machine learning and have been applied to problems in medicine, molecular biology, planning and linguistics. Many of these algorithms use a separate-and-conquer strategy, repeatedly searching for rules that explain different parts of the example set. When examples are scarce, however, it is difficult for these algorithms to evaluate the relative quality of two or more rules which fit the examples equally well. This dissertation proposes, implements and examines a general technique for modifying rule evaluation in order to improve learning performance in these situations. This approach, called Description-based Evaluation Function Transfer (DEFT), adjusts the way rules are evaluated on a target concept by taking into account the performance of similar rules on a related support task that is supplied by a domain expert. Central to this approach is a novel theory of task similarity that is defined in terms of syntactic properties of rules, called descriptions, which define what it means for rules to be similar. Each description is associated with a prior distribution over classification probabilities derived from the support examples, and a rule's evaluation on a target task is combined with the relevant prior using Bayes' rule. Given some natural conditions regarding the similarity of the target and support tasks, it is shown that modifying rule evaluation in this way is guaranteed to improve estimates of the true classification probabilities. Algorithms to efficiently implement DEFT are described, analysed and used to measure the effect these improvements have on the quality of induced theories. Empirical studies of this implementation were carried out on two artificial and two real-world domains. The results show that the inductive transfer of evaluation bias based on rule similarity is an effective and practical way to improve learning when training examples are limited.
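The Bayesian update at the heart of DEFT can be caricatured in a few lines: counts from similar rules on the support task form a Beta prior over a rule's accuracy, which the scarce target-task counts then update. The description-matching machinery that decides which rules are similar is the thesis's actual contribution and is not modelled here; function and parameter names are ours:

def deft_estimate(target_pos, target_neg, support_pos, support_neg, strength=1.0):
    # Posterior-mean accuracy of a rule on the target task, with a
    # Beta prior built from similar rules' support-task counts.
    # `strength` scales how much the prior is trusted.
    alpha = 1.0 + strength * support_pos
    beta = 1.0 + strength * support_neg
    return (alpha + target_pos) / (alpha + beta + target_pos + target_neg)

# With only 2 target examples the prior dominates; with hundreds it
# washes out, which is the limited-data behaviour the abstract targets.
print(deft_estimate(target_pos=2, target_neg=0, support_pos=30, support_neg=70))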
APA, Harvard, Vancouver, ISO, and other styles
27

Hwang, Sung Ju. "Discriminative object categorization with external semantic knowledge." 2013. http://hdl.handle.net/2152/21320.

Full text
Abstract:
Visual object category recognition is one of the most challenging problems in computer vision. Even assuming that we can obtain a near-perfect instance-level representation with the advances in visual input devices and low-level vision techniques, object categorization still remains a difficult problem because it requires drawing boundaries between instances in a continuous world, where the boundaries are solely defined by human conceptualization. Object categorization is essentially a perceptual process that takes place in a human-defined semantic space. In this semantic space, the categories reside not in isolation, but in relation to others. Some categories are similar, grouped, or co-occurring, and some are not. However, despite this semantic nature of object categorization, most of today's automatic visual category recognition systems rely only on the category labels when training discriminative recognition models with statistical machine learning techniques. In many cases, this can result in the recognition model being misled into learning incorrect associations between visual features and semantic labels, essentially overfitting to training-set biases. This limits the model's prediction power when new test instances are given. Using semantic knowledge has great potential to benefit object category recognition. First, semantic knowledge can guide the training model to learn correct associations between visual features and categories. Second, semantics provide much richer information beyond the membership information given by the labels, in the form of inter-category and category-attribute distances, relations, and structures. Finally, semantic knowledge scales well, as the relations between categories grow richer with an increasing number of categories. My goal in this thesis is to learn discriminative models for categorization that leverage semantic knowledge for object recognition, with a special focus on the semantic relationships among different categories and concepts. To this end, I explore three semantic sources, namely attributes, taxonomies, and analogies, and I show how to incorporate them into the original discriminative model as a form of structural regularization. In particular, for each form of semantic knowledge I present a feature learning approach that defines a semantic embedding to support the object categorization task. The regularization penalizes models that deviate from the known structures according to the semantic knowledge provided. The first semantic source I explore is attributes, which are human-describable semantic characteristics of an instance. While existing work treated them as mid-level features that did not introduce new information, I focus on their potential as a means to better guide the learning of object categories, by enforcing the object category classifiers to share features with the attribute classifiers in a multitask feature learning framework. This approach essentially discovers the common low-dimensional features that support predictions in both semantic spaces. I then move on to the semantic taxonomy, another valuable source of semantic knowledge. The merging and splitting criteria for the categories in a taxonomy are human-defined, and I aim to exploit this implicit semantic knowledge.
Specifically, I propose a tree of metrics (ToM) that learns metrics capturing granularity-specific similarities at different nodes of a given semantic taxonomy, using a regularizer to isolate granularity-specific disjoint features. This captures the intuition that the features used to discriminate the parent class should differ from the features used for the child classes. Such learned metrics can be used for hierarchical classification. The use of a single taxonomy can be limited in that its structure is not optimal for hierarchical classification, and there may exist no single optimal semantic taxonomy that perfectly aligns with the visual distributions. Thus, I next propose a way to overcome this limitation by leveraging multiple taxonomies as semantic sources, combining the complementary information acquired across multiple semantic views and granularities. This allows us, for example, to synthesize semantics from both 'Biological'- and 'Appearance'-based taxonomies when learning the visual features. Finally, going beyond the pairwise similarities used in the previous two models, I exploit analogies, which encode the relational similarities between two related pairs of categories. Specifically, I use analogies to regularize a discriminatively learned semantic embedding space for categorization, such that the displacements between the two category embeddings in both category pairs of the analogy are enforced to be the same. Such a constraint allows a more confusable pair of categories to benefit from the clear separation in the matched pair of categories that shares the same relation. All of these methods are evaluated on challenging public datasets and are shown to effectively improve recognition accuracy over purely discriminative models, while also guiding the recognition to be more consistent with human semantic perception. Furthermore, the proposed methods are not limited to visual object categorization; they can be applied to any classification problem where some domain knowledge exists about the relationships or structures between the classes. Possible applications outside the visual recognition domain include document classification in natural language processing, and gene-based animal or protein classification in computational biology.
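Of the three semantic sources, the analogy constraint is the simplest to write down: for an analogy a : b :: c : d over learned category embeddings, the two displacement vectors are pushed to agree. The sketch below is our reading of such a regularizer, with an illustrative embedding matrix and index tuples:

import torch

def analogy_penalty(emb, analogies):
    # emb: (num_categories, dim) embedding matrix; analogies: list of
    # (a, b, c, d) index tuples encoding "a is to b as c is to d".
    loss = emb.new_zeros(())
    for a, b, c, d in analogies:
        loss = loss + ((emb[a] - emb[b]) - (emb[c] - emb[d])).pow(2).sum()
    return loss

emb = torch.randn(10, 32, requires_grad=True)
reg = analogy_penalty(emb, [(0, 1, 2, 3)])  # e.g. lion : cat :: wolf : dog
# Full objective: discriminative loss + lambda * reg, with lambda cross-validated.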
APA, Harvard, Vancouver, ISO, and other styles
28

Yaghoubi, Ehsan. "Soft Biometric Analysis: Multi-Person and Real-Time Pedestrian Attribute Recognition in Crowded Urban Environments." Doctoral thesis, 2021. http://hdl.handle.net/10400.6/12081.

Full text
Abstract:
Traditionally, recognition systems were based only on hard human biometrics. However, ubiquitous CCTV cameras have raised the desire to analyze human biometrics from far distances, without people's attendance in the acquisition process. High-resolution face close-shots are rarely available at far distances, so face-based systems cannot provide reliable results in surveillance applications. Human soft biometrics such as body and clothing attributes are believed to be more effective for analyzing human data collected by security cameras. This thesis contributes to human soft biometric analysis in uncontrolled environments and focuses mainly on two tasks: Pedestrian Attribute Recognition (PAR) and person re-identification (re-id). We first review the literature of both tasks and highlight the history of advancements, recent developments, and the existing benchmarks. PAR and person re-id difficulties are due to significant distances between intra-class samples, which originate from variations in several factors such as body pose, illumination, background, occlusion, and data resolution. Recent state-of-the-art approaches present end-to-end models that can extract discriminative and comprehensive feature representations of people. The correlation between different regions of the body and dealing with limited learning data are also the objective of many recent works. Moreover, class imbalance and correlation between human attributes are specific challenges associated with the PAR problem. We collect a large surveillance dataset to train a novel gender recognition model suitable for uncontrolled environments. We propose a deep residual network that extracts several pose-wise patches from samples and obtains a comprehensive feature representation. In the next step, we develop a model for recognizing multiple attributes at once. Considering the correlation between human semantic attributes and the class imbalance, we respectively use a multitask model and a weighted loss function. We also propose a multiplication layer on top of the backbone feature-extraction layers to exclude background features from the final representation of samples and draw the attention of the model to the foreground area. We address the problem of person re-id by implicitly defining the receptive fields of deep learning classification frameworks. The receptive fields of deep learning models determine the most significant regions of the input data for providing correct decisions. Therefore, we synthesize a set of learning data in which the destructive regions (e.g., background) in each pair of instances are interchanged. A segmentation module determines destructive and useful regions in each sample, and the label of a synthesized instance is inherited from the sample that shared the useful regions in the synthesized image. The synthesized learning data are then used in the learning phase and help the model rapidly learn that the identity and background regions are not correlated. Meanwhile, the proposed solution can be seen as a data augmentation approach that fully preserves the label information and is compatible with other data augmentation techniques. When re-id methods are learned in scenarios where the target person appears with identical garments in the gallery, the visual appearance of clothes is given the most importance in the final feature representation. Cloth-based representations are not reliable in long-term re-id settings, as people may change their clothes.
Therefore, developing solutions that ignore clothing cues and focus on identity-relevant features is in demand. We transform the original data such that the identity-relevant information of people (e.g., face and body shape) is removed, while the identity-unrelated cues (i.e., the color and texture of clothes) remain unchanged. A model learned on the synthesized dataset predicts the identity-unrelated cues (short-term features). We therefore train a second model, coupled with the first, that learns embeddings of the original data such that the similarity between the embeddings of the original and synthesized data is minimized. This way, the second model predicts based on the identity-related (long-term) representation of people. To evaluate the performance of the proposed models, we use PAR and person re-id datasets, namely BIODI, PETA, RAP, Market-1501, MSMT-V2, PRCC, LTCC, and MIT, and compare our experimental results with state-of-the-art methods in the field. In conclusion, the data collected from surveillance cameras have low resolution, such that the extraction of hard biometric features is not possible and face-based approaches produce poor results. In contrast, soft biometrics are robust to variations in data quality. We therefore propose approaches for both PAR and person re-id that learn discriminative features from each instance, and evaluate our proposed solutions on several publicly available benchmarks.
This thesis was prepared at the University of Beira Interior, IT Instituto de Telecomunicações, Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session.
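On the class-imbalance point, one common concrete form, used here purely as an illustration rather than the thesis's exact formula, is a per-attribute positive weight in a multi-label sigmoid loss:

import torch
import torch.nn as nn

num_attrs = 5
pos_freq = torch.tensor([0.90, 0.50, 0.10, 0.30, 0.05])  # fraction of positives per attribute (made up)
pos_weight = (1.0 - pos_freq) / pos_freq                 # rarer positives get larger weight
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, num_attrs)                       # one sigmoid head per attribute
targets = torch.randint(0, 2, (8, num_attrs)).float()
loss = criterion(logits, targets)                        # imbalance-aware multitask attribute loss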
APA, Harvard, Vancouver, ISO, and other styles