Dissertations / Theses on the topic 'Machine Learning, Artificial Intelligence, Regularization Methods'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 42 dissertations / theses for your research on the topic 'Machine Learning, Artificial Intelligence, Regularization Methods.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
ROSSI, ALESSANDRO. "Regularization and Learning in the temporal domain." Doctoral thesis, Università di Siena, 2017. http://hdl.handle.net/11365/1006818.
Lu, Yibiao. "Statistical methods with application to machine learning and artificial intelligence." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44730.
Giuliani, Luca. "Extending the Moving Targets Method for Injecting Constraints in Machine Learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23885/.
Le, Truc Duc. "Machine Learning Methods for 3D Object Classification and Segmentation." Thesis, University of Missouri - Columbia, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13877153.
Full textObject understanding is a fundamental problem in computer vision and it has been extensively researched in recent years thanks to the availability of powerful GPUs and labelled data, especially in the context of images. However, 3D object understanding is still not on par with its 2D domain and deep learning for 3D has not been fully explored yet. In this dissertation, I work on two approaches, both of which advances the state-of-the-art results in 3D classification and segmentation.
The first approach, called MVRNN, is based on the multi-view paradigm. In contrast to MVCNN, which does not generate consistent results across different views, our MVRNN treats the multi-view images as a temporal sequence, correlates the features, and generates coherent segmentation across different views. MVRNN demonstrated state-of-the-art performance on the Princeton Segmentation Benchmark dataset.
The second approach, called PointGrid, is a hybrid method that combines points with a regular grid structure. 3D points can retain fine details but are irregular, which is challenging for deep learning methods. A volumetric grid is simple and regular, but does not scale well with data resolution. Our PointGrid is simple and allows the fine details to be consumed by normal convolutions under a coarser-resolution grid. PointGrid achieved state-of-the-art performance on the ModelNet40 and ShapeNet datasets in 3D classification and object part segmentation.
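The point-in-cell idea behind PointGrid can be sketched as follows. This is a hedged reconstruction from the abstract only: the grid size, the per-cell point budget `k`, and the random sampling scheme are assumptions for illustration, not details taken from the thesis.

```python
import numpy as np

def point_grid(points, grid=16, k=4, seed=0):
    """Embed an (N, 3) point cloud, scaled to the unit cube, into a
    grid x grid x grid volume holding up to k points per cell, so that
    an ordinary 3D convolution can consume the fine point coordinates."""
    rng = np.random.default_rng(seed)
    # Normalize the cloud into [0, 1]^3, then quantize to cell indices.
    mins, maxs = points.min(0), points.max(0)
    unit = (points - mins) / np.maximum(maxs - mins, 1e-9)
    cells = np.minimum((unit * grid).astype(int), grid - 1)
    vol = np.zeros((grid, grid, grid, k, 3), dtype=np.float32)
    for idx in set(map(tuple, cells)):
        mask = np.all(cells == idx, axis=1)
        pts = unit[mask]
        # Randomly sample (or implicitly zero-pad) to at most k points.
        take = rng.choice(len(pts), size=min(len(pts), k), replace=False)
        vol[idx][: len(take)] = pts[take]
    # Flatten the per-cell points into channels for a Conv3D input.
    return vol.reshape(grid, grid, grid, k * 3)

cloud = np.random.rand(1000, 3)
features = point_grid(cloud)
print(features.shape)  # (16, 16, 16, 12)
```

The resulting tensor is regular like a voxel grid, yet each cell still carries exact coordinates for a few of its points, which is the trade-off the abstract describes.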
Michael, Christoph Cornelius. "General methods for analyzing machine learning sample complexity." W&M ScholarWorks, 1994. https://scholarworks.wm.edu/etd/1539623860.
Gao, Xi. "Graph-based Regularization in Machine Learning: Discovering Driver Modules in Biological Networks." VCU Scholars Compass, 2015. http://scholarscompass.vcu.edu/etd/3942.
Puthiya Parambath, Shameem Ahamed. "New methods for multi-objective learning." Thesis, Compiègne, 2016. http://www.theses.fr/2016COMP2322/document.
Full textMulti-objective problems arise in many real world scenarios where one has to find an optimal solution considering the trade-off between different competing objectives. Typical examples of multi-objective problems arise in classification, information retrieval, dictionary learning, online learning etc. In this thesis, we study and propose algorithms for multi-objective machine learning problems. We give many interesting examples of multi-objective learning problems which are actively persuaded by the research community to motivate our work. Majority of the state of the art algorithms proposed for multi-objective learning comes under what is called “scalarization method”, an efficient algorithm for solving multi-objective optimization problems. Having motivated our work, we study two multi-objective learning tasks in detail. In the first task, we study the problem of finding the optimal classifier for multivariate performance measures. The problem is studied very actively and recent papers have proposed many algorithms in different classification settings. We study the problem as finding an optimal trade-off between different classification errors, and propose an algorithm based on cost-sensitive classification. In the second task, we study the problem of diverse ranking in information retrieval tasks, in particular recommender systems. We propose an algorithm for diverse ranking making use of the domain specific information, and formulating the problem as a submodular maximization problem for coverage maximization in a weighted similarity graph. Finally, we conclude that scalarization based algorithms works well for multi-objective learning problems. But when considering algorithms for multi-objective learning problems, scalarization need not be the “to go” approach. It is very important to consider the domain specific information and objective functions. 
We end this thesis by proposing some immediate future work, which is currently being experimented with, and some short-term future work that we plan to carry out.
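The scalarization approach the abstract refers to can be made concrete with a minimal sketch: the competing objectives are collapsed into a single weighted sum, which an ordinary single-objective optimizer then minimizes. The objectives and weights below are toy assumptions for illustration, not anything from the thesis.

```python
import numpy as np

def scalarize(objectives, weights):
    """Collapse competing objectives into one weighted-sum objective."""
    def combined(x):
        return sum(w * f(x) for w, f in zip(weights, objectives))
    return combined

# Two competing toy objectives: a fit term versus a complexity penalty.
f1 = lambda x: np.sum((x - 1.0) ** 2)   # pulls x towards 1
f2 = lambda x: np.sum(x ** 2)           # pulls x towards 0

loss = scalarize([f1, f2], weights=[0.7, 0.3])

# Plain gradient descent on the scalarized objective.
x = np.zeros(3)
for _ in range(500):
    grad = 0.7 * 2 * (x - 1.0) + 0.3 * 2 * x   # analytic gradient of loss
    x -= 0.1 * grad
print(np.round(x, 3))  # each coordinate converges to 0.7 = w1 / (w1 + w2)
```

Sweeping the weight vector traces out different points on the trade-off (Pareto) front, which is why scalarization is such a common baseline for multi-objective learning.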
He, Yuesheng. "The intelligent behavior of 3D graphical avatars based on machine learning methods." HKBU Institutional Repository, 2012. https://repository.hkbu.edu.hk/etd_ra/1404.
Sirin, Volkan. "Machine Learning Methods For Opponent Modeling In Games Of Imperfect Information." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614630/index.pdf.
Wallis, David. "A study of machine learning and deep learning methods and their application to medical imaging." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPAST057.
Full textWe first use Convolutional Neural Networks (CNNs) to automate mediastinal lymph node detection using FDG-PET/CT scans. We build a fully automated model to go directly from whole-body FDG-PET/CT scans to node localisation. The results show a comparable performance to an experienced physician. In the second half of the thesis we experimentally test the performance, interpretability, and stability of radiomic and CNN models on three datasets (2D brain MRI scans, 3D CT lung scans, 3D FDG-PET/CT mediastinal scans). We compare how the models improve as more data is available and examine whether there are patterns common to the different problems. We question whether current methods for model interpretation are satisfactory. We also investigate how precise segmentation affects the performance of the models. We first use Convolutional Neural Networks (CNNs) to automate mediastinal lymph node detection using FDG-PET/CT scans. We build a fully automated model to go directly from whole-body FDG-PET/CT scans to node localisation. The results show a comparable performance to an experienced physician. In the second half of the thesis we experimentally test the performance, interpretability, and stability of radiomic and CNN models on three datasets (2D brain MRI scans, 3D CT lung scans, 3D FDG-PET/CT mediastinal scans). We compare how the models improve as more data is available and examine whether there are patterns common to the different problems. We question whether current methods for model interpretation are satisfactory. We also investigate how precise segmentation affects the performance of the models
Quintal, Kyle. "Context-Awareness for Adversarial and Defensive Machine Learning Methods in Cybersecurity." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/40835.
Lu, Yang. "Advances in imbalanced data learning." HKBU Institutional Repository, 2019. https://repository.hkbu.edu.hk/etd_oa/657.
Abu-Hakmeh, Khaldoon Emad. "Assessing the use of voting methods to improve Bayesian network structure learning." Thesis, Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45826.
Skapura, Nicholas. "Analysis of Classifier Weaknesses Based on Patterns and Corrective Methods." Wright State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=wright1620719735415272.
Doran, Gary Brian Jr. "Multiple-Instance Learning from Distributions." Case Western Reserve University School of Graduate Studies / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=case1417736923.
Huntsinger, Richard A. "Evaluating Forecasting Performance in the Context of Process-Level Decisions: Methods, Computation Platform, and Studies in Residential Electricity Demand Estimation." Research Showcase @ CMU, 2017. http://repository.cmu.edu/dissertations/898.
Melandri, Luca. "Introduction to Reservoir Computing Methods." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/8268/.
Toghiani-Rizi, Babak. "Evaluation of Deep Learning Methods for Creating Synthetic Actors." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-324756.
Kilinc, Ismail Ozsel. "Graph-based Latent Embedding, Annotation and Representation Learning in Neural Networks for Semi-supervised and Unsupervised Settings." Scholar Commons, 2017. https://scholarcommons.usf.edu/etd/7415.
Horečný, Peter. "Metody segmentace obrazu s malými trénovacími množinami [Image segmentation methods with small training sets]." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-412996.
Leite, Daniel Saraiva. "Um estudo comparativo de modelos baseados em estatísticas textuais, grafos e aprendizado de máquina para sumarização automática de textos em português [A comparative study of models based on textual statistics, graphs and machine learning for automatic summarization of texts in Portuguese]." Universidade Federal de São Carlos, 2010. https://repositorio.ufscar.br/handle/ufscar/459.
Full textAutomatic text summarization has been of great interest in Natural Language Processing due to the need of processing a huge amount of information in short time, which is usually delivered through distinct media. Thus, large-scale methods are of utmost importance for synthesizing and making access to information simpler. They aim at preserving relevant content of the sources with little or no human intervention. Building upon the extractive summarizer SuPor and focusing on texts in Portuguese, this MsC work aimed at exploring varied features for automatic summarization. Computational methods especially driven towards textual statistics, graphs and machine learning have been explored. A meaningful extension of the SuPor system has resulted from applying such methods and new summarization models have thus been delineated. These are based either on each of the three methodologies in isolation, or are hybrid. In this dissertation, they are generically named after the original SuPor as SuPor-2. All of them have been assessed by comparing them with each other or with other, well-known, automatic summarizers for texts in Portuguese. The intrinsic evaluation tasks have been carried out entirely automatically, aiming at the informativeness of the outputs, i.e., the automatic extracts. They have also been compared with other well-known automatic summarizers for Portuguese. SuPor-2 results show a meaningful improvement of some SuPor-2 variations. The most promising models may thus be made available in the future, for generic use. They may also be embedded as tools for varied Natural Language Processing purposes. They may even be useful for other related tasks, such as linguistic studies. Portability to other languages is possible by replacing the resources that are language-dependent, namely, lexicons, part-of-speech taggers and stop words lists. Models that are supervised have been so far trained on news corpora. 
In spite of that, training for other genres may be carried out by interested users using the very same interfaces supplied by the systems.
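As a toy illustration of the textual-statistics family of features that extractive summarizers of this kind build on, a minimal word-frequency sentence scorer might look like this. It is a sketch under an assumed tokenizer and stop list, not the SuPor-2 implementation.

```python
import re
from collections import Counter

# Placeholder stop list; a real system would use a full language-specific one.
STOP = {"the", "a", "of", "to", "and", "in", "is", "for", "with", "was"}

def extract(text, n=1):
    """Score each sentence by the average corpus frequency of its content
    words and return the top-n sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]
    freq = Counter(words)

    def score(s):
        toks = [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP]
        return sum(freq[w] for w in toks) / max(len(toks), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:n]
    return " ".join(sentences[i] for i in sorted(ranked))

doc = ("Automatic summarization selects relevant sentences. "
       "Relevant sentences share frequent content words. "
       "The weather was pleasant yesterday.")
print(extract(doc))
```

The graph- and machine-learning-based models mentioned in the abstract replace this single hand-crafted score with sentence-similarity graphs or learned combinations of many such features.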
Volkovs, Maksims. "Machine Learning Methods and Models for Ranking." Thesis, 2013. http://hdl.handle.net/1807/36042.
Li, Fengpei. "Stochastic Methods in Optimization and Machine Learning." Thesis, 2021. https://doi.org/10.7916/d8-ngq8-9s10.
"Machine Learning Methods for Diagnosis, Prognosis and Prediction of Long-term Treatment Outcome of Major Depression." Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.44430.
Doctoral Dissertation, Computer Science, 2017.
Gurioli, Gianmarco. "Adaptive Regularisation Methods under Inexact Evaluations for Nonconvex Optimisation and Machine Learning Applications." Doctoral thesis, 2021. http://hdl.handle.net/2158/1238314.
Full textCapobianco, Samuele. "Deep Learning Methods for Document Image Understanding." Doctoral thesis, 2020. http://hdl.handle.net/2158/1182536.
Full text"Optimizing Performance Measures in Classification Using Ensemble Learning Methods." Master's thesis, 2017. http://hdl.handle.net/2286/R.I.44123.
Master's Thesis, Computer Science, 2017.
Huynh, Tuyen Ngoc. "Improving the accuracy and scalability of discriminative learning methods for Markov logic networks." Thesis, 2011. http://hdl.handle.net/2152/ETD-UT-2011-05-3436.
Kahya, Emre Onur. "Identifying electrons with deep learning methods." Thesis, 2020. http://hdl.handle.net/1866/25101.
Full textThis thesis is about applying the tools of Machine Learning to an important problem of experimental particle physics: identifying signal electrons after proton-proton collisions at the Large Hadron Collider. In Chapters 1, we provide some information about the Large Hadron Collider and explain why it was built. We give further details about one of the biggest detectors in the Large Hadron Collider, the ATLAS. Then we define what electron identification task is, as well as the importance of solving it. Finally, we give detailed information about our dataset that we use to solve the electron identification task. In Chapters 2, we give a brief introduction to fundamental principles of machine learning. Starting with the definition and types of different learning tasks, we discuss various ways to represent inputs. Then we present what to learn from the inputs as well as how to do it. And finally, we look at the problems that would arise if we “overdo” learning. In Chapters 3, we motivate the choice of the architecture to solve our task, especially for the parts that have sequential images as inputs. We then present the results of our experiments and show that our model performs much better than the existing algorithms that the ATLAS collaboration currently uses. Finally, we discuss future directions to further improve our results. In Chapter 4, we discuss two concepts: out of distribution generalization and flatness of loss surface. We claim that the algorithms, that brings a model into a wide flat minimum of its training loss surface, would generalize better for out of distribution tasks. We give the results of implementing two such algorithms to our dataset and show that it supports our claim. Finally, we end with our conclusions.
Tran, Dustin. "Probabilistic Programming for Deep Learning." Thesis, 2020. https://doi.org/10.7916/d8-95c9-sj96.
Nanxin Jin. "ASD PREDICTION FROM STRUCTURAL MRI WITH MACHINE LEARNING." Thesis, 2020.
Sudhir B. Kylasa. "HIGHER ORDER OPTIMIZATION TECHNIQUES FOR MACHINE LEARNING." Thesis, 2019.
First-order methods such as Stochastic Gradient Descent are the methods of choice for solving non-convex optimization problems in machine learning. These methods primarily rely on the gradient of the loss function to estimate the descent direction. However, they have a number of drawbacks, including convergence to saddle points (as opposed to minima), slow convergence, and sensitivity to parameter tuning. In contrast, second order methods, which use curvature information in addition to the gradient, have been shown to achieve faster convergence rates theoretically. When used in the context of machine learning applications, they offer faster (quadratic) convergence, stability to parameter tuning, and robustness to problem conditioning. In spite of these advantages, first order methods are commonly used because of their simplicity of implementation and low per-iteration cost. The need to generate and use curvature information in the form of a dense Hessian matrix makes each iteration of second order methods more expensive.
In this work, we address three key problems associated with second order methods: (i) what is the best way to incorporate curvature information into the optimization procedure; (ii) how do we reduce the operation count of each iteration in a second order method, while maintaining its superior convergence property; and (iii) how do we leverage high-performance computing platforms to significantly accelerate second order methods. To answer the first question, we propose and validate the use of Fisher information matrices in second order methods to significantly accelerate convergence. The second question is answered through the use of statistical sampling techniques that suitably sample matrices to reduce per-iteration cost without impacting convergence. The third question is addressed through the use of graphics processing units (GPUs) in distributed platforms to deliver state of the art solvers.
Through our work, we show that our solvers are capable of significant improvement over state of the art optimization techniques for training machine learning models. We demonstrate improvements in terms of training time (over an order of magnitude in wall-clock time), generalization properties of learned models, and robustness to problem conditioning.
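The sampling idea for cutting per-iteration cost can be sketched for L2-regularized logistic regression: the Hessian is estimated from a random row subsample while the gradient uses the full data. This is a generic illustration of Hessian subsampling under textbook formulas, not the thesis solver; the batch size, iteration count, and data are invented.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def subsampled_newton(X, y, iters=20, batch=400, lam=1e-3, seed=0):
    """Newton steps for L2-regularized logistic regression in which the
    Hessian is built from a random row subsample (cheap curvature) while
    the gradient uses the full dataset."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n + lam * w            # full gradient
        idx = rng.choice(n, size=min(batch, n), replace=False)
        Xs, ps = X[idx], p[idx]
        S = ps * (1 - ps)                             # curvature weights
        H = Xs.T @ (Xs * S[:, None]) / len(idx) + lam * np.eye(d)
        w -= np.linalg.solve(H, grad)                 # Newton step
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (sigmoid(X @ true_w) > rng.random(1000)).astype(float)
w = subsampled_newton(X, y)
print(np.round(w, 2))  # roughly recovers the signs and ordering of true_w
```

Replacing the sampled Gauss-Newton matrix `H` with a (sub)sampled Fisher information matrix is the kind of variant the abstract alludes to; for this loss the two coincide in expectation.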
"Computational Methods for Knowledge Integration in the Analysis of Large-scale Biological Networks." Doctoral diss., 2012. http://hdl.handle.net/2286/R.I.15204.
Ph.D. Dissertation, Computer Science, 2012.
George, Thomas. "Factorized second order methods in neural networks." Thèse, 2017. http://hdl.handle.net/1866/20190.
Full textKrueger, David. "Designing Regularizers and Architectures for Recurrent Neural Networks." Thèse, 2016. http://hdl.handle.net/1866/14019.
Full textПилипенко, Анна Василівна. "Застосування статистичних методів та методів штучного інтелекту для оптимізації процесу електронного навчання." Магістерська робота, 2020. https://dspace.znu.edu.ua/jspui/handle/12345/5000.
The purpose of this work is to research and study statistical and artificial intelligence methods in the field of e-learning, compare their features, test their applicability, and analyze a recommendation-system module as a tool for use in educational platforms. Methods and competing modern systems for predicting and recommending courses, films, videos, news, and goods are investigated, along with the possibilities of developing and using such a system in the context of an e-education platform. Artificial intelligence methods for recommending e-learning materials are compared. Two applications were designed and implemented in the C# programming language, each of which performs the function of prediction (recommendation).
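The abstract does not specify the recommendation algorithm, so purely as an illustration, a minimal item-based collaborative filter of the kind often used for course recommendation might look like this (the rating matrix is toy data):

```python
import numpy as np

def recommend(ratings, user, k=2):
    """Item-based collaborative filtering: score unseen items by the
    cosine similarity between their rating columns and the items the
    given user has already rated."""
    norms = np.linalg.norm(ratings, axis=0, keepdims=True)
    sim = (ratings.T @ ratings) / np.maximum(norms.T @ norms, 1e-9)
    seen = ratings[user] > 0
    scores = sim[:, seen] @ ratings[user, seen]
    scores[seen] = -np.inf                  # never re-recommend seen items
    return np.argsort(scores)[::-1][:k]

# Rows: learners; columns: courses (0 = not taken). Invented toy data.
R = np.array([[5, 4, 0, 0],
              [4, 5, 5, 0],
              [0, 0, 4, 5],
              [5, 0, 0, 4]], dtype=float)
print(recommend(R, user=0))  # -> [2 3]
```

Swapping this similarity-based scorer for a learned model is the usual next step, which matches the thesis's comparison of statistical versus AI-based prediction methods.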
Evgeniou, Theodoros, and Massimiliano Pontil. "A Note on the Generalization Performance of Kernel Classifiers with Margin." 2000. http://hdl.handle.net/1721.1/7169.
Full textSerdyuk, Dmitriy. "Advances in deep learning methods for speech recognition and understanding." Thesis, 2020. http://hdl.handle.net/1866/24803.
This work presents several studies in the areas of speech recognition and understanding. Semantic speech understanding is an important sub-domain of the broader field of artificial intelligence. Speech processing has long interested researchers because language is one of the defining characteristics of a human being. With the development of neural networks, the domain has seen rapid progress both in terms of accuracy and human perception. Another important milestone was achieved with the development of end-to-end approaches. Such approaches allow co-adaptation of all the parts of the model, thus increasing performance as well as simplifying the training procedure. End-to-end models became feasible with the increasing amount of available data and computational resources and, most importantly, with many novel architectural developments. Nevertheless, traditional, non end-to-end approaches are still relevant for speech processing due to challenging data in noisy environments, accented speech, and the high variety of dialects. In the first work, we explore hybrid speech recognition in noisy environments. We propose to treat recognition under unseen noise conditions as a domain adaptation task. For this, we use the then-novel technique of adversarial domain adaptation. In a nutshell, this prior work proposed to train features in such a way that they are discriminative for the primary task but non-discriminative for a secondary task, constructed here as a domain recognition task; the features so trained are invariant to the domain at hand. In our work, we adopt this technique and modify it for the task of noisy speech recognition. In the second work, we develop a general method for regularizing generative recurrent networks. It is known that recurrent networks frequently have difficulty staying on the same track when generating long outputs. While it is possible to use bi-directional networks for better sequence aggregation in feature learning, this is not applicable to the generative case. We developed a way to improve the consistency of generating long sequences with recurrent networks by constructing a model similar to a bi-directional network. The key insight is to use a soft L2 loss between the forward and the backward generative recurrent networks. We provide experimental evaluation on a multitude of tasks and datasets, including speech recognition, image captioning, and language modeling. In the third paper, we investigate the possibility of developing an end-to-end intent recognizer for spoken language understanding. Semantic spoken language understanding is an important step towards developing human-like artificial intelligence. End-to-end approaches have shown high performance on tasks including machine translation and speech recognition, and we draw inspiration from these prior works to develop an end-to-end system for intent recognition.
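The soft L2 idea can be sketched numerically. This is a hedged illustration with toy arrays standing in for hidden states; in the thesis setting the backward states would be produced by a second generative RNN reading the sequence in reverse.

```python
import numpy as np

def twin_penalty(h_fwd, h_bwd):
    """Soft L2 penalty matching each forward hidden state to the backward
    network's state for the same position; h_bwd is stored in reverse time
    order, so position t of the forward pass aligns with index T - 1 - t."""
    T = len(h_fwd)
    return sum(np.sum((h_fwd[t] - h_bwd[T - 1 - t]) ** 2)
               for t in range(T)) / T

rng = np.random.default_rng(0)
T, d = 6, 8                       # toy sequence length and state size
h_f = rng.normal(size=(T, d))     # stand-in forward states
h_b = h_f[::-1] + 0.01 * rng.normal(size=(T, d))   # near-agreeing twin
# A twin that agrees with the forward pass is penalized far less than
# an unrelated set of states.
print(twin_penalty(h_f, h_b) < twin_penalty(h_f, rng.normal(size=(T, d))))
```

During training such a penalty would be added to the task loss with a weight, e.g. `loss = task_loss + lam * twin_penalty(h_f, h_b)`, pushing the forward generator towards states that remain predictable from the end of the sequence.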
Yellamraju Tarun. "n-TARP: A Random Projection based Method for Supervised and Unsupervised Machine Learning in High-dimensions with Application to Educational Data Analysis." Thesis, 2019.
Bharath Kumar Comandur Jagannathan Raghunathan. "Semantic Labeling of Large Geographic Areas Using Multi-Date and Multi-View Satellite Images and Noisy OpenStreetMap Labels." Thesis, 2020.
Find full textEngster, David. "Local- and Cluster Weighted Modeling for Prediction and State Estimation of Nonlinear Dynamical Systems." Doctoral thesis, 2010. http://hdl.handle.net/11858/00-1735-0000-0006-B4FD-1.
Ashley S. Dale. "3D OBJECT DETECTION USING VIRTUAL ENVIRONMENT ASSISTED DEEP NETWORK TRAINING." Thesis, 2021.
An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score over all classes and epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum F1-score is σ*_F1 = 0.015 for synthetically trained networks, compared to σ_F1 = 0.020 for networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder, then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
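The headline metric in the abstract, the mean over classes of each class's best-epoch F1 together with its spread, can be reproduced with a small helper. The scores below are invented for illustration, not the thesis results.

```python
import numpy as np

def summarize_f1(f1_by_class_epoch):
    """Per-class maximum F1 over training epochs, then the mean and
    standard deviation of those maxima across classes."""
    best = f1_by_class_epoch.max(axis=1)   # best epoch for each class
    return best.mean(), best.std()

# Toy scores: 5 classes x 4 epochs (made up, not the thesis data).
scores = np.array([[0.80, 0.85, 0.90, 0.88],
                   [0.70, 0.75, 0.78, 0.80],
                   [0.60, 0.72, 0.74, 0.73],
                   [0.90, 0.91, 0.93, 0.92],
                   [0.85, 0.88, 0.87, 0.86]])
mean_f1, std_f1 = summarize_f1(scores)
print(round(mean_f1, 2), round(std_f1, 3))  # 0.85 0.07
```

Comparing (mean, std) pairs like this between the synthetically-augmented and real-only runs is exactly the form of the F1* = 0.91 versus F1 = 0.89 comparison reported above.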