Dissertations / Theses on the topic 'Explainability of machine learning models'
Delaunay, Julien. "Explainability for machine learning models : from data adaptability to user perception." Electronic Thesis or Diss., Université de Rennes (2023-....), 2023. http://www.theses.fr/2023URENS076.
This thesis explores the generation of local explanations for already deployed machine learning models, aiming to identify optimal conditions for producing meaningful explanations considering both data and user requirements. The primary goal is to develop methods for generating explanations for any model while ensuring that these explanations remain faithful to the underlying model and comprehensible to the users. The thesis is divided into two parts. The first enhances a widely used rule-based explanation method to improve the quality of explanations. It then introduces a novel approach for evaluating the suitability of linear explanations to approximate a model. Additionally, it conducts a comparative experiment between two families of counterfactual explanation methods to analyze the advantages of one over the other. The second part focuses on user experiments to assess the impact of three explanation methods and two distinct representations. These experiments measure how users perceive their interaction with the model in terms of understanding and trust, depending on the explanations and representations. This research contributes to better explanation generation, with potential implications for enhancing the transparency, trustworthiness, and usability of deployed AI systems.
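The question of whether a linear explanation suitably approximates a model, as studied in this thesis, can be illustrated with a generic LIME-style sketch (this is the standard local-surrogate idea, not the thesis's own method; all data, scales, and kernel choices below are illustrative assumptions): perturb around one instance, weight samples by proximity, and measure how faithfully a weighted linear model tracks the black box locally.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] ** 2 + X[:, 1]                      # nonlinear ground truth
black_box = RandomForestRegressor(random_state=0).fit(X, y)

x0 = X[0]
Z = x0 + rng.normal(scale=0.3, size=(200, 4))   # local perturbations around x0
w = np.exp(-np.linalg.norm(Z - x0, axis=1) ** 2)  # proximity kernel

# Fit a weighted linear surrogate to the black box's local behavior and
# report local fidelity (weighted R^2): low values signal that a linear
# explanation is a poor fit around this instance.
local = Ridge().fit(Z, black_box.predict(Z), sample_weight=w)
fidelity = local.score(Z, black_box.predict(Z), sample_weight=w)
print(f"local fidelity R^2: {fidelity:.2f}")
```

A low fidelity score is one concrete signal, in the spirit of the thesis, that a linear explanation should not be trusted at that point.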
Stanzione, Vincenzo Maria. "Developing a new approach for machine learning explainability combining local and global model-agnostic approaches." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25480/.
Ayad, Célia. "Towards Reliable Post Hoc Explanations for Machine Learning on Tabular Data and their Applications." Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAX082.
As machine learning continues to demonstrate robust predictive capabilities, it has emerged as a very valuable tool in several scientific and industrial domains. However, as ML models evolve to achieve higher accuracy, they also become increasingly complex and require more parameters. Being able to understand the inner complexities and to establish trust in the predictions of these machine learning models has therefore become essential in various critical domains, including healthcare and finance. Researchers have developed explanation methods to make machine learning models more transparent, helping users understand why predictions are made. However, these explanation methods often fall short in accurately explaining model predictions, making it difficult for domain experts to utilize them effectively. It is crucial to identify the shortcomings of ML explanations, enhance their reliability, and make them more user-friendly. Additionally, with many ML tasks becoming more data-intensive and the demand for widespread integration rising, there is a need for methods that deliver strong predictive performance in a simpler and more cost-effective manner. In this dissertation, we address these problems in two main research thrusts: 1) We propose a methodology to evaluate various explainability methods in the context of specific data properties, such as noise levels, feature correlations, and class imbalance, and offer guidance for practitioners and researchers on selecting the most suitable explainability method based on the characteristics of their datasets, revealing where these methods excel or fail. Additionally, we provide clinicians with personalized explanations of cervical cancer risk factors based on their desired properties, such as ease of understanding, consistency, and stability.
2) We introduce Shapley Chains, a new explanation technique designed to overcome the lack of explanations for multi-output predictions in the case of interdependent labels, where features may have indirect contributions to the prediction of subsequent labels in the chain (i.e., the order in which these labels are predicted). Moreover, we propose Bayes LIME Chains to enhance the robustness of Shapley Chains.
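The chained-label setting that Shapley Chains targets can be sketched with scikit-learn's ClassifierChain (a generic illustration of the structure being explained, not the Shapley Chains algorithm itself; the data and estimator choices are assumptions): later labels receive earlier label predictions as extra inputs, which is exactly how a feature can contribute indirectly to a downstream label.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Toy multi-label problem with three interdependent labels.
X, Y = make_multilabel_classification(n_samples=300, n_features=6,
                                      n_classes=3, random_state=0)
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order=[0, 1, 2], random_state=0)
chain.fit(X, Y)

# The last label's model sees the 6 original features plus the 2 upstream
# label predictions appended as features, so any feature influencing those
# upstream labels contributes *indirectly* to the last label.
last = chain.estimators_[-1]
n_direct = X.shape[1]
print("weights on original features:", last.coef_[0][:n_direct])
print("weights on upstream label predictions:", last.coef_[0][n_direct:])
```

Attributing credit through those indirect paths is precisely the accounting problem that a chain-aware Shapley-style method has to solve.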
Radulovic, Nedeljko. "Post-hoc Explainable AI for Black Box Models on Tabular Data." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT028.
Current state-of-the-art Artificial Intelligence (AI) models have proven to be very successful in solving various tasks, such as classification, regression, Natural Language Processing (NLP), and image processing. The resources at our disposal today allow us to train very complex AI models to solve problems in almost any field: medicine, finance, justice, transportation, forecasting, etc. With the popularity and widespread use of AI models, the need to ensure trust in them has also grown. Complex as they are today, these AI models cannot be interpreted and understood by humans. In this thesis, we focus on a specific area of research, Explainable Artificial Intelligence (xAI), which aims to provide approaches for interpreting complex AI models and explaining their decisions. We present two approaches, STACI and BELLA, which focus on classification and regression tasks, respectively, for tabular data. Both methods are deterministic model-agnostic post-hoc approaches, which means that they can be applied to any black-box model after its creation. In this way, interpretability presents an added value without the need to compromise on the black-box model's performance. Our methods provide accurate, simple, and general interpretations of both the whole black-box model and its individual predictions. We confirmed their high performance through extensive experiments and a user study.
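The post-hoc, model-agnostic setting of STACI and BELLA can be sketched generically (this is the standard surrogate-model idea, not either of those algorithms; the data and model choices are stand-ins): fit an interpretable model to the black box's predictions and report fidelity, the rate of agreement with the black box.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, _ = train_test_split(X, y, random_state=0)

# The black box can be anything; a random forest stands in here.
black_box = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Train a shallow tree to mimic the black box's *predictions*, not the
# ground truth, so the surrogate explains the model rather than the data.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_tr, black_box.predict(X_tr))

fidelity = accuracy_score(black_box.predict(X_te), surrogate.predict(X_te))
print(f"surrogate fidelity: {fidelity:.2f}")
```

Because the surrogate is fit after the black box is built, the black box's accuracy is untouched; interpretability is layered on top, as the abstract describes.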
Willot, Hénoïk. "Certified explanations of robust models." Electronic Thesis or Diss., Compiègne, 2024. http://www.theses.fr/2024COMP2812.
With the advent of automated or semi-automated decision systems in artificial intelligence comes the need to make them more reliable and transparent for an end-user. While the role of explainable methods is in general to increase transparency, reliability can be achieved by providing certified explanations, in the sense that those are guaranteed to be true, and by considering robust models that can abstain when having insufficient information, rather than enforcing precision for the mere sake of avoiding indecision. This last aspect is commonly referred to as skeptical inference. This work contributes to this effort by considering two cases: - The first considers classical decision rules used to enforce fairness, namely Ordered Weighted Averaging (OWA) operators with decreasing weights. Our main contribution is to fully characterise, from an axiomatic perspective, convex sets of such rules, and to provide along with this characterisation sound and complete explanation schemes that can be efficiently obtained through heuristics. In doing so, we also provide a unifying framework between the restricted and generalized Lorenz dominance, two qualitative criteria, and precise decreasing OWA. - The second considers that our decision rule is a classification model resulting from a learning procedure, where the resulting model is a set of probabilities. We study and discuss the problem of providing prime implicants as explanations in such a case, where in addition to explaining clear preferences of one class over another, we also have to treat the problem of declaring two classes incomparable. We describe the corresponding problems in general terms, before studying in more detail the robust counterpart of the Naive Bayes Classifier.
Kurasinski, Lukas. "Machine Learning explainability in text classification for Fake News detection." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20058.
Full textLounici, Sofiane. "Watermarking machine learning models." Electronic Thesis or Diss., Sorbonne université, 2022. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2022SORUS282.pdf.
The protection of the intellectual property of machine learning models appears to be increasingly necessary, given the investments and their impact on society. In this thesis, we propose to study the watermarking of machine learning models. We provide a state of the art on current watermarking techniques, and then complement it by considering watermarking beyond image classification tasks. We then define forging attacks against watermarking for model hosting platforms and present a new fairness-based watermarking technique. In addition, we propose an implementation of the presented techniques.
Maltbie, Nicholas. "Integrating Explainability in Deep Learning Application Development: A Categorization and Case Study." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1623169431719474.
Full textHardoon, David Roi. "Semantic models for machine learning." Thesis, University of Southampton, 2006. https://eprints.soton.ac.uk/262019/.
Full textBODINI, MATTEO. "DESIGN AND EXPLAINABILITY OF MACHINE LEARNING ALGORITHMS FOR THE CLASSIFICATION OF CARDIAC ABNORMALITIES FROM ELECTROCARDIOGRAM SIGNALS." Doctoral thesis, Università degli Studi di Milano, 2022. http://hdl.handle.net/2434/888002.
Full textBone, Nicholas. "Models of programs and machine learning." Thesis, University of Oxford, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.244565.
Full textZhu, Xiaodan. "On Cross-Series Machine Learning Models." W&M ScholarWorks, 2020. https://scholarworks.wm.edu/etd/1616444550.
Full textAmerineni, Rajesh. "BRAIN-INSPIRED MACHINE LEARNING CLASSIFICATION MODELS." OpenSIUC, 2020. https://opensiuc.lib.siu.edu/dissertations/1806.
Full textMARRAS, MIRKO. "Machine Learning Models for Educational Platforms." Doctoral thesis, Università degli Studi di Cagliari, 2020. http://hdl.handle.net/11584/285377.
Full textKim, Been. "Interactive and interpretable machine learning models for human machine collaboration." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/98680.
Full textCataloged from PDF version of thesis.
Includes bibliographical references (pages 135-143).
I envision a system that enables successful collaborations between humans and machine learning models by harnessing their relative strengths to accomplish what neither can do alone. Machine learning techniques and humans have skills that complement each other - machine learning techniques are good at computation on data at the lowest level of granularity, whereas people are better at abstracting knowledge from their experience, and transferring the knowledge across domains. The goal of this thesis is to develop a framework for human-in-the-loop machine learning that enables people to interact effectively with machine learning models to make better decisions, without requiring in-depth knowledge about machine learning techniques. Many of us interact with machine learning systems every day. Systems that mine data for product recommendations, for example, are ubiquitous. However, these systems compute their output without end-user involvement, and there are typically no life-or-death consequences if the machine learning result is not acceptable to the user. In contrast, domains where decisions can have serious consequences (e.g., emergency response planning, medical decision-making) require the incorporation of human experts' domain knowledge. These systems also must be transparent to earn experts' trust and be adopted in their workflow. The challenge addressed in this thesis is that traditional machine learning systems are not designed to extract domain experts' knowledge from their natural workflow, or to provide pathways for the human domain expert to directly interact with the algorithm to interject their knowledge or to better understand the system output. For machine learning systems to make a real-world impact in these important domains, they must be able to communicate with highly skilled human experts to leverage their judgment and expertise, and share useful information or patterns from the data.
In this thesis, I bridge this gap by building human-in-the-loop machine learning models and systems that compute and communicate machine learning results in ways that are compatible with the human decision-making process, and that can readily incorporate human experts' domain knowledge. I start by building a machine learning model that infers human teams' planning decisions from the structured form of natural language of team meetings. I show that the model can infer a human team's final plan with 86% accuracy on average. I then design an interpretable machine learning model that "makes sense to humans" by exploring and communicating patterns and structure in data to support human decision-making. Through human subject experiments, I show that this interpretable machine learning model offers statistically significant quantitative improvements in interpretability while preserving clustering performance. Finally, I design a machine learning model that supports transparent interaction with humans without requiring that a user has expert knowledge of machine learning techniques. I build a human-in-the-loop machine learning system that incorporates human feedback and communicates its internal states to humans, using an intuitive medium for interaction with the machine learning model. I demonstrate the application of this model in an educational domain in which teachers cluster programming assignments to streamline the grading process.
by Been Kim.
Ph. D.
Shen, Chenyang. "Regularized models and algorithms for machine learning." HKBU Institutional Repository, 2015. https://repository.hkbu.edu.hk/etd_oa/195.
Full textAhlin, Mikael, and Felix Ranby. "Predicting Marketing Churn Using Machine Learning Models." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-161408.
Full textBALLANTE, ELENA. "Statistical and Machine Learning models for Neurosciences." Doctoral thesis, Università degli studi di Pavia, 2021. http://hdl.handle.net/11571/1447634.
Full textGUIDOTTI, DARIO. "Verification and Repair of Machine Learning Models." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1082694.
Full textMarkou, Markos N. "Models of novelty detection based on machine learning." Thesis, University of Exeter, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.426165.
Full textShepherd, T. "Dynamical models and machine learning for supervised segmentation." Thesis, University College London (University of London), 2009. http://discovery.ucl.ac.uk/18729/.
Full textLiu, Xiaoyang. "Machine Learning Models in Fullerene/Metallofullerene Chromatography Studies." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/93737.
Full textMachine learning models are capable to be applied in a wide range of areas, such as scientific research. In this thesis, machine learning models are applied to predict chromatography behaviors of fullerenes based on the molecular structures. Chromatography is a common technique for mixture separations, and the separation is because of the difference of interactions between molecules and a stationary phase. In real experiments, a mixture usually contains a large family of different compounds and it requires lots of work and resources to figure out the target compound. Therefore, models are extremely import for studies of chromatography. Traditional models are built based on physics rules, and involves several parameters. The physics parameters are measured by experiments or theoretically computed. However, both of them are time consuming and not easy to be conducted. For fullerenes, in my previous studies, it has been shown that the chromatography model can be simplified and only one parameter, polarizability, is required. A machine learning approach is introduced to enhance the model by predicting the molecular polarizabilities of fullerenes based on structures. The structure of a fullerene is represented by several local structures. Several types of machine learning models are built and tested on our data set and the result shows neural network gives the best predictions.
Gosch, Aron. "Exploration of 5G Traffic Models using Machine Learning." Thesis, Linköpings universitet, Databas och informationsteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-168160.
Full textDue to COVID-19 the presentation was performed over ZOOM.
Awaysheh, Abdullah Mamdouh. "Data Standardization and Machine Learning Models for Histopathology." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/85040.
Full textPh. D.
Aryasomayajula, Naga Srinivasa Baradwaj. "Machine Learning Models for Categorizing Privacy Policy Text." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535633397362514.
Full textIGUIDER, WALID. "Machine Learning Models for Sports Remote Coaching Platforms." Doctoral thesis, Università degli Studi di Cagliari, 2022. http://hdl.handle.net/11584/326530.
Full textRado, Omesaad A. M. "Contributions to evaluation of machine learning models. Applicability domain of classification models." Thesis, University of Bradford, 2019. http://hdl.handle.net/10454/18447.
Full textMinistry of Higher Education in Libya
Vantzelfde, Nathan Hans. "Prognostic models for mesothelioma : variable selection and machine learning." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/33370.
Full textIncludes bibliographical references (leaves 103-107).
Malignant pleural mesothelioma is a rare and lethal form of cancer affecting the external lining of the lungs. Extrapleural pneumonectomy (EPP), which involves the removal of the affected lung, is one of the few treatments that has been shown to have some effectiveness in treating the disease [39], but this procedure carries with it a high risk of mortality and morbidity [8]. This paper is concerned with building models that use gene expression levels to predict patient survival following EPP; these models could potentially be used to guide patient treatment. A study by Gordon et al built a predictor based on ratios of gene expression levels that was 88% accurate on a set of 29 independent test samples, in terms of classifying whether the patients survived for a shorter or longer time than the median survival [15]. These results were recreated both on the original data set used by Gordon et al and on a newer data set which contained the same samples but was generated using newer software. The predictors were evaluated using N-fold cross validation. In addition, other methods of variable selection and machine learning were investigated to build different types of predictive models. These analyses used a random training set from the newer data set. These models were evaluated using N-fold cross validation, and the best of each of the four main types of models - decision trees, logistic regression, artificial neural networks, and support vector machines - were tested using a small set of samples excluded from the training set. Of these four models, the neural network with eight hidden neurons and weight decay regularization performed the best, achieving a zero cross validation error rate and, on the test set, 71% accuracy, an ROC area of .67, and a logrank p value of .219. The support vector machine model with linear kernel also had zero cross validation error and, on the test set, 71% accuracy and an ROC area of .67, but had a higher logrank p value of .515. These both had a lower cross validation error than the ratio-based predictors of Gordon et al, which had an N-fold cross validation error rate of 35%; however, these results may not be comparable because the neural network and support vector machine used a different training set than the Gordon et al study. Regression analysis was also performed; the best neural network model was incorrect by an average of 4.6 months on the six test samples. The method of variable selection based on the signal-to-noise ratio of genes originally used by Golub et al proved more effective when used on the randomly generated training set than the method involving Student's t tests and fold change used by Gordon et al. Ultimately, however, these models will need to be evaluated using a large independent test set.
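The four model families compared in the abstract can be evaluated in the same way the thesis describes, with N-fold cross validation; the data below is a synthetic stand-in, and the hyperparameters are illustrative only, not those of the study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the expression data: 200 samples, 20 features.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "neural network": MLPClassifier(hidden_layer_sizes=(8,),
                                    max_iter=2000, random_state=0),
    "linear SVM": SVC(kernel="linear"),
}

# 5-fold cross validation, reporting mean accuracy and spread per family.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.2f} (std {scores.std():.2f})")
```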
by Nathan Hans Vantzelfde.
M.Eng.
Ebbesson, Markus. "Mail Volume Forecasting an Evaluation of Machine Learning Models." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-301333.
Full textWissel, Benjamin D. "Generalizability of Electronic Health Record-Based Machine Learning Models." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627659161796896.
Full textPirgul, Khalid, and Jonathan Svensson. "Verification of Powertrain Simulation Models Using Machine Learning Methods." Thesis, Linköpings universitet, Fordonssystem, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166290.
Full textElf, Sebastian, and Christopher Öqvist. "Comparison of supervised machine learning models forpredicting TV-ratings." Thesis, KTH, Hälsoinformatik och logistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278054.
Full textSammanfattningAtt manuellt förutsäga tittarsiffor för program- och annonsplacering kan vara kostsamt och tidskrävande om de är fel. Denna rapport utvärderar olika modeller som utnyttjar övervakad maskininlärning för att se om processen för att förutsäga tittarsiffror kan automatiseras med bättre noggrannhet än den manuella processen. Resultaten visar att av de två testade övervakade modellerna för maskininlärning, Random Forest och Support Vector Regression, var Random Forest den bättre modellen. Random Forest var bättre med båda de två mätningsmetoder, genomsnittligt absolut fel och kvadratiskt medelvärde fel, som används för att jämföra modellerna. Slutsatsen är att Random Forest, utvärderad med de data och de metoderna som används, inte är tillräckligt exakt för att ersätta den manuella processen. Även om detta är fallet, kan den fortfarande potentiellt användas som en del av den manuella processen för att underlätta de anställdas arbetsbelastning.Nyckelord Maskininlärning, övervakad inlärning, tittarsiffror, Support Vector Regression, Random Forest.
Lanka, Venkata Raghava Ravi Teja Lanka. "VEHICLE RESPONSE PREDICTION USING PHYSICAL AND MACHINE LEARNING MODELS." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1511891682062084.
Full textHugo, Linsey Sledge. "A Comparison of Machine Learning Models Predicting Student Employment." Ohio University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1544127100472053.
Full textSnelson, Edward Lloyd. "Flexible and efficient Gaussian process models for machine learning." Thesis, University College London (University of London), 2007. http://discovery.ucl.ac.uk/1445855/.
Full textZeng, Haoyang Ph D. Massachusetts Institute of Technology. "Machine learning models for functional genomics and therapeutic design." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122689.
Full textThesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 213-230).
Due to the limited size of training data available, machine learning models for biology have remained rudimentary and inaccurate despite the significant advance in machine learning research. With the recent advent of high-throughput sequencing technology, an exponentially growing number of genomic and proteomic datasets have been generated. These large-scale datasets admit the training of high-capacity machine learning models to characterize sophisticated features and produce accurate predictions on unseen examples. In this thesis, we attempt to develop advanced machine learning models for functional genomics and therapeutics design, two areas with ample data deposited in public databases and tremendous clinical implications. The shared theme of these models is to learn how the composition of a biological sequence encodes a functional phenotype and then leverage such knowledge to provide insight for target discovery and therapeutic design.
First, we design three machine learning models that predict transcription factor binding and DNA methylation, two fundamental epigenetic phenotypes closely tied to gene regulation, from DNA sequence alone. We show that these epigenetic phenotypes can be well predicted from the sequence context. Moreover, the predicted change in phenotype between the reference and alternate allele of a genetic variant accurately reflects its functional impact and improves the identification of regulatory variants causal for complex diseases. Second, we devise two machine learning models that improve the prediction of peptides displayed by the major histocompatibility complex (MHC) on the cell surface. Computational modeling of peptide display by MHC is central to the design of peptide-based therapeutics.
Our first machine learning model introduces the capacity to quantify uncertainty in the computational prediction and proposes a new metric for peptide prioritization that reduces false positives in high-affinity peptide design. The second model improves the state-of-the-art performance in MHC-ligand prediction by employing a deep language model to learn the sequence determinants for auxiliary processes in MHC-ligand selection, such as proteasome cleavage, that are omitted by existing methods due to the lack of labeled data. Third, we develop machine learning frameworks to model the enrichment of an antibody sequence in phage-panning experiments against a target antigen. We show that antibodies with low specificity can be reduced by a computational procedure using machine learning models trained for multiple targets. Moreover, machine learning can help to design novel antibody sequences with improved affinity.
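The uncertainty-quantification idea in the first MHC model can be sketched generically with a bootstrap ensemble (an assumption-laden illustration, not the thesis's actual model or metric): disagreement across ensemble members serves as an uncertainty estimate, and penalizing it when ranking candidates is one way to cut confident false positives.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical peptide features and affinity labels, invented for the sketch.
X = rng.normal(size=(300, 8))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.2, size=300)

# Train several regressors on bootstrap resamples of the data.
models = []
for seed in range(5):
    idx = rng.integers(0, 300, 300)
    models.append(GradientBoostingRegressor(random_state=seed).fit(X[idx], y[idx]))

# Ensemble mean = predicted affinity; ensemble std = uncertainty proxy.
candidates = X[:10]
preds = np.stack([m.predict(candidates) for m in models])
mean, std = preds.mean(axis=0), preds.std(axis=0)

# Prioritize high predicted affinity with low disagreement.
score = mean - std
print("top candidate index:", int(np.argmax(score)))
```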
by Haoyang Zeng
Ph. D.
Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Olubeko, Olasubomi O. "Machine learning models for screening and diagnosis of infections." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123039.
Full textThesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 71-74).
Millions of people around the globe die or are severely burdened every year at the hands of infections. These infections can occur in wounds on the surface of the body, often after surgery. They also occur inside the body as a result of hazardous contact with infectious pathogens. Many of the victims of infections reside in developing countries and have little access to proper diagnostic resources. As a result, a large portion of these infection victims go without diagnosis until the effects of the infection are severely life-threatening. My research group has focused on developing tools to aid in disease screening for patients in developing areas over the past seven years. For this thesis project, I developed a Logistic Regression model that screens for infections in surgical site wounds using features extracted from visible light images of the wounds. The extracted features convey information about the texture and color of the wound in the LAB color space.
This model was able to achieve nearly perfect classification results on a testing set of 143 patients who were part of a clinical study conducted on C-section patients at clinical facilities in rural Rwanda. Given the outstanding results of this model, our group is looking to incorporate it in a mobile screening application for surgical site infections that is currently being developed. I also built a framework for extracting features to be used in diagnosing infectious pulmonary diseases from thermal images of patients' faces. The extracted features capture information about temperature statistics in different regions of the face. This framework was tested on a small group of patients who participated in a study being conducted by our partners at the NIH. To test the framework, I used the features it extracted from each image as input for a Logistic Regression classifier that predicted whether or not the image subject had an infectious pulmonary disease.
This model achieved an average accuracy of 87.10% and AUC of 0.8125 on a testing set of 32 thermal facial images. These results seem motivating as a preliminary assessment of the power of the extracted thermal features. We plan on expanding the framework to utilize the features with more advanced models and larger datasets once the workers in the study have been able to screen more patients. Finally, I conducted an experiment analyzing gender and socioeconomic bias that may be present in previous models used by our group to screen patients for pulmonary diseases (COPD, asthma, and AR). The experiment observed the effects of training a model on a set of patients that is demographically skewed towards a majority group on the model's testing performance on patients of all groups (majority, minority, and all patients).
This experiment uncovered no significant biases in a model trained and evaluated on datasets of patients screened in previous and current studies conducted by partners of our group. These results were positive, but our group is still interested in finding additional ways to ensure that data collected for our research does not encode unwanted biases against members of any demographic groups that may use our tools.
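The surgical-site pipeline described above, simple colour statistics feeding a logistic regression classifier, can be sketched as follows. The "images" below are synthetic arrays, the per-channel mean/std features are a simplified stand-in for the LAB texture and colour features, and the redness shift is an invented proxy for infection, so only the shape of the approach matches the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def color_stats(img):
    """Per-channel mean and std as a 6-dim feature vector
    (simplified stand-in for the LAB features described above)."""
    return np.r_[img.mean(axis=(0, 1)), img.std(axis=(0, 1))]

# Hypothetical 64x64 3-channel patches; "infected" patches shifted redder.
healthy = [rng.uniform(0.2, 0.6, (64, 64, 3)) for _ in range(40)]
infected = [np.clip(rng.uniform(0.2, 0.6, (64, 64, 3)) + [0.3, 0.0, 0.0], 0, 1)
            for _ in range(40)]

X = np.array([color_stats(p) for p in healthy + infected])
y = np.array([0] * 40 + [1] * 40)

clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))
```

On real wound photographs the feature extraction would of course be richer and the evaluation held out, but the two-stage structure, hand-crafted colour features then a linear classifier, is the same.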
by Olasubomi O. Olubeko.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Macis, Ambra. "Statistical Models and Machine Learning for Survival Data Analysis." Doctoral thesis, Università degli studi di Brescia, 2023. https://hdl.handle.net/11379/568945.
Full textThe main topic of this thesis is survival analysis, a collection of methods used in longitudinal studies in which the interest is not only in the occurrence (or not) of a particular event, but also in the time needed for observing it. Over the years, firstly statistical models and then machine learning methods have been proposed to address studies of survival analysis. The first part of the work provides an introduction to the basic concepts of survival analysis and an extensive review of the existing literature. In particular, the focus has been set on the main statistical models (nonparametric, semiparametric and parametric) and, among machine learning methods, on survival trees and random survival forests. For these methods the main proposals introduced during the last decades have been described. In the second part of the thesis, instead, my research contributions have been reported. These works mainly focused on two aims: (1) the rationalization into a unified protocol of the computational approach, which nowadays is based on several existing packages with few documentation, several still obscure points and also some bugs, and (2) the application of survival data analysis methods in an unusual context where, to our best knowledge, this approach had never been used. In particular, the first contribution consisted in the writing of a tutorial aimed to enable the interested users to approach these methods, making order among the many existing algorithms and packages and providing solutions to the several related computational issues. It dealt with the main steps to follow when a simulation study is carried out, paying attention to: (i) survival data simulation, (ii) model fitting and (iii) performance assessment. The second contribution was based on the application of survival analysis methods, both statistical models and machine learning algorithms, for analyzing the offensive performance of the National Basketball Association (NBA) players. 
In particular, variable selection was performed to determine the main variables associated with the probability of exceeding a given number of scored points during the post All-Star game season segment, and with the time needed to do so. In conclusion, this thesis lays the groundwork for a unified framework able to harmonize the existing fragmented approaches without computational issues. Moreover, the findings of this thesis suggest that a survival analysis approach can be extended to new contexts.
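The simulation-fitting-assessment workflow described in Macis's tutorial starts from nonparametric estimation. As an illustration only (not code from the thesis), a minimal pure-Python Kaplan-Meier estimator might look like this; the durations and censoring indicators in the usage below are made up:

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate S(t) at each distinct event time.

    durations: observed times; events: 1 if the event occurred, 0 if censored.
    Returns a list of (time, survival_probability) pairs.
    """
    data = sorted(zip(durations, events))  # sort observations by time
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(e for (tt, e) in data if tt == t)    # events at time t
        n_t = sum(1 for (tt, _) in data if tt >= t)  # subjects still at risk
        if d > 0:
            surv *= 1 - d / n_t
            curve.append((t, surv))
        while i < len(data) and data[i][0] == t:     # skip ties at t
            i += 1
    return curve

# Illustrative data: events at t=2, 3, 5; censored observations at t=3, 8
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored subjects leave the risk set without triggering a drop in the curve, which is the core bookkeeping any survival package must get right.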
MARZIALI, ANDREA. "Machine learning models applied to energy time-series forecasting." Doctoral thesis, Università degli studi di Pavia, 2020. http://hdl.handle.net/11571/1326207.
Full textBARDELLI, CHIARA. "Machine Learning and Statistical models in real world applications." Doctoral thesis, Università degli studi di Pavia, 2021. http://hdl.handle.net/11571/1447635.
Full textDarwaish, Asim. "Adversary-aware machine learning models for malware detection systems." Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7283.
Full textThe proliferation of smartphones and their indispensability to daily life are undeniable. This exponential growth has also triggered widespread malware, destabilizing a thriving mobile ecosystem. Among handheld platforms, Android is the most heavily targeted by malware authors due to its popularity, open-source availability, and the relative ease of accessing internal resources. Machine-learning-based approaches have been successfully deployed to combat evolving and polymorphic malware campaigns. As a classifier becomes popular and widely adopted, however, the incentive to evade it also increases, and researchers and adversaries are in a never-ending race to strengthen and evade Android malware detection systems. To combat malware campaigns and counter adversarial attacks, we propose a robust image-based Android malware detection system that has proven its robustness against various adversarial attacks. The proposed platform first constructs the detection system by intelligently transforming the Android Application Package (APK) file into a lightweight RGB image and training a convolutional neural network (CNN) for malware detection and family classification. Our novel transformation method generates distinctive patterns for benign and malicious APKs in the color images, making classification easier. The detection system achieved an accuracy of 99.37% with a False Negative Rate (FNR) of 0.8% and a False Positive Rate (FPR) of 0.39% for both legacy and new malware variants. In the second phase, we evaluate the robustness of our image-based Android malware detection system. To validate its hardness and effectiveness against evasion, we crafted three novel adversarial attack models. Our thorough evaluation reveals that state-of-the-art learning-based malware detection systems are easy to evade, with evasion rates above 50%.
Our proposed system, however, builds a secure mechanism against adversarial perturbations using the intrinsic continuous space obtained from the intelligent transformation of the Dex and Manifest files, which makes the detection system difficult to bypass.
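The byte-to-image step that Darwaish's system builds on can be sketched generically. The snippet below is a plain byte-triplet-to-RGB mapping with zero-padding to a square, not the thesis's actual "intelligent" transformation of Dex and Manifest files, whose design is the contribution of the work:

```python
import math

def bytes_to_rgb_image(raw: bytes):
    """Map a raw byte stream to a square RGB image (rows of (R, G, B) tuples).

    Generic sketch: consecutive byte triplets become pixels, and the stream
    is zero-padded so the pixel count fills a square grid.
    """
    n_pixels = math.ceil(len(raw) / 3)
    side = math.ceil(math.sqrt(n_pixels))
    padded = raw + b"\x00" * (side * side * 3 - len(raw))
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]

# A 4-byte stream becomes a 2x2 image with zero-padded pixels
img = bytes_to_rgb_image(b"\x01\x02\x03\x04")
```

A CNN trained on such images then learns spatial byte-pattern regularities instead of hand-engineered features.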
Parekh, Jayneel. "A Flexible Framework for Interpretable Machine Learning : application to image and audio classification." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT032.
Full textMachine learning systems, and especially neural networks, have rapidly grown in their ability to address complex learning problems. Consequently, they are being integrated into society with an ever-rising influence on all levels of human experience. This has created a need for human-understandable insights into their decision-making process, to ensure that decisions are made ethically and reliably. The study and development of methods that can generate such insights broadly constitutes the field of interpretable machine learning. This thesis develops a novel framework that can tackle two major problem settings in this field: post-hoc and by-design interpretation. Post-hoc interpretability devises methods to interpret the decisions of a pre-trained predictive model, while by-design interpretability aims to learn a single model capable of both prediction and interpretation. To this end, we extend the traditional supervised learning formulation to include interpretation as an additional task besides prediction, each addressed by separate but related models: a predictor and an interpreter. Crucially, the interpreter depends on the predictor through its hidden layers and uses a dictionary of concepts as its representation for interpretation, with the capacity to generate both local and global interpretations. The framework is instantiated separately to address interpretability problems in image and audio classification. Both systems are extensively evaluated on multiple publicly available datasets, and we demonstrate high predictive performance and high fidelity of interpretations in both cases. Despite adhering to the same underlying structure, the two systems are designed differently for interpretation. The image interpretability system advances the pipeline for discovering learnt concepts with improved understandability, which is evaluated qualitatively.
The audio interpretability system, in contrast, is designed around a novel representation based on non-negative matrix factorization to facilitate listenable interpretations while modeling the audio objects composing a scene.
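The non-negative matrix factorization underlying the audio interpreter can be illustrated with the classical Lee-Seung multiplicative updates, which keep both factors nonnegative throughout. This is a generic textbook sketch in pure Python on a tiny dense matrix, not the thesis's actual audio representation:

```python
import random

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=500, seed=0):
    """Factor a nonnegative m x n matrix V into W (m x k) and H (k x n)
    using Lee-Seung multiplicative updates; V is approximated by W @ H."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WtV = matmul(transpose(W), V)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][j] * VHt[i][j] / (WHHt[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H
```

In the audio setting, the columns of W play the role of spectral templates, so each component can be resynthesized and listened to, which is what makes the interpretation "listenable."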
Mariet, Zelda Elaine. "Learning with generalized negative dependence : probabilistic models of diversity for machine learning." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122739.
Full textCataloged from PDF version of thesis.
Includes bibliographical references (pages 139-150).
This thesis establishes negative dependence as a powerful and computationally efficient framework to analyze machine learning problems that require a theoretical model of diversification. Examples of such problems include experimental design and model compression: subset-selection problems that require carefully balancing the quality of each selected element with the diversity of the subset as a whole. Negative dependence, which models the behavior of "repelling" random variables, provides a rich mathematical framework for the analysis of such problems. Leveraging negative dependence theory for machine learning requires (a) scalable sampling and learning algorithms for negatively dependent measures, and (b) negatively dependent measures able to model the specific diversity requirements that arise in machine learning. These problems are the focus of this thesis.
The first part of this thesis develops scalable sampling and learning algorithms for determinantal point processes (DPPs), popular negatively dependent measures with many applications to machine learning. For scalable sampling, we introduce a theoretically-motivated generative deep neural network for DPP-like samples over arbitrary ground sets. To address the learning problem, we show that algorithms for maximum likelihood estimation (MLE) for DPPs are drastically sped up with Kronecker kernels, and that MLE can be further enriched by negative samples. The second part of this thesis leverages negative dependence for core problems in machine learning. We begin by deriving a generalized form of volume sampling (GVS) based on elementary symmetric polynomials, and prove that the induced measures exhibit strong negative dependence properties.
We then show that classical forms of optimal experimental design can be cast as optimization problems based on GVS, for which we derive randomized and greedy algorithms to obtain the associated designs. Finally, we introduce exponentiated strongly Rayleigh measures, which allow for simple tuning of the strength of repulsive forces between similar items while still enjoying fast sampling algorithms. The great flexibility of exponentiated strongly Rayleigh measures makes them an ideal tool for machine learning problems that benefit from negative dependence theory.
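The "repelling" behavior of DPPs that the thesis builds on can be seen directly from the marginal kernel: for a DPP with marginal kernel K, inclusion probabilities are principal minors of K, so P({i, j} ⊆ S) = K_ii·K_jj − K_ij·K_ji, which never exceeds P(i ∈ S)·P(j ∈ S). A toy numeric check, with kernel values chosen arbitrarily for illustration:

```python
def pair_inclusion(K, i, j):
    """P({i, j} ⊆ S) for a DPP with marginal kernel K: the 2x2 principal minor."""
    return K[i][i] * K[j][j] - K[i][j] * K[j][i]

# Symmetric marginal kernel for a 2-item ground set (illustrative values)
K = [[0.5, 0.3],
     [0.3, 0.5]]

p_i = K[0][0]                      # P(0 in S) = 0.5
p_j = K[1][1]                      # P(1 in S) = 0.5
p_both = pair_inclusion(K, 0, 1)   # 0.25 - 0.09 = 0.16

# Negative dependence: selecting one item makes the other less likely
assert p_both <= p_i * p_j
```

The off-diagonal entry K_ij encodes similarity between items: the more similar two items are, the larger the subtracted term and the stronger the repulsion, which is exactly the diversity-inducing mechanism the thesis generalizes.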
by Zelda E. Lawson Mariet.
Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Goodman, Genghis. "A Machine Learning Approach to Artificial Floorplan Generation." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/89.
Full textTahkola, M. (Mikko). "Developing dynamic machine learning surrogate models of physics-based industrial process simulation models." Master's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201906042313.
Full textBhat, Sooraj. "Syntactic foundations for machine learning." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/47700.
Full textLundström, Love, and Oscar Öhman. "Machine Learning in credit risk : Evaluation of supervised machine learning models predicting credit risk in the financial sector." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-164101.
Full textWhen banks lend money to another party, a risk arises that the borrower will not meet its obligations to the bank. This risk is called credit risk and is the largest risk a bank faces. Under the Basel regulations, a bank must set aside a certain amount of capital for each loan it issues, in order to protect itself against future financial crises. This amount is calculated per loan from its associated risk weight, RWA. The main parameters in RWA are the probability that a customer cannot repay the loan and the amount the bank then loses. Today, banks may use internal models to estimate these parameters. Since tied-up capital entails large costs, banks strive to find better tools for estimating the probability that a customer defaults, and thereby reduce their capital requirements. Banks have therefore begun to explore machine learning algorithms for estimating these parameters. Algorithms such as logistic regression, neural networks, decision trees, and random forests can be used to determine credit risk. By training algorithms on historical data with known outcomes, the probability of default (PD) can be estimated with higher accuracy than with traditional methods. On the data underlying this thesis, logistic regression turns out to be the algorithm with the highest accuracy in assigning customers to the correct category. However, this algorithm classifies many customers as false positives, meaning it predicts that many customers will repay their loans when they in fact default, which entails a large cost for the banks.
By instead evaluating the models with a cost function introduced to reduce this error, we find that the neural network has the lowest false positive rate and is thus the model best suited to this specific classification task.
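The cost-sensitive evaluation described in the abstract can be sketched as a reweighted confusion-matrix score: instead of plain accuracy, each error type carries its own cost. The labels, data, and cost weights below are illustrative placeholders, not figures from the thesis:

```python
def evaluate(y_true, y_pred, cost_fp=10.0, cost_fn=1.0):
    """Accuracy, false positive rate, and a cost-weighted error score.

    Here label 1 means "will repay"; a false positive (predicting repayment
    for a customer who actually defaults) is the expensive error, so it is
    weighted more heavily than a false negative.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    cost = cost_fp * fp + cost_fn * fn
    return accuracy, fpr, cost

# Toy comparison: a model can win on accuracy yet lose on cost
acc, fpr, cost = evaluate([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```

Under such a cost function, a model with slightly lower accuracy but fewer false positives can come out ahead, which is exactly the effect the thesis reports for the neural network.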
Cahill, Jaspar. "Machine learning techniques to improve software quality." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/41730/1/Jaspar_Cahill_Thesis.pdf.
Full textGarcia, Gomez David. "Exploration of customer churn routes using machine learning probabilistic models." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/144660.
Full textRosenbaum, Lars [Verfasser]. "Interpretable Machine Learning Models for Mining Chemical Databases / Lars Rosenbaum." München : Verlag Dr. Hut, 2014. http://d-nb.info/1047036266/34.
Full text