Theses: "Graphes embeddings"

1

Damay, Gabriel. "Dynamic Decision Trees and Community-based Graph Embeddings : towards Interpretable Machine Learning". Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAT047.


Abstract

Machine Learning is the field of computer science concerned with building models and solutions from data without knowing exactly the set of instructions that drive these models internally. This field has achieved impressive results, but it is now under scrutiny, notably because the models it produces cannot easily be understood or audited. Interpretable Machine Learning addresses these concerns by building models that are inherently interpretable. This thesis contributes to Interpretable Machine Learning in two ways.

First, we study Decision Trees, a very popular group of Machine Learning methods for classification problems that is interpretable by design. However, real-world data is often dynamic, and few algorithms can maintain a decision tree when data can be both inserted into and deleted from the training set. We propose a new algorithm called FuDyADT to solve this problem.

Second, when data are represented as graphs, a very common machine learning technique called "embedding" consists in projecting them onto a vector space. This kind of method, however, is usually not interpretable. We propose a new embedding algorithm called Parfaite, based on the factorization of the Personalized PageRank matrix. This algorithm is designed to provide interpretable results.

We study both algorithms theoretically and experimentally. We show that FuDyADT is at least comparable to state-of-the-art algorithms in the usual setting, while also being able to handle unusual settings such as deletions of data and numerical features. Parfaite, on the other hand, produces embedding dimensions that align with the communities of the graph, making the embedding interpretable.
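
To make the idea of factorizing a Personalized PageRank matrix concrete, here is a minimal sketch in Python. It is not the Parfaite algorithm, whose details the abstract does not give. It only builds a dense Personalized PageRank matrix for a small graph and factorizes it with a plain truncated SVD. The function names, the restart probability `alpha = 0.15`, the toy graph, and the choice of SVD are all assumptions made for illustration.

```python
import numpy as np

def personalized_pagerank_matrix(adj, alpha=0.15):
    """Dense PPR matrix: row i is the PPR vector with restart at node i.

    Assumes an undirected graph with no isolated nodes (hypothetical helper).
    """
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    transition = adj / degrees[:, None]  # row-stochastic random-walk matrix
    n = adj.shape[0]
    # Closed form of the restart random walk: alpha * (I - (1 - alpha) P)^{-1}
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * transition)

def svd_embedding(matrix, dim):
    """Truncated-SVD factorization, used here as a generic stand-in."""
    u, s, _ = np.linalg.svd(matrix)
    return u[:, :dim] * np.sqrt(s[:dim])

# Toy graph (assumption): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for a, b in edges:
    adj[a, b] = adj[b, a] = 1.0

ppr = personalized_pagerank_matrix(adj)
emb = svd_embedding(ppr, dim=2)
print(np.round(emb, 3))
```

A generic SVD gives no guarantee that each embedding dimension lines up with one community. According to the abstract, obtaining that alignment is precisely what Parfaite is designed for.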