Dissertations / Theses on the topic 'Analyse statistique de graphes'
Consult the top 50 dissertations / theses for your research on the topic 'Analyse statistique de graphes.'
Guillaume, Jean-Loup. "Analyse statistique et modélisation des grands réseaux d'interactions." PhD thesis, Université Paris-Diderot - Paris VII, 2004. http://tel.archives-ouvertes.fr/tel-00011377.
The first part focuses on network analysis and critically reviews the networks studied and the parameters introduced to better understand their structure. Several of these parameters are shared by most of the networks studied, which justifies studying them as a whole.
The second part, which forms the core of this thesis, addresses the modelling of large interaction networks, that is, the construction of artificial graphs similar to those encountered in practice. It first presents existing models, then introduces a model based on certain non-trivial properties that is simple enough for its properties to be studied formally and yet realistic.
Finally, the third part is purely methodological. It shows how the preceding parts are put into practice and what is gained from them, based on three specific cases: a study of exchanges in a peer-to-peer network, a study of the robustness of networks to failures and attacks, and a set of simulations aimed at estimating the quality of the maps of the Internet currently in use.
This thesis highlights the need to continue work on large interaction networks and points to several promising directions, notably a finer-grained study of networks, whether weighted or dynamic, as well as the need to study many problems related to network metrology in order to capture their structure more precisely.
Zreik, Rawya. "Analyse statistique des réseaux et applications aux sciences humaines." Thesis, Paris 1, 2016. http://www.theses.fr/2016PA01E061/document.
Over the last two decades, network structure analysis has grown rapidly and found applications in many fields, such as communication networks, financial transaction networks, gene regulatory networks, disease transmission networks, and mobile telephone networks. Social networks are now commonly used to represent the interactions between groups of people; for instance, we ourselves, our professional colleagues, our friends and family are often part of online networks, such as Facebook, Twitter, and email. In a network, many factors can exert influence or make analyses easier to understand. Among these, we find two important ones: the time factor and the network context. The former involves the evolution of connections between nodes over time. The network context can then be characterized by different types of information such as text messages (emails, tweets, Facebook posts, etc.) exchanged between nodes, categorical information on the nodes (age, gender, hobbies, status, etc.), interaction frequencies (e.g., number of emails sent or comments posted), and so on. Taking these factors into consideration can lead to the capture of increasingly complex and hidden information from the data. The aim of this thesis is to define new models for graphs which take into consideration the two factors mentioned above, in order to develop the analysis of network structure and allow extraction of the hidden information from the data. These models aim at clustering the vertices of a network depending on their connection profiles and network structures, which are either static or dynamically evolving. The starting point of this work is the stochastic block model, or SBM. This is a mixture model for graphs which was originally developed in social sciences. It assumes that the vertices of a network are spread over different classes, so that the probability of an edge between two vertices only depends on the classes they belong to.
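The stochastic block model described in the abstract above can be illustrated with a short sampler. This is only a minimal sketch of the basic static SBM, not the models developed in the thesis; the function name `sample_sbm` and its parameters `block_sizes` and `edge_probs` are hypothetical names chosen for illustration.

```python
import random


def sample_sbm(block_sizes, edge_probs, seed=0):
    """Sample an undirected graph from a basic stochastic block model.

    block_sizes[k] is the number of vertices in class k, and
    edge_probs[k][l] is the probability of an edge between a vertex of
    class k and a vertex of class l (assumed symmetric). Both names are
    illustrative assumptions, not notation taken from the thesis.
    """
    rng = random.Random(seed)
    # Assign a class label to every vertex, class by class.
    labels = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the two classes.
            if rng.random() < edge_probs[labels[i]][labels[j]]:
                edges.append((i, j))
    return labels, edges


# Two classes of 20 vertices: dense inside a class, sparse across classes.
labels, edges = sample_sbm([20, 20], [[0.8, 0.05], [0.05, 0.8]], seed=1)
within = sum(labels[i] == labels[j] for i, j in edges)
print(f"{len(edges)} edges, {within} of them inside a class")
```

With these parameters most sampled edges should fall inside a class, which is the kind of structure that clustering methods built on the SBM try to recover from the observed graph.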
Stoica, Alina-Mihaela. "Analyse de la structure locale des grands réseaux sociaux." Paris 7, 2010. http://www.theses.fr/2010PA077190.
The main goal of our research was to characterize the individuals connected in a social network by analyzing the local structure of the network. For that, we proposed a method that describes the way a node (corresponding to an individual) is embedded in the network. Our method is related to the analysis of egocentred networks in sociology and to the local approach in the study of complex networks. It can be applied to small networks, to fractions of networks and also to large networks, owing to its low complexity. We applied the proposed method to two large social networks, one modeling online activity on MySpace, the other one modeling mobile phone communications. In the first case we were interested in analyzing the online popularity of artists on MySpace. In the second case, we proposed and used a method for clustering nodes that are connected in a similar way to the network. We found that the distribution of mobile phone users into clusters was correlated with other characteristics of the individuals (i.e., communication intensity and age). Although in this thesis we applied the two methods only to social networks, they can be applied in the same way to any other graph, whatever its origin.
Dumoncel, Franck. "Géographie et graphes : une interaction pour exprimer des requêtes spatiales guidée par des adjacences conceptuelles." Caen, 2006. http://www.theses.fr/2006CAEN2007.
Hollocou, Alexandre. "Nouvelles approches pour le partitionnement de grands graphes." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE063.
Graphs are ubiquitous in many fields of research ranging from sociology to biology. A graph is a very simple mathematical structure that consists of a set of elements, called nodes, connected to each other by edges. It is nevertheless able to represent complex systems such as protein-protein interactions or scientific collaborations. Graph clustering is a central problem in the analysis of graphs whose objective is to identify dense groups of nodes that are sparsely connected to the rest of the graph. These groups of nodes, called clusters, are fundamental to an in-depth understanding of graph structures. There is no universal definition of what a good cluster is, and different approaches might be best suited for different applications. Whereas most classic methods focus on finding node partitions, i.e. on coloring graph nodes so that each node has one and only one color, more elaborate approaches are often necessary to model the complex structure of real-life graphs and to address sophisticated applications. In particular, in many cases, we must consider that a given node can belong to more than one cluster. Besides, many real-world systems exhibit multi-scale structures and one must seek hierarchies of clusters rather than flat clusterings. Furthermore, graphs often evolve over time and are too massive to be handled in one batch, so that one must be able to process streams of edges. Finally, in many applications, processing entire graphs is irrelevant or expensive, and it can be more appropriate to recover local clusters in the neighborhood of nodes of interest rather than color all graph nodes. In this work, we study alternative approaches and design novel algorithms to tackle these different problems.
The novel methods that we propose to address these different problems are mostly inspired by variants of modularity, a classic measure that assesses the quality of a node partition, and by random walks, stochastic processes whose properties are closely related to the graph structure. We provide analyses that give theoretical guarantees for the different proposed techniques, and endeavour to evaluate these algorithms on real-world datasets and use cases.
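To make the notion of modularity mentioned in the abstract above concrete, here is a minimal sketch that computes the standard Newman-Girvan modularity of a node partition, Q = sum over clusters c of [e_c / m - (d_c / (2m))^2], where m is the number of edges, e_c the number of edges inside cluster c, and d_c the total degree of its nodes. The thesis works with variants of this measure that are not reproduced here; the function name `modularity` and the edge-list representation are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges is a list of (u, v) pairs without self-loops (at least one
    edge is assumed); labels[u] is the cluster of node u.
    """
    m = len(edges)
    intra = defaultdict(int)   # e_c: edges with both ends in cluster c
    degree = defaultdict(int)  # d_c: sum of degrees of nodes in cluster c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = [0, 0, 0, 1, 1, 1]
one_cluster = [0, 0, 0, 0, 0, 0]
print(modularity(edges, two_clusters), modularity(edges, one_cluster))
```

On this toy graph, splitting along the two triangles scores higher than putting every node in a single cluster, which always has modularity zero under this definition.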
Bonis, Thomas. "Algorithmes d'apprentissage statistique pour l'analyse géométrique et topologique de données." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS459/document.
In this thesis, we study data analysis algorithms using random walks on neighborhood graphs, or random geometric graphs. It is known that random walks on such graphs approximate continuous objects called diffusion processes. In the first part of this thesis, we use this approximation result to propose a new soft clustering algorithm based on the mode seeking framework. For our algorithm, we want to define clusters using the properties of a diffusion process. Since we do not have access to this continuous process, our algorithm uses a random walk on a random geometric graph instead. After proving the consistency of our algorithm, we evaluate its efficiency on both real and synthetic data. We then tackle the issue of the convergence of invariant measures of random walks on random geometric graphs. As these random walks converge to a diffusion process, we can expect their invariant measures to converge to the invariant measure of this diffusion process. Using an approach based on Stein's method, we manage to quantify this convergence. Moreover, the method we use is more general and can be used to obtain other results such as convergence rates for the Central Limit Theorem. In the last part of this thesis, we use persistent homology, a concept of algebraic topology, to improve the pooling step of the bag-of-words approach for 3D shapes.
Colin, Igor. "Adaptation des méthodes d’apprentissage aux U-statistiques." Thesis, Paris, ENST, 2016. http://www.theses.fr/2016ENST0070/document.
With the increasing availability of large amounts of data, computational complexity has become a keystone of many machine learning algorithms. Stochastic optimization algorithms and distributed/decentralized methods have been widely studied over the last decade and provide increased scalability for optimizing an empirical risk that is separable in the data sample. Yet, in a wide range of statistical learning problems, the risk is accurately estimated by U-statistics, i.e., functionals of the training data with low variance that take the form of averages over d-tuples. We first tackle the problem of sampling for the empirical risk minimization problem. We show that empirical risks can be replaced by drastically computationally simpler Monte-Carlo estimates based on O(n) terms only, usually referred to as incomplete U-statistics, without damaging the learning rate. We establish uniform deviation results, and numerical examples show that such an approach surpasses more naive subsampling techniques. We then focus on the decentralized estimation topic, where the data sample is distributed over a connected network. We introduce new synchronous and asynchronous randomized gossip algorithms which simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest. We establish convergence rate bounds with explicit data and network dependent terms. Finally, we deal with the decentralized optimization of functions that depend on pairs of observations. Similarly to the estimation case, we introduce a method based on concurrent local updates and data propagation. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. Our simulations illustrate the practical interest of our approach.
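The difference between a complete and an incomplete U-statistic, as described in the abstract above, can be sketched for the degree-2 case. The example below is an illustration under stated assumptions rather than the thesis's estimators: the function names, the sampling scheme (pairs drawn uniformly with replacement) and the choice of kernel h(x, y) = (x - y)^2 / 2, whose complete U-statistic is the unbiased sample variance, are all choices made here for illustration.

```python
import itertools
import random
import statistics


def complete_u_statistic(data, kernel):
    """Average of a symmetric kernel over all n(n-1)/2 pairs."""
    pairs = list(itertools.combinations(data, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def incomplete_u_statistic(data, kernel, n_pairs, seed=0):
    """Monte-Carlo estimate using only n_pairs randomly drawn pairs."""
    rng = random.Random(seed)
    n = len(data)
    total = 0.0
    for _ in range(n_pairs):
        # Draw two distinct indices; the kernel is symmetric, so the
        # order of the pair does not matter.
        i, j = rng.sample(range(n), 2)
        total += kernel(data[i], data[j])
    return total / n_pairs


def half_squared_difference(x, y):
    """Kernel whose complete U-statistic is the unbiased variance."""
    return (x - y) ** 2 / 2


rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
full = complete_u_statistic(sample, half_squared_difference)
cheap = incomplete_u_statistic(sample, half_squared_difference, n_pairs=500)
print(full, statistics.variance(sample), cheap)
```

The complete statistic sums about 125,000 kernel evaluations for 500 points, while the incomplete one uses 500; the abstract's claim is that, in the learning setting it studies, this kind of O(n) approximation does not damage the learning rate.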
Laporte, Quentin. "Étude morpho-statistique des réseaux sociaux. Application aux collaborations inter-organisationnelles." Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0007.
Decentralised collaborative applications address privacy, availability and security issues related to centralised collaborative platforms. Such applications are based on a peer-to-peer communication paradigm according to which all users are directly connected to one another. Collaborations tend to widen and spread beyond the borders of organisations. Under these circumstances, it is necessary to guarantee users control over their data, while keeping collaboration available. To that end, the social network that has been built between collaborators may be used as the topology. The lack of information on this trusted network leads us to develop an approach to study its morphological properties. In this thesis, we develop and implement an approach to study the social structure of interactions in the context of inter-organisational collaborations. We propose a stochastic approach based on Exponential Random Graph Models (ERGM) and spatial models. We define a formalism that highlights the structure of interactions and integrates the organisational dimension. We propose to use a Bayesian inference method, ABC Shadow, to overcome the issues related to parameter estimation. This approach is applied to a real case study: the collaborations initiated by researchers in a laboratory. In particular, it highlights the low tendency for a researcher to create collaborative links with other laboratories. We show that this approach can be applied to other kinds of social interactions, such as interactions between pupils of a primary school. Finally, we present a parallelisation strategy for the Gibbs sampler aimed at processing larger graphs in a reasonable time.
Bera, Roderic. "L'Adjacence relative. Une Etude contextuelle de l'influence de l'environnement spatial dans l'appréhension de la notion de proximité." Rennes 1, 2004. https://hal.archives-ouvertes.fr/tel-01276691.
Meot, Alain. "Explicitation de contraintes de voisinage en analyse multivariée : applications dans le cadre de problématiques agronomiques." Lyon 1, 1992. http://www.theses.fr/1992LYO10241.
Maignant, Elodie. "Plongements barycentriques pour l'apprentissage géométrique de variétés : application aux formes et graphes." Electronic Thesis or Diss., Université Côte d'Azur, 2023. http://www.theses.fr/2023COAZ4096.
An MRI image has over 60,000 pixels. The largest known human protein consists of around 30,000 amino acids. We call such data high-dimensional. In practice, most high-dimensional data is high-dimensional only artificially. For example, of all the images that could be randomly generated by coloring 256 x 256 pixels, only a very small subset would resemble an MRI image of a human brain. This is known as the intrinsic dimension of such data. Therefore, learning high-dimensional data is often synonymous with dimensionality reduction. There are numerous methods for reducing the dimension of a dataset, the most recent of which can be classified according to two approaches.
A first approach known as manifold learning or non-linear dimensionality reduction is based on the observation that some of the physical laws behind the data we observe are non-linear. In this case, trying to explain the intrinsic dimension of a dataset with a linear model is sometimes unrealistic. Instead, manifold learning methods assume a locally linear model.
Moreover, with the emergence of statistical shape analysis, there has been a growing awareness that many types of data are naturally invariant to certain symmetries (rotations, reparametrizations, permutations...). Such properties are directly mirrored in the intrinsic dimension of such data. These invariances cannot be faithfully transcribed by Euclidean geometry. There is therefore a growing interest in modeling such data using finer structures such as Riemannian manifolds. A second recent approach to dimension reduction consists then in generalizing existing methods to non-Euclidean data. This is known as geometric learning.
In order to combine both geometric learning and manifold learning, we investigated the method called locally linear embedding, which has the specificity of being based on the notion of barycenter, a notion a priori defined in Euclidean spaces but which generalizes to Riemannian manifolds. In fact, the method called barycentric subspace analysis, which is one of those generalizing principal component analysis to Riemannian manifolds, is based on this notion as well. Here we rephrase both methods under the new notion of barycentric embeddings. Essentially, barycentric embeddings inherit the structure of most linear and non-linear dimension reduction methods, but rely on a (locally) barycentric -- affine -- model rather than a linear one.
The core of our work lies in the analysis of these methods, both on a theoretical and practical level. In particular, we address the application of barycentric embeddings to two important examples in geometric learning: shapes and graphs. In addition to practical implementation issues, each of these examples raises its own theoretical questions, mostly related to the geometry of quotient spaces. In particular, we highlight that compared to standard dimension reduction methods in graph analysis, barycentric embeddings stand out for their better interpretability. In parallel with these examples, we characterize the geometry of locally barycentric embeddings, which generalize the projection computed by locally linear embedding. Finally, algorithms for geometric manifold learning, novel in their approach, complete this work.
Kokonendji, Célestin Clotaire. "Familles exponentielles naturelles réelles de fonction variance en R Q/ par Célestin Clotaire Kokonendji." Toulouse 3, 1993. http://www.theses.fr/1993TOU30092.
Dasse-Hartaut, Sandrine. "Combinatoire des tableaux escalier." Paris 7, 2014. http://www.theses.fr/2014PA077070.
A relatively new combinatorial structure, called staircase tableaux, was introduced in recent work of S. Corteel and L. Williams. Staircase tableaux are a generalisation of permutation tableaux and alternative tableaux. Their study gave a combinatorial formula for the moments of Askey-Wilson polynomials. Staircase tableaux are also related to the asymmetric exclusion process on a one-dimensional lattice with open boundaries (ASEP), an important and heavily studied particle model in statistical mechanics. The study of the generating function of the staircase tableaux has given a combinatorial formula for the steady state probability of the ASEP. We use different approaches to study the staircase tableaux: with a probabilistic approach, we prove the asymptotic normality of some parameters of the staircase tableaux; with bijective combinatorics, we obtain the properties of some subsets of staircase tableaux, using for example tree-like tableaux or permutations. Finally, a Markov chain on a subset of staircase tableaux confirms intuitively the formula for the steady state probability without using the matrix ansatz.
Lagesse, Claire. "Lire les lignes de la ville : méthodologie de caractérisation des graphes spatiaux." Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCC162.
Cities arise from a large set of interactions and components. Amid this diversity, we chose an object which orchestrates the development and use of an urban area: the road network. From its representation as a graph we can build a geographical object called the way, which is multi-scale, making its analysis robust against zoning. We evaluate several indicators, and identify those that give the most relevant and non-redundant information. The way appears to have unique spatial properties, revealing parallels between global and local analyses. With this methodology, we demonstrate how different road graphs, from various places in the world, show similar properties, and how some of those properties are also present in other networks (biological, hydrographical, etc). After considering the static properties of networks, we analyze how global characterization evolves through time. We define a model of temporal differentiation, where the change in accessibility of each object is highlighted. It is thus possible to have a first estimation of the growth kinematics of the road networks studied. This work culminates with the integration of the way and its associated indicators into a qualitative approach. We show how such analysis, based on the topological and topographical properties of their road networks, allows us to trace back some aspects of the historical and geographical contexts of city formation. Multidisciplinary discussions are synthesized to reveal research applications and future work.
Iacopini, Matteo. "Essays on econometric modelling of temporal networks." Thesis, Paris 1, 2018. http://www.theses.fr/2018PA01E058/document.
Graph theory has long been studied in mathematics and probability as a tool for describing dependence between nodes. However, only recently has it been applied to data, giving birth to the statistical analysis of real networks.
The topology of economic and financial networks is remarkably complex: it is generally unobserved, thus requiring adequate inferential procedures for its estimation; moreover, not only the nodes but the structure of dependence itself evolves over time. Statistical and econometric tools for modelling the dynamics of change of the network structure are lacking, despite the increasing need for them in several fields of research. At the same time, with the beginning of the era of “Big data”, the size of available datasets is increasing and their internal structure is growing in complexity, hampering traditional inferential processes in multiple cases.
This thesis aims at contributing to this newborn field of literature which joins probability, economics, physics and sociology by proposing novel statistical and econometric methodologies for the study of the temporal evolution of network structures of medium-high dimension.
Jmel, Saïd. "Applications des modèles graphiques au choix de variables et à l'analyse des interactions dans une table de contingence multiple." Toulouse 3, 1992. http://www.theses.fr/1992TOU30091.
Kadavankandy, Arun. "L’analyse spectrale des graphes aléatoires et son application au groupement et l’échantillonnage." Thesis, Université Côte d'Azur (ComUE), 2017. http://www.theses.fr/2017AZUR4059/document.
In this thesis, we study random graphs using tools from Random Matrix Theory and probability to tackle key problems in complex networks and Big Data. First we study graph anomaly detection. Consider an Erdős-Rényi (ER) graph with edge probability q and size n containing a planted subgraph of size m and probability p. We derive a statistical test based on the eigenvalue and eigenvector properties of a suitably defined matrix to detect the planted subgraph. We analyze the distribution of the derived test statistic using Random Matrix Theoretic techniques. Next, we consider subgraph recovery in this model in the presence of side-information. We analyse the effect of side-information on the detectability threshold of Belief Propagation (BP) applied to the above problem. We show that BP correctly recovers the subgraph even with noisy side-information for any positive value of an effective SNR parameter. This is in contrast to BP without side-information, which requires the SNR to be above a certain threshold. Finally, we study the asymptotic behaviour of PageRank on a class of undirected random graphs called fast expanders, using Random Matrix Theoretic techniques. We show that PageRank can be approximated for large graph sizes as a convex combination of the normalized degree vector and the personalization vector of the PageRank, when the personalization vector is sufficiently delocalized. Subsequently, we characterize asymptotic PageRank on Stochastic Block Model (SBM) graphs, and show that it contains a correction term that is a function of the community structure.
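For readers unfamiliar with the objects named in the abstract above, the following sketch computes personalized PageRank on a small undirected graph by power iteration and prints it next to the normalized degree vector and the personalization vector. It only illustrates the quantities involved; it does not reproduce the asymptotic result of the thesis, and the function name `pagerank`, the adjacency-list representation and the damping value 0.85 are assumptions made here.

```python
def pagerank(adjacency, personalization, damping=0.85, n_iter=200):
    """Personalized PageRank by power iteration.

    adjacency[u] lists the neighbours of node u (undirected graph in
    which every node is assumed to have at least one neighbour);
    personalization is a probability vector over the nodes.
    """
    n = len(adjacency)
    rank = list(personalization)
    for _ in range(n_iter):
        # Teleportation: restart according to the personalization vector.
        new_rank = [(1.0 - damping) * personalization[v] for v in range(n)]
        # Random-walk step: each node spreads its rank over its neighbours.
        for u in range(n):
            share = damping * rank[u] / len(adjacency[u])
            for v in adjacency[u]:
                new_rank[v] += share
        rank = new_rank
    return rank


# Star graph: node 0 is linked to nodes 1..4.
adjacency = [[1, 2, 3, 4], [0], [0], [0], [0]]
uniform = [1.0 / 5] * 5
volume = sum(len(neighbours) for neighbours in adjacency)
normalized_degree = [len(neighbours) / volume for neighbours in adjacency]

print("PageRank:         ", pagerank(adjacency, uniform))
print("normalized degree:", normalized_degree)
print("personalization:  ", uniform)
```

A five-node star is of course far from the large fast-expander regime studied in the thesis, so the printed vectors are only meant to show what is being compared.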
Martinet, Lucie. "Réseaux dynamiques de terrain : caractérisation et propriétés de diffusion en milieu hospitalier." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1010/document.
In this thesis, we focus on tools for extracting the structural and temporal properties of dynamic networks, as well as the characteristics of diffusion processes that can occur on these networks. We work on specific data from the European MOSAR project, including the network of proximity contacts between individuals recorded over 6 months at the Berck-sur-Mer Hospital. The studied network is notable for its three dimensions: the structural one, induced by the distribution of individuals into distinct services; the functional one, due to the partition of individuals into socio-professional categories; and the temporal one.
For each dimension, we used well-known tools from statistical physics and graph theory to extract information that describes the network properties. These methods underline the specific structure of the contact distribution, which follows the distribution of individuals into services. We also highlight strong links within specific socio-professional categories. Regarding the temporal part, we extract circadian and weekly patterns and quantify the similarities of these activities. We also notice distinct behaviours in how patients and staff evolve over time. In addition, we present tools to compare the network activity between two given periods. Finally, we use simulation techniques to extract diffusion properties of the network, in order to find clues for establishing a prevention policy.
Blazere, Melanie. "Inférence statistique en grande dimension pour des modèles structurels. Modèles linéaires généralisés parcimonieux, méthode PLS et polynômes orthogonaux et détection de communautés dans des graphes." Thesis, Toulouse, INSA, 2015. http://www.theses.fr/2015ISAT0018/document.
This thesis falls within the context of high-dimensional data analysis. Nowadays we have access to an increasing amount of information. The major challenge lies in our ability to explore a huge amount of data and to infer their dependency structures.
The purpose of this thesis is to study and provide theoretical guarantees for some specific methods that aim at estimating dependency structures for high-dimensional data. The first part of the thesis is devoted to the study of sparse models through Lasso-type methods. In Chapter 1, we present the main results on this topic and then we generalize the Gaussian case to any distribution from the exponential family. The major contribution to this field is presented in Chapter 2 and consists of oracle inequalities for a Group Lasso procedure applied to generalized linear models. These results show that this estimator achieves good performance under some specific conditions on the model. We illustrate this part by considering the case of the Poisson model. The second part concerns linear regression in high dimension, but the sparsity assumption is replaced by a low-dimensional structure underlying the data. We focus in particular on the PLS method, which attempts to find an optimal decomposition of the predictors given a response. We recall the main idea in Chapter 3. The major contribution to this part consists of a new explicit analytical expression of the dependency structure that links the predictors to the response. The next two chapters illustrate the power of this formula by emphasising new theoretical results for PLS. The third and last part is dedicated to graph modelling and especially to community detection. After presenting the main trends on this topic, we turn our attention to Spectral Clustering, which allows the nodes of a graph to be clustered with respect to a similarity matrix. In this thesis, we suggest an alternative to this method by considering an $\ell_1$ penalty. We illustrate this method through simulations.
Manouvrier, Jean-François. "Méthode de décomposition pour résoudre des problèmes combinatoires sur les graphes." Compiègne, 1998. http://www.theses.fr/1998COMP1152.
Tremblay, Nicolas. "Réseaux et signal : des outils de traitement du signal pour l'analyse des réseaux." Thesis, Lyon, École normale supérieure, 2014. http://www.theses.fr/2014ENSL0938/document.
This thesis describes new tools specifically designed for the analysis of networks such as social, transportation, neuronal, protein, communication networks... These networks, along with the rapid expansion of electronic, IT and mobile technologies, are increasingly monitored and measured. Adapted tools of analysis are therefore very much in demand, and they need to be universal, powerful, and precise enough to extract useful information from very different and possibly large networks. To this end, a large community of researchers from various disciplines have concentrated their efforts on the analysis of graphs, well-defined mathematical tools modeling the interconnected structure of networks. Among all the considered directions of research, graph signal processing brings a new and promising vision: a signal is no longer defined on a regular n-dimensional topology, but on a particular topology defined by the graph. Applying these new ideas to the practical problems of network analysis paves the way to an analysis firmly rooted in signal processing theory. It is precisely this frontier between signal processing and network science that we explore throughout this thesis, as shown by two of its major contributions. Firstly, a multiscale version of community detection in networks is proposed, based on the recent definition of graph wavelets. Then, a network-adapted bootstrap method is introduced that enables statistical estimation based on carefully designed graph resampling schemes.
Gaillard, Pierre. "Apprentissage statistique de la connexité d'un nuage de points par modèle génératif : application à l'analyse exploratoire et la classification semi-supervisée." Compiègne, 2008. http://www.theses.fr/2008COMP1767.
In this work, we propose a statistical model to learn the connectedness of a set of points. This model combines geometrical and statistical approaches by defining a mixture model based on a graph. From this generative graph, we propose and evaluate methods and algorithms to analyse the set of points and to perform semi-supervised learning.
Frindel, Carole. "Imagerie par résonance magnétique du tenseur de diffusion (IRM-TD) en imagerie cardiaque humaine : traitements et premières interprétations." PhD thesis, INSA de Lyon, 2009. http://tel.archives-ouvertes.fr/tel-00473031.
Heymann, Sébastien. "Analyse exploratoire de flots de liens pour la détection d'événements." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00994766.
Munch, Mélanie. "Améliorer le raisonnement dans l'incertain en combinant les modèles relationnels probabilistes et la connaissance experte." Thesis, Université Paris-Saclay, 2020. http://www.theses.fr/2020UPASB011.
This thesis focuses on integrating expert knowledge to enhance reasoning under uncertainty. Our goal is to guide the learning of probabilistic relations with expert knowledge for domains described by ontologies.
To do so, we propose to couple knowledge bases (KBs) with an object-oriented extension of Bayesian networks, the probabilistic relational models (PRMs). Our aim is to complement statistical learning with expert knowledge in order to learn a model as close as possible to reality and analyze it quantitatively (with probabilistic relations) and qualitatively (with causal discovery). We developed three algorithms through three distinct approaches, whose main differences lie in their degree of automation and in whether or not they integrate human expert supervision.
The originality of our work is the combination of two broadly opposed philosophies: while the Bayesian approach favors the statistical analysis of the given data in order to reason with it, the ontological approach is based on the modelling of expert knowledge to represent a domain. Combining the strengths of the two makes it possible to improve both the reasoning under uncertainty and the expert knowledge.
Mahmoudi, Saïd. "Indexation de formes planes : application à la reconnaissance multi-vues de modèles 3D." Lille 1, 2003. https://pepite-depot.univ-lille.fr/RESTREINT/Th_Num/2003/50376-2003-291.pdf.
Monnin, Pierre. "Matching and mining in knowledge graphs of the Web of data : Applications in pharmacogenomics." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0212.
In the Web of data, an increasing number of knowledge graphs are concurrently published, edited, and accessed by human and software agents. Their wide adoption makes the two tasks of matching and mining key. First, matching consists in identifying equivalent, more specific, or somewhat similar units within and across knowledge graphs. This task is crucial since concurrent publication and edition may result in coexisting and complementary knowledge graphs. However, this task is challenging because of the inherent heterogeneity of knowledge graphs, e.g., in terms of granularities, vocabularies, and completeness. Motivated by an application in pharmacogenomics, we propose two approaches to match n-ary relationships represented in knowledge graphs: a symbolic rule-based approach and a numeric approach using graph embedding. We experiment on PGxLOD, a knowledge graph that we semi-automatically built by integrating pharmacogenomic relationships from three distinct sources of this domain. Second, mining consists in discovering new and useful knowledge units from knowledge graphs. Their increasing size and combinatorial nature entail scalability issues, which we address in the mining of path patterns. We also propose Concept Annotation, a refinement approach extending Formal Concept Analysis, a mathematical framework that groups entities based on their common attributes. Throughout all our works, we particularly focus on taking advantage of domain knowledge in the form of ontologies that can be associated with knowledge graphs. We show that, when considered, such domain knowledge alleviates heterogeneity and scalability issues in matching and mining approaches.
Guigourès, Romain. "Utilisation des modèles de co-clustering pour l'analyse exploratoire des données." Phd thesis, Université Panthéon-Sorbonne - Paris I, 2013. http://tel.archives-ouvertes.fr/tel-00935278.
Coupechoux, Emilie. "Analyse de grands graphes aléatoires." Paris 7, 2012. http://www.theses.fr/2012PA077184.
Several kinds of real-world networks can be represented by graphs. Since such networks are very large, their detailed topology is generally unknown, and we model them by large random graphs having the same local statistical properties as the observed networks. An example of such properties is the fact that real-world networks are often highly clustered: if two individuals have a friend in common, they are likely to also be each other's friends. Studying random graph models that are both appropriate and tractable from a mathematical point of view is challenging, which is why we consider several clustered random graph models. The spread of epidemics in random graphs can be used to model several kinds of phenomena in real-world networks, such as the spread of diseases or the diffusion of a new technology. The epidemic model we consider depends on the phenomenon we wish to represent: an individual can contract a disease by a single contact with any of his friends (such contacts being independent), but a new technology is likely to be adopted by an individual if many of his friends already have the technology in question. We essentially study these two cases. In each case, one wants to know if a small proportion of the population initially infected (or having the technology in question) can propagate the epidemic to a large part of the population.
Lumbreras, Alberto. "Automatic role detection in online forums." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2111/document.
This thesis addresses the problem of detecting user roles in online discussion forums. A role may be defined as the set of behaviors characteristic of a person or a position. In discussion forums, behaviors are primarily observed through conversations. Hence, we focus our attention on how users discuss. We propose three methods to detect groups of users with similar conversational behaviors. Our first method for the detection of roles is based on conversational structures. We apply different notions of neighborhood for posts in tree graphs (radius-based, order-based, and time-based) and compare the conversational patterns that they detect as well as the clusters of users with similar conversational patterns. Our second method is based on stochastic models of growth for conversation threads. Building upon these models, we propose a method to find groups of users that tend to reply to the same type of posts. We show that, while there are clusters of users with similar replying patterns, there is no strong evidence that these behaviors are predictive of future behaviors, except for some groups of users with extreme behaviors. In our last method, we integrate the types of data used in the two previous methods (feature-based and behavioral or functional-based) and show that we can find clusters using fewer examples. The model exploits the idea that users with similar features have similar behaviors.
Corneli, Marco. "Dynamic stochastic block models, clustering and segmentation in dynamic graphs." Thesis, Paris 1, 2017. http://www.theses.fr/2017PA01E012/document.
This thesis focuses on the statistical analysis of dynamic graphs, defined in either discrete or continuous time. We introduce a new extension of the stochastic block model (SBM) for dynamic graphs. The proposed approach, called dSBM, adopts non-homogeneous Poisson processes to model the interaction times between pairs of nodes in dynamic graphs, either in discrete or continuous time. The intensity functions of the processes only depend on the node clusters, in a block modelling perspective. Moreover, all the intensity functions share some regularity properties on hidden time intervals that need to be estimated. A recent estimation algorithm for SBM, based on the greedy maximization of an exact criterion (exact ICL), is adopted for inference and model selection in dSBM. Moreover, an exact algorithm for change point detection in time series, the "pruned exact linear time" (PELT) method, is extended to deal with dynamic graph data modelled via dSBM. The approach we propose can be used for change point analysis in graph data. Finally, a further extension of dSBM is developed to analyse dynamic networks with textual edges (like social networks, for instance). In this context, the graph edges are associated with documents exchanged between the corresponding vertices. The textual content of the documents can provide additional information about the topological structure of the dynamic graph. The new model we propose is called "dynamic stochastic topic block model" (dSTBM). Graphs are mathematical structures well suited to modelling interactions between objects or actors of interest. Several real networks such as communication networks, financial transaction networks, mobile telephone networks and social networks (Facebook, Linkedin, etc.) can be modelled via graphs. When observing a network, the time variable comes into play in two different ways: we can study the dates at which the interactions occur and/or the interaction time spans.
This thesis only focuses on the first time dimension and each interaction is assumed to be instantaneous, for simplicity. Hence, the network evolution is given by the interaction dates only. In this framework, graphs can be used in two different ways to model networks. Discrete time […] Continuous time […]. In this thesis both these perspectives are adopted, alternately. We consider new unsupervised methods to cluster the vertices of a graph into groups of homogeneous connection profiles. In this manuscript, the node groups are assumed to be time invariant to avoid possible identifiability issues. Moreover, the approaches that we propose aim to detect structural changes in the way the node clusters interact with each other. The building block of this thesis is the stochastic block model (SBM), a probabilistic approach initially used in social sciences. The standard SBM assumes that the nodes of a graph belong to hidden (disjoint) clusters and that the probability of observing an edge between two nodes only depends on their clusters. Since no further assumption is made on the connection probabilities, SBM is a very flexible model able to detect different network topologies (hubs, stars, communities, etc.).
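As an illustration of the standard SBM described in the abstract above, the following minimal sketch samples a static undirected graph whose edge probabilities depend only on the clusters of the two endpoints. It is not code from the thesis: the function name `sample_sbm`, its parameters, and the choice of a simple undirected graph are assumptions made for this example.

```python
import random


def sample_sbm(block_sizes, edge_probs, seed=0):
    """Sample an undirected simple graph from a standard SBM (illustrative sketch).

    block_sizes[k]   -- number of nodes in cluster k (hypothetical parameter name)
    edge_probs[k][l] -- probability of an edge between a node of cluster k and
                        a node of cluster l; assumed symmetric
    Returns (labels, edges), where labels[i] is the cluster of node i.
    """
    rng = random.Random(seed)
    # Assign cluster labels: the first block_sizes[0] nodes go to cluster 0, etc.
    labels = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the two clusters.
            if rng.random() < edge_probs[labels[i]][labels[j]]:
                edges.append((i, j))
    return labels, edges


if __name__ == "__main__":
    # Two communities: dense inside each cluster, sparse between clusters.
    labels, edges = sample_sbm([5, 5], [[0.9, 0.05], [0.05, 0.9]])
    print(len(edges), "edges sampled")
```

In this sketch the clusters are given, whereas in the setting of the abstract they are hidden and must be inferred from the observed edges.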
Venant, Fabienne. "Représentation et calcul dynamique du sens : exploration du lexique adjectival du français." Phd thesis, Ecole des Hautes Etudes en Sciences Sociales (EHESS), 2006. http://tel.archives-ouvertes.fr/tel-00067902.
Hamidouche, Mounia. "Analyse spectrale de graphes géométriques aléatoires." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4019.
We study random geometric graphs (RGGs) to address key problems in complex networks. An RGG is constructed by uniformly distributing n nodes on a torus of dimension d and connecting two nodes if their distance does not exceed a certain threshold. Three regimes for RGGs are of particular interest: the connectivity regime, in which the average vertex degree a_n grows logarithmically with n or faster; the dense regime, in which a_n is linear in n; and the thermodynamic regime, in which a_n is a constant. We study the spectrum of the RGG normalized Laplacian (LN) and its regularized version in the three regimes. When d is fixed and n tends to infinity, we prove that the limiting spectral distribution (LSD) of LN converges to the Dirac distribution at 1 in the connectivity regime. In the thermodynamic regime, we propose an approximation for the LSD of the regularized LN and we provide an error bound on the approximation. We show that the LSD of the regularized LN of an RGG is approximated by the LSD of the regularized LN of a deterministic geometric graph (DGG). We study the LSD of the RGG adjacency matrix in the connectivity regime. Under some conditions on a_n, we show that the LSD of the DGG adjacency matrix is a good approximation for the LSD of RGGs for large n. We determine the spectral dimension (SD) that characterizes the return time distribution of a random walk on RGGs. We show that the SD depends on the eigenvalue density (ED) of the RGG normalized Laplacian in the neighborhood of the minimum eigenvalues. Based on the analytical eigenvalues of the normalized Laplacian, we show that the ED in a neighborhood of the minimum value follows a power-law tail and we approximate the SD of RGGs by d in the thermodynamic regime.
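The RGG construction described in the abstract above can be sketched as follows. This is an illustrative example only, not code from the thesis: it assumes a unit torus [0, 1)^d with the wrap-around Euclidean distance, and the names `torus_distance` and `random_geometric_graph` are hypothetical.

```python
import math
import random


def torus_distance(p, q):
    """Euclidean distance on the unit torus (each coordinate wraps around at 1)."""
    return math.sqrt(
        sum(min(abs(a - b), 1.0 - abs(a - b)) ** 2 for a, b in zip(p, q))
    )


def random_geometric_graph(n, d, radius, seed=0):
    """Place n points uniformly on the d-dimensional unit torus and connect
    every pair whose torus distance does not exceed `radius`."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(d)] for _ in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if torus_distance(points[i], points[j]) <= radius
    ]
    return points, edges


if __name__ == "__main__":
    points, edges = random_geometric_graph(n=200, d=2, radius=0.1)
    average_degree = 2 * len(edges) / len(points)
    print("average vertex degree:", average_degree)
```

The three regimes mentioned in the abstract correspond to different ways of letting the threshold depend on n; the sketch leaves that choice to the caller through `radius`.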
Bourien, Jérôme. "Analyse de distributions spatio-temporelles de transitoires dans des signaux vectoriels. Application à la détection-classification d'activités paroxystiques intercritiques dans des observations EEG." Phd thesis, Université Rennes 1, 2003. http://tel.archives-ouvertes.fr/tel-00007178.
1. Single-channel AE detection. The detection method, which relies on a heuristic approach, uses a wavelet filter bank to enhance the sharp component of the AEs (generally called the "spike" in the literature). The mean value of the statistics obtained at the output of each filter is then analyzed with a Page-Hinkley algorithm in order to detect abrupt changes corresponding to spikes.
2. AE fusion. This procedure searches for co-occurrences between single-channel AEs using a sliding window, then forms multichannel AEs.
3. Extraction of the subsets of channels that are frequently and significantly activated during multichannel AEs (called "activation sets").
4. Evaluation of the existence of a reproducible (possibly partial) temporal activation order within each activation set.
The methods proposed in each step were first evaluated using simulated signals (step 1) or Markov models (steps 2-4). The results show that the complete method is robust to the effects of false alarms. The method was then applied to signals recorded in 8 patients (each containing several hundred AEs). The results indicate a high reproducibility of the spatio-temporal distributions of AEs and enabled the identification of specific anatomo-functional networks.
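Step 1 above relies on a Page-Hinkley algorithm to detect abrupt changes. The following minimal sketch shows the textbook form of that test for an upward shift in the mean of a one-dimensional sequence. It is not the thesis's implementation: the function name `page_hinkley`, the parameters `delta` (tolerated drift) and `threshold` (alarm level), and their default values are assumptions chosen for illustration.

```python
def page_hinkley(samples, delta=0.05, threshold=5.0):
    """Return the zero-based index at which an upward mean shift is flagged,
    or None if no change is detected (textbook Page-Hinkley test, sketch)."""
    mean = 0.0        # running mean of the samples seen so far
    cumulative = 0.0  # cumulative deviation from the running mean, minus delta
    minimum = 0.0     # smallest cumulative deviation seen so far
    for t, x in enumerate(samples, start=1):
        mean += (x - mean) / t
        cumulative += x - mean - delta
        minimum = min(minimum, cumulative)
        # Alarm when the cumulative deviation rises far above its minimum.
        if cumulative - minimum > threshold:
            return t - 1
    return None


if __name__ == "__main__":
    signal = [0.0] * 50 + [2.0] * 50  # the mean jumps from 0 to 2 at index 50
    print("change flagged at index:", page_hinkley(signal))
```

A larger `threshold` lowers the false-alarm rate at the cost of a longer detection delay, which is the usual trade-off when tuning this kind of detector.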
Panafieu, Elie de. "Combinatoire analytique des graphes, hypergraphes et graphes inhomogènes." Paris 7, 2014. http://www.theses.fr/2014PA077167.
We investigate two graph-like models: the non-uniform hypergraphs and the inhomogeneous graphs. They are close to the models defined by Darling and Norris (2004) and Söderberg (2002). We enumerate them and derive structure information before and near the birth of the giant component. The inhomogeneous graph model proves to be a convenient framework for the modeling of several tractable constraint satisfaction problems (CSP), such as the 2-colorability problem, the satisfiability of 2-Xor formulas and of quantified 2-Xor formulas. We link the probability of satisfiability of those problems to the enumeration of inhomogeneous graphs. As an application, proofs of old and new phase transition results are derived in a unified framework. Finally, we derive a new simple proof for the asymptotic number of connected multigraphs with a number of edges proportional to the number of vertices. This result was first derived for simple graphs by Bender, Canfield and McKay (1990). The main tool of this thesis is analytic combinatorics, as defined by Flajolet and Sedgewick in their book (2009).
Zaylaa, Amira. "Analyse et extraction de paramètres de complexité de signaux biomédicaux." Thesis, Tours, 2014. http://www.theses.fr/2014TOUR3315/document.
The analysis of biomedical time series derived from nonlinear dynamic systems is challenging due to the chaotic nature of these time series. Only a few classical parameters can be detected by clinicians to determine the state of patients and fetuses. Although valuable complexity invariants exist, such as multi-fractal parameters, entropies and recurrence plots, they have proved unsatisfactory in certain cases. To overcome this limitation, in this dissertation we propose new entropy invariants, contribute to multi-fractal analysis, and develop signal-based (unbiased) recurrence plots based on the dynamic transitions of time series. Principally, we aim to improve the discrimination between healthy and distressed biomedical systems, particularly fetuses, by processing the time series using our techniques. These techniques were validated on the Lorenz system, logistic maps or fractional Brownian motions, which model chaotic and random time series. The techniques were then applied to real fetal heart rate signals recorded in the third trimester of pregnancy. Statistical measures comprising the relative errors, standard deviation, sensitivity, specificity, precision or accuracy were employed to evaluate the performance of detection. High discrimination performance was achieved by the high-order entropy invariants. Multi-fractal analysis using a structure function enhances the detection of medical fetal states. The unbiased cross-determinism invariant improved the discrimination process. The significance of our techniques lies in their post-processing codes, which could be built into cutting-edge portable machines offering advanced discrimination and detection of Intrauterine Growth Restriction prior to fetal death. This work was devoted to fetal heart rates, but time series generated by other nonlinear dynamic systems should be further considered.
Dao, Ngoc Bich. "Réduction de dimension de sac de mots visuels grâce à l’analyse formelle de concepts." Thesis, La Rochelle, 2017. http://www.theses.fr/2017LAROS010/document.
In several scientific fields such as statistics, computer vision and machine learning, reducing redundant and/or irrelevant information in the data description (dimension reduction) is an important step. This process falls into two categories: feature extraction and feature selection, of which feature selection in unsupervised learning is still an open question. In this manuscript, we discussed feature selection on image datasets using Formal Concept Analysis (FCA), with a focus on lattice structure and lattice theory. The images in a dataset were described as a set of visual words by the bag of visual words model. Two algorithms were proposed in this thesis to select relevant features, and they can be used in both unsupervised learning and supervised learning. The first algorithm, RedAttSansPerte, is based on lattice structure and lattice theory and removes redundant features using the precedence graph. The formal definition of the precedence graph was given in this thesis. We also demonstrated its properties and the relationship between this graph and the AC-poset. Results from experiments indicated that the RedAttsSansPerte algorithm reduced the size of the feature set while maintaining performance as evaluated by classification. Secondly, the RedAttsFloue algorithm, an extension of the RedAttsSansPerte algorithm, was also proposed. This extension used the fuzzy precedence graph. The formal definition and the properties of this graph were demonstrated in this manuscript. The RedAttsFloue algorithm removed redundant and irrelevant features while retaining relevant information according to the flexibility threshold of the fuzzy precedence graph. The quality of the relevant information was evaluated by classification. The results suggest that the RedAttsFloue algorithm is more robust than the RedAttsSansPerte algorithm in terms of reduction.
Kassel, Adrien. "Laplaciens des graphes sur les surfaces et applications à la physique statistique." Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112101.
We study the determinant of the Laplacian on vector bundles on graphs and use it, combined with discrete complex analysis, to study models of statistical physics. We compute exact lattice constants, construct scaling limits for excursions of the loop-erased random walk on surfaces, and study some Gaussian fields and determinantal processes.
Calandriello, Daniele. "Efficient sequential learning in structured and constrained environments." Thesis, Lille 1, 2017. http://www.theses.fr/2017LIL10216/document.
The main advantage of non-parametric models is that the accuracy of the model (degrees of freedom) adapts to the number of samples. The main drawback is the so-called "curse of kernelization": to learn the model we must first compute a similarity matrix among all samples, which requires quadratic space and time and is infeasible for large datasets. Nonetheless, the underlying effective dimension (effective d.o.f.) of the dataset is often much smaller than its size, and we can replace the dataset with a subset (dictionary) of highly informative samples. Unfortunately, fast data-oblivious selection methods (e.g., uniform sampling) almost always discard useful information, while data-adaptive methods that provably construct an accurate dictionary, such as ridge leverage score (RLS) sampling, have a quadratic time/space cost. In this thesis we introduce a new single-pass streaming RLS sampling approach that sequentially constructs the dictionary, where each step compares a new sample only with the current intermediate dictionary and not with all past samples. We prove that the size of all intermediate dictionaries scales only with the effective dimension of the dataset, and therefore guarantee a per-step time and space complexity independent of the number of samples. This reduces the overall time required to construct provably accurate dictionaries from quadratic to near-linear, or even logarithmic when parallelized. Finally, for many non-parametric learning problems (e.g., K-PCA, graph SSL, online kernel learning) we show that we can use the generated dictionaries to compute approximate solutions in near-linear time that are both provably accurate and empirically competitive.
Wehbe, Diala. "Simulations and applications of large-scale k-determinantal point processes." Thesis, Lille 1, 2019. http://www.theses.fr/2019LIL1I012/document.
With the exponentially growing amount of data, sampling remains the most relevant method to learn about populations. Sometimes, a larger sample size is needed to generate more precise results and to exclude the possibility of missing key information. The problem is that sampling a large number of items can be very time-consuming. In this thesis, our aim is to build bridges between applications of statistics and the k-Determinantal Point Process (k-DPP), which is defined through a matrix kernel. We propose different applications for sampling large data sets based on the k-DPP, which is a conditional DPP that models only sets of cardinality k. The goal is to select diverse sets that cover a much greater set of objects in polynomial time. This can be achieved by constructing different Markov chains which have the k-DPPs as their stationary distribution. The first application consists in sampling a subset of species in a phylogenetic tree while avoiding redundancy. By defining the k-DPP via an intersection kernel, the results provide a fast mixing sampler for the k-DPP, for which a polynomial bound on the mixing time is presented that depends on the height of the phylogenetic tree. The second application aims to clarify how k-DPPs offer a powerful approach to finding a diverse subset of nodes in a large connected graph, which gives an overview of the different types of information related to the ground set. A polynomial bound on the mixing time of the proposed Markov chain is given, where the kernel used is the Moore-Penrose pseudo-inverse of the normalized Laplacian matrix. The resulting mixing time is attained under certain conditions on the eigenvalues of the Laplacian matrix. The third application proposes using the fixed-cardinality DPP in experimental design as a tool to study a Latin Hypercube Sampling (LHS) of order n.
The key is to propose a DPP kernel that establishes the negative correlations between the selected points and preserves the constraint of the design, namely that each point occurs exactly once in each hyperplane. Then, by creating a new Markov chain which has the n-DPP as its stationary distribution, we determine the number of steps required to build an LHS in accordance with the n-DPP.
Jourdan-Marias, Astrid. "Analyse statistique et échantillonage d'expériences simulées." Pau, 2000. http://www.theses.fr/2000PAUU1014.
Mahé, Cédric. "Analyse statistique de delais d'evenement correles." Paris 7, 1998. http://www.theses.fr/1998PA077254.
Godard, Emmanuel. "Réécritures de graphes et algorithmique distribuée." Bordeaux 1, 2002. http://www.theses.fr/2002BOR12518.
Cigana, John. "Analyse statistique de sensibilité du modèle SANCHO." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ38667.pdf.
Célimène, Fred. "Analyse statistique et économétrique des DOM-TOM." Paris 10, 1985. http://www.theses.fr/1985PA100002.
Olivier, Adelaïde. "Analyse statistique des modèles de croissance-fragmentation." Thesis, Paris 9, 2015. http://www.theses.fr/2015PA090047/document.
This work is concerned with growth-fragmentation models, implemented for investigating the growth of a population of cells which divide according to an unknown splitting rate that depends on a structuring variable, age and size being the two paradigmatic examples. The mathematical framework includes statistics of processes, nonparametric estimation and analysis of partial differential equations. The three objectives of this work are the following: to obtain a nonparametric estimate of the division rate (as a function of age or size) for different observation schemes (genealogical or continuous); to study the transmission of a biological feature from one cell to another and the feature of one typical cell; and to compare different populations of cells through their Malthus parameter, which governs the global growth (when introducing variability in the growth rate among cells, for instance).
Goulard, Michel. "Champs spatiaux et statistique multidimensionnelle." Grenoble 2 : ANRT, 1988. http://catalogue.bnf.fr/ark:/12148/cb376138909.
Mostefaoui, Mustapha. "Analyse des propriétés temporelles des graphes d'événements valués continus." Nantes, 2001. http://www.theses.fr/2001NANT2100.
Albano, Alice. "Dynamique des graphes de terrain : analyse en temps intrinsèque." Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066260/document.
We are surrounded by a multitude of interaction networks from different contexts. These networks can be modeled as graphs, called complex networks. They have a community structure, i.e. groups of nodes closely related to each other and less connected with the rest of the graph. Another phenomenon studied in complex networks in many contexts is diffusion. The spread of a disease is an example of diffusion. These phenomena are dynamic and depend on an important parameter, which is often little studied: the time scale at which they are observed. Depending on the chosen scale, the graph dynamics can vary significantly. In this thesis, we propose to study dynamic processes using a suitable time scale. We consider a notion of relative time which we call intrinsic time, as opposed to "traditional" time, which we call extrinsic time. We first study diffusion phenomena using intrinsic time, and we compare our results with an extrinsic time scale. This allows us to highlight the fact that the same phenomenon observed at two different time scales can behave very differently. We then analyze the relevance of using an intrinsic time scale for detecting dynamic communities. Comparing communities obtained according to the extrinsic and intrinsic scales shows that the intrinsic time scale allows a more significant detection than the extrinsic time scale.
Rivoire, Olivier. "Phases vitreuses, optimisation et grandes déviations." Phd thesis, Université Paris Sud - Paris XI, 2005. http://tel.archives-ouvertes.fr/tel-00009956.