Theses: "Bipartite stochastic block model"

1

Sigalla, Suzanne. "Contributions to structured high-dimensional inference". Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAG013.

Abstract:

In this thesis, we consider the following three problems: clustering in the Bipartite Stochastic Block Model, estimation of the topic-document matrix in topic models, and benign overfitting in nonparametric regression. First, we consider the graph clustering problem in the Bipartite Stochastic Block Model (BSBM). The BSBM is a non-symmetric generalization of the Stochastic Block Model, with two sets of vertices. We provide an algorithm called the Hollowed Lloyd's algorithm, which classifies the vertices of the smaller set with high probability. We provide statistical guarantees on this algorithm, which is computationally fast and simple to implement. We establish a sufficient condition for clustering in the BSBM. Our results improve on previous work on the BSBM, in particular in the high-dimensional regime. Second, we study the problem of assigning topics to documents using topic models. Topic models allow one to discover hidden structures in a large corpus of documents through dimension reduction. Each topic is considered as a probability distribution on the dictionary of words, and each document is considered as a mixture of topics. We introduce an algorithm called the Successive Projection Overlapping Clustering (SPOC) algorithm, inspired by the Successive Projection Algorithm for Non-negative Matrix Factorization. The SPOC algorithm is computationally fast and simple to implement. We provide statistical guarantees on the outcome of the algorithm. In particular, we provide near-matching minimax upper and lower bounds on its estimation risk under the Frobenius and the l1 norms. Our clustering procedure is adaptive in the number of topics. Finally, the third problem we study is a nonparametric regression problem. We consider local polynomial estimators with singular kernels, which we prove to be minimax optimal, adaptive to unknown smoothness, and interpolating with high probability. This property is called benign overfitting.
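As a rough illustration of the bipartite model described in this abstract, the sketch below samples a rectangular adjacency matrix in which each edge probability depends only on the hidden labels of its two endpoints. The function name `sample_bsbm`, the two-group labels and the probability values are assumptions made for illustration; the sketch only generates data and does not implement the Hollowed Lloyd's algorithm.

```python
import random


def sample_bsbm(row_labels, col_labels, probs, seed=0):
    """Sample a bipartite adjacency matrix A (rows x columns).

    A[i][j] is drawn independently as Bernoulli(probs[a][b]), where
    a = row_labels[i] and b = col_labels[j] are the hidden labels.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < probs[a][b] else 0 for b in col_labels]
        for a in row_labels
    ]


# Hypothetical example: 4 vertices in the smaller set, 6 in the larger one.
row_labels = [0, 0, 1, 1]
col_labels = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1],
         [0.1, 0.9]]
A = sample_bsbm(row_labels, col_labels, probs)
```

The matrix is rectangular because the two vertex sets may have different sizes, which is what distinguishes this setting from the symmetric stochastic block model.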

2

Ludkin, Matthew Robert. "The autoregressive stochastic block model with changes in structure". Thesis, Lancaster University, 2017. http://eprints.lancs.ac.uk/125642/.

Abstract:

Network science has been a growing subject for the last three decades, with statistical analysis of networks seeing an explosion since the advent of online social networks. An important model within network analysis is the stochastic block model, which aims to partition the set of nodes of a network into groups which behave in a similar way. This thesis proposes Bayesian inference methods for problems related to the stochastic block model for network data. The presented research is formed of three parts. Firstly, two Markov chain Monte Carlo samplers are proposed to sample from the posterior distribution of the number of blocks, block memberships and edge-state parameters in the stochastic block model. These allow for non-binary and non-conjugate edge models, something not considered in the literature. Secondly, a dynamic extension to the stochastic block model is presented which includes autoregressive terms. This novel approach to dynamic network models allows the present state of an edge to influence future states, and is therefore named the autoregressive stochastic block model. Furthermore, an algorithm to perform inference on changes in block membership is given. This problem has gained some attention in the literature, but not with autoregressive features to the edge-state distribution as presented in this thesis. Thirdly, an online procedure to detect changes in block membership in the autoregressive stochastic block model is presented. This allows networks to be monitored through time, drastically reducing the data storage requirements. On top of this, the network parameters can be estimated together with the block memberships. Finally, conclusions are drawn from the above contributions in the context of the network analysis literature and future directions for research are identified.
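To make the idea of an edge whose present state influences its future states concrete, here is a minimal sketch that treats a single binary edge as a two-state Markov chain. The function name `simulate_edge`, the binary edge state and the two parameters `turn_on` and `stay_on` are assumptions for illustration only; the thesis also covers non-binary edge models, which this sketch does not attempt.

```python
import random


def simulate_edge(turn_on, stay_on, steps, start=0, seed=0):
    """Simulate one binary edge as a two-state Markov chain.

    turn_on: P(edge present at t+1 | edge absent at t)
    stay_on: P(edge present at t+1 | edge present at t)
    In a block model these two numbers would depend only on the blocks
    of the edge's endpoints (an assumption of this sketch).
    """
    rng = random.Random(seed)
    states = [start]
    for _ in range(steps):
        p = stay_on if states[-1] == 1 else turn_on
        states.append(1 if rng.random() < p else 0)
    return states


# Hypothetical parameters: edges rarely appear but tend to persist once present.
path = simulate_edge(turn_on=0.1, stay_on=0.8, steps=20)
```

Setting `turn_on` equal to `stay_on` removes the dependence on the previous state, so each time step becomes an independent draw.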

3

Paltrinieri, Federico. "Modeling temporal networks with dynamic stochastic block models". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/18805/.

Abstract:

Given the recent interest in dynamic temporal networks and their wide range of applications, this thesis has two main aims. The first is to analyze some theoretical models of temporal networks, especially the dynamic stochastic block model, in order to describe the dynamics of real systems and make predictions. The second is to create two new theoretical models, based on the theory of autoregressive processes, from which new parameters can be inferred from temporal networks, such as the state evolution matrix and a better estimate of the noise variance of the temporal evolution process. Finally, all the models are tested on an interbank data set: they reveal the presence of an expected event that divides the temporal network into two distinct periods with different configurations and parameters.

4

Vallès, Català Toni. "Network inference based on stochastic block models: model extensions, inference approaches and applications". Doctoral thesis, Universitat Rovira i Virgili, 2016. http://hdl.handle.net/10803/399539.

Abstract:

The study of real-world networks has advanced the understanding of complex systems in a wide range of fields, such as molecular and cell biology, anatomy, neuroscience, ecology, economics and sociology. However, the available knowledge about most systems is still limited; hence the predictive power of network science should be enhanced to narrow the gap between knowledge and information. To address this topic we work with the family of Stochastic Block Models (SBMs), a family of generative models that has recently attracted great interest due to its adaptability to any kind of network structure. The goal of this thesis is to develop novel SBM-based inference approaches that will improve our understanding of complex networks. First, we investigate to what extent sampling over models significantly improves predictive power compared with considering a single optimal set of parameters. Once we know which model best describes a given network, we apply this method to a particular real-world network: a network based on the interactions/sutures between bones in newborn skulls. Notably, we discover that sutures fused due to a pathological condition in human newborns are less likely, from a morphological point of view, than sutures fused under normal development. Recent research on multilayer networks has concluded that the behavior of single-layer networks differs from that of multilayer ones; nonetheless, real-world networks are presented to us as single-layer networks. The last part of the thesis is devoted to designing a novel approach in which two separate SBMs simultaneously describe a given single-layer network. Importantly, we find that it predicts missing/spurious links better than the single-SBM approach.

5

Corneli, Marco. "Dynamic stochastic block models, clustering and segmentation in dynamic graphs". Thesis, Paris 1, 2017. http://www.theses.fr/2017PA01E012/document.

Abstract:

This thesis focuses on the statistical analysis of dynamic graphs, defined in either discrete or continuous time. We introduce a new extension of the stochastic block model (SBM) for dynamic graphs. The proposed approach, called dSBM, uses non-homogeneous Poisson processes to model the interaction times between pairs of nodes in dynamic graphs, in either discrete or continuous time. The intensity functions of the processes depend only on the node clusters, in a block modelling perspective. Moreover, all the intensity functions share some regularity properties on hidden time intervals that need to be estimated, and within which the Poisson processes are homogeneous. A recent estimation algorithm for SBM, based on the greedy maximization of an exact criterion (exact ICL), is adopted for inference and model selection in dSBM. Moreover, an exact algorithm for change point detection in time series, the "pruned exact linear time" (PELT) method, is extended to deal with dynamic graph data modelled via dSBM. The approach we propose can be used for change point analysis in graph data. Finally, a further extension of dSBM is developed to analyse dynamic networks with textual edges (such as social networks). In this context, the graph edges are associated with documents exchanged between the corresponding vertices. The textual content of the documents can provide additional information about the topological structure of the dynamic graph. The new model we propose is called the "dynamic stochastic topic block model" (dSTBM).
Graphs are mathematical structures well suited to modelling interactions between objects or actors of interest. Several real networks, such as communication networks, financial transaction networks, mobile telephone networks and social networks (Facebook, LinkedIn, etc.), can be modelled via graphs. When observing a network, time comes into play in two different ways: we can study the dates at which the interactions occur and/or the durations of the interactions. This thesis focuses only on the first time dimension, and each interaction is assumed to be instantaneous, for simplicity. Hence, the network evolution is given by the interaction dates only. In this framework, graphs can be used in two different ways to model networks. Discrete time […] Continuous time […]. In this thesis both perspectives are adopted in turn. We consider new unsupervised methods to cluster the vertices of a graph into groups with homogeneous connection profiles. In this manuscript, the node groups are assumed to be time-invariant to avoid possible identifiability issues. Moreover, the approaches that we propose aim to detect structural changes in the way the node clusters interact with each other. The building block of this thesis is the stochastic block model (SBM), a probabilistic approach initially used in the social sciences. The standard SBM assumes that the nodes of a graph belong to hidden (disjoint) clusters and that the probability of observing an edge between two nodes depends only on their clusters. Since no further assumption is made on the connection probabilities, SBM is a very flexible model able to detect different network topologies (hubs, stars, communities, etc.).
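The standard SBM described at the end of this abstract can be written compactly as follows. The notation is an assumption chosen for illustration and is not taken from the thesis: $n$ nodes, $Q$ clusters, hidden label $Z_i$ for node $i$, cluster proportions $\alpha_1, \dots, \alpha_Q$, edge indicator $A_{ij}$, and a $Q \times Q$ matrix of connection probabilities $\pi$.

```latex
\begin{align*}
Z_i &\sim \mathcal{M}(1;\, \alpha_1, \dots, \alpha_Q)
  \quad \text{independently for } i = 1, \dots, n, \\
A_{ij} \mid \{Z_i = q,\ Z_j = r\} &\sim \mathrm{Bernoulli}(\pi_{qr})
  \quad \text{independently for } i \neq j.
\end{align*}
```

Because $\pi$ is left unconstrained, the same formulation can represent communities (large diagonal entries), hubs or stars (one row with large entries), and other topologies.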

6

Yenerdag, Erdem <1988>. "Contagion Analysis in European Financial Markets Through the Lens of Weighted Stochastic Block Model: Systematically Important Communities of Financial Institutions". Master's Degree Thesis, Università Ca' Foscari Venezia, 2016. http://hdl.handle.net/10579/8816.

Abstract:

This study provides a new perspective for analyzing systemic risk and contagion channels in financial markets by proposing the Weighted Stochastic Block Model (WSBM) as a generative model for financial networks. The WSBM allows regulators to analyze systemic risk and contagion channels in financial markets through the topological features of WSBM communities. In the empirical application of the WSBM, it is found that the number of communities tends to increase during the financial crisis, which can be used as a new early warning indicator of systemic risk. In addition, a new ranking method, based on the new notion of systematically important communities of financial institutions, is provided to assess the systemically important financial institutions.

7

Albertyn, Martin. "Generic simulation modelling of stochastic continuous systems". Thesis, Pretoria : [s.n.], 2004. http://upetd.up.ac.za/thesis/available/etd-05242005-112442.

8

Alkadri, Mohamed Yaser. "Freeway Control Via Ramp Metering: Development of a Basic Building Block for an On-Ramp, Discrete, Stochastic, Mesoscopic, Simulation Model within a Contextual Systems Approach". PDXScholar, 1991. https://pdxscholar.library.pdx.edu/open_access_etds/1308.

Abstract:

One of the most effective measures of congestion control on freeways has been ramp metering, where vehicle entry to the freeway is regulated by traffic signals (meters). Meters are run with calibrated influx rates to prevent highway saturation. However, recent observations of some metering sites in San Diego, CA indicate that metering, during peak hour demand, is helping freeway flow while sometimes creating considerable traffic back-ups on local streets, transferring congestion problems from the freeway to intersections. Metering problems stem largely from the difficulty of designing an integrated, dynamic metering scheme that responds not only to changing freeway conditions but also to fluctuating demand throughout the ramp network; a scheme whose objective is to maintain adequate freeway throughput as well as minimize disproportionate ramp delays and queue overspills onto surface streets. Simulation modeling is a versatile, convenient, relatively inexpensive and safe systems analysis tool for evaluating alternative strategies to achieve the above objective. The objective of this research was to establish a basic building block for a discrete system simulation model, ONRAMP, based on a stochastic, mesoscopic, queueing approach. ONRAMP models entrance ramp geometry, vehicular generation, platooning and arrivals, queueing activities, meters and metering rates. The architecture of ONRAMP's molecular unit is designed so that it can be, with some model calibration, duplicated for a number of ramps and, if necessary, integrated into other larger freeway network models. The SLAM II simulation language is used for computer implementation. ONRAMP has been developed and partly validated using data from eight ramps at Interstate 8 in San Diego. From a systems perspective, simulation will be short-sighted and problem analysis incomplete unless the other non-technical metering problems are explored and considered. These problems include the impacts of signalizing entrance ramps on the vitality of adjacent intersections, land use and development, "fair" geographic distribution of meters and metering rates throughout the freeway corridor, public acceptance and enforcement, and the role and influence of organizations in charge of decision making in this regard. Therefore, an outline of a contextual systems approach for problem analysis is suggested. Benefits and problems of freeway control via ramp metering, both operational short-term and strategic long-term, are discussed in two dimensions: global (freeway) and local (intersection). The results of a pilot study which includes interviews with field experts and law enforcement officials and a small motorist survey are presented.

9

Tabouy, Timothée. "Impact de l’échantillonnage sur l’inférence de structures dans les réseaux : application aux réseaux d’échanges de graines et à l’écologie". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS289/document.

Abstract:

In this thesis we study the stochastic block model (SBM) in the presence of missing data. We propose a classification of missing data into two categories, Missing At Random and Not Missing At Random, for latent variable models, following the framework described by D. Rubin. In addition, we describe several network sampling strategies and their distributions. Inference for SBMs with missing data is carried out through an adaptation of the EM algorithm: the EM with variational approximation. The identifiability of several SBMs with missing data is established, as well as the consistency and asymptotic normality of the maximum likelihood estimators and of the variational approximation estimators in the case where each dyad (pair of nodes) is sampled independently and with equal probability. We also consider SBMs with covariates, their inference in the presence of missing data, and how to proceed when covariates are not available to conduct the inference. Finally, all our methods are implemented in an R package available on CRAN. Complete documentation on the use of this package has also been written.
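The sampling scheme singled out in this abstract, where each dyad is observed independently and with equal probability, can be sketched as follows. The function name `observe_dyads`, the parameter name `rho` and the use of `None` to mark a missing dyad are assumptions for illustration; the sketch is not the interface of the R package mentioned above.

```python
import random


def observe_dyads(adj, rho, seed=0):
    """Mask a symmetric adjacency matrix dyad by dyad.

    Each unordered pair {i, j} with i != j is observed with probability
    rho, independently of everything else; unobserved dyads and the
    diagonal are set to None.
    """
    rng = random.Random(seed)
    n = len(adj)
    obs = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < rho:
                obs[i][j] = obs[j][i] = adj[i][j]
    return obs


# Hypothetical 3-node network observed with half of its dyads on average.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
observed = observe_dyads(adj, rho=0.5)
```

Since the decision to observe a dyad does not depend on whether the edge is present, this scheme falls in the Missing At Random category described in the abstract.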

10

Arastuie, Makan. "Generative Models of Link Formation and Community Detection in Continuous-Time Dynamic Networks". University of Toledo / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1596718772873086.

11

Junuthula, Ruthwik Reddy. "Modeling, Evaluation and Analysis of Dynamic Networks for Social Network Analysis". University of Toledo / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1544819215833249.

12

Zreik, Rawya. "Analyse statistique des réseaux et applications aux sciences humaines". Thesis, Paris 1, 2016. http://www.theses.fr/2016PA01E061/document.

Abstract:

Since the pioneering work of Moreno (1934), network analysis has become a strong discipline that is no longer limited to sociology and is now applied to very diverse fields such as biology, geography and history. The growing interest in network analysis is explained, on the one hand, by the strong presence of this type of data in today's digital world and, on the other hand, by recent progress in the modelling and processing of such data. Indeed, computer scientists and statisticians have focused their efforts on network data for more than a decade, proposing numerous techniques for their analysis. Among these techniques are clustering methods, which make it possible in particular to uncover a hidden group structure in the network.
Over the last two decades, network structure analysis has experienced rapid growth, with applications in many fields, such as communication networks, financial transaction networks, gene regulatory networks, disease transmission networks and mobile telephone networks. Social networks are now commonly used to represent the interactions between groups of people; for instance, we ourselves, our professional colleagues, our friends and family are often part of online networks, such as Facebook, Twitter and email. In a network, many factors can exert influence or make analyses easier to understand. Among these, we find two important ones: the time factor and the network context. The former involves the evolution of connections between nodes over time. The network context can be characterized by different types of information, such as text messages (email, tweets, Facebook posts, etc.) exchanged between nodes, categorical information on the nodes (age, gender, hobbies, status, etc.), interaction frequencies (e.g., number of emails sent or comments posted), and so on. Taking these factors into consideration can lead to the capture of increasingly complex and hidden information from the data. The aim of this thesis is to define new models for graphs which take into consideration the two factors mentioned above, in order to develop the analysis of network structure and allow extraction of the hidden information from the data. These models aim at clustering the vertices of a network depending on their connection profiles and network structures, which are either static or dynamically evolving. The starting point of this work is the stochastic block model, or SBM. This is a mixture model for graphs which was originally developed in the social sciences. It assumes that the vertices of a network are spread over different classes, so that the probability of an edge between two vertices depends only on the classes they belong to.

13

Cerqueira, Andressa. "Statistical inference on random graphs and networks". Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-04042018-094802/.

Abstract:

In this thesis we study two probabilistic models defined on graphs: the Stochastic Block Model and the Exponential Random Graph Model. Accordingly, the thesis is divided into two parts. In the first part, we introduce a penalized estimator based on the Krichevsky-Trofimov mixture for the number of communities in the Stochastic Block Model and prove its eventual almost sure convergence to the underlying number of communities, without assuming a known upper bound on that quantity. In the second part, we address the perfect simulation problem for the Exponential Random Graph Model. We propose an algorithm based on the Coupling From The Past algorithm using Glauber dynamics. This algorithm is efficient in the case of monotone models, and we prove that this is the case for a subset of the parameter space. We also propose an algorithm based on the Backward and Forward algorithm that can be applied to both monotone and non-monotone models. We prove the existence of an upper bound for the expected running time of both algorithms.

14

Kumar, Nithish Kumar. "Stochastic Block Model Dynamics". Thesis, 2021.

Abstract:

The past few years have seen an increasing focus on fairness and the long-term impact of algorithmic decision making in the context of Machine learning, Artificial Intelligence and other disciplines. In this thesis, we model hiring processes in enterprises and organizations using dynamic mechanism design. Using a stochastic block model to simulate the workings of a hiring process, we study fairness and long-term evolution in the system.

We first present multiple results on a deterministic variant of our model including convergence and an accurate approximate solution describing the state of the deterministic variant after any time period has elapsed. Using the differential equation method, it can be shown that this deterministic variant is in turn an accurate approximation of the evolution of our stochastic block model with high probability.

Finally, we derive upper and lower bounds on the expected state at each time step, and further show that, in the long-term limit, these bounds themselves converge to the state evolution of the deterministic system. These results offer conclusions on the long-term behavior of our model, thereby allowing reasoning on how fairness in organizations could be achieved. We conclude that, without sufficient systematic incentives, under-represented groups will gradually disappear from organizations over time.

15

Santhosh, D. "Stochastic Simulation Of Daily Rainfall Data Using Matched Block Bootstrap". Thesis, 2008. https://etd.iisc.ac.in/handle/2005/681.

Abstract:

Characterizing the uncertainty in rainfall using stochastic models has been a challenging area of research in operational hydrology for about half a century. Simulated sequences drawn from such models find use in a variety of hydrological applications. Traditionally, parametric models have been used for simulating rainfall. However, parametric models are not parsimonious and carry uncertainties associated with identification of the model form, the normalizing transformation, and parameter estimation. None of the models in vogue has gained universal acceptance among practising engineers. This may be due to a lack of confidence in the existing models, to the inability to adopt models proposed in the literature because of their complexity, or to both. In the present study, a new nonparametric Matched Block Bootstrap (MABB) model is proposed for stochastic simulation of rainfall at the daily time scale. It is based on conditional matching of blocks formed from the historical rainfall data, using a set of predictors (conditioning variables) proposed for matching the blocks. The efficiency of the developed model is demonstrated through application to rainfall data from India, Australia, and the USA. The performance of MABB is compared with that of two nonparametric rainfall simulation models, k-NN and ROG-RAG, for a site in Melbourne, Australia. The results show that the MABB model is a feasible alternative to the ROG-RAG and k-NN models for simulating daily rainfall sequences for hydrologic applications. Further, the MABB and ROG-RAG models are found to outperform the k-NN model. The proposed MABB model preserves the summary statistics of rainfall and the fraction of wet days at daily, monthly, seasonal, and annual scales. It also performs reasonably well in simulating spell statistics. The MABB model is parsimonious and requires less computational effort than the ROG-RAG model. It reproduces the probability density function (marginal distribution) fairly well owing to its data-driven nature. Results obtained for sites in India and the USA show that the model is robust and promising.
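The idea of building a synthetic series by matching blocks of the historical record can be sketched as follows. This is a simplified reading of conditional block matching, not the MABB model itself: the abstract does not list the thesis's predictors, so this sketch assumes a single conditioning variable (the last value of the block just placed), and the names `matched_block_bootstrap`, `block_len`, and `k` are hypothetical.

```python
import numpy as np

def matched_block_bootstrap(series, block_len, k, rng):
    """Resample a series block by block, choosing each next block by matching.

    Assumption: the only conditioning variable is the last value already
    placed, compared with the value that precedes each candidate block in
    the historical record.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    if n < 2 * block_len:
        raise ValueError("need at least two full blocks")
    # Start indices of the non-overlapping historical blocks.
    starts = np.arange(0, n - block_len + 1, block_len)
    # A candidate needs a preceding value to match on, so skip the block at 0.
    candidates = starts[starts > 0]
    first = rng.choice(starts)
    out = list(x[first:first + block_len])
    while len(out) < n:
        dist = np.abs(x[candidates - 1] - out[-1])
        # Keep the k best-matching candidates and pick one uniformly.
        nearest = candidates[np.argsort(dist, kind="stable")[:k]]
        s = rng.choice(nearest)
        out.extend(x[s:s + block_len])
    return np.array(out[:n])

# Synthetic "daily rainfall" purely for illustration: about 30% wet days.
rng = np.random.default_rng(1)
wet = rng.random(365) < 0.3
rain = np.where(wet, rng.gamma(2.0, 5.0, size=365), 0.0)
sim = matched_block_bootstrap(rain, block_len=7, k=5, rng=rng)
```

Because every simulated value is copied from the record, the marginal distribution is inherited from the data, which is the data-driven property the abstract refers to.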

16

Santhosh, D. "Stochastic Simulation Of Daily Rainfall Data Using Matched Block Bootstrap". Thesis, 2008. http://hdl.handle.net/2005/681.

17

Lin, Christy. "Unsupervised random walk node embeddings for network block structure representation". Thesis, 2021. https://hdl.handle.net/2144/43083.

Abstract:

There has been an explosion of network data in the physical, chemical, biological, computational, and social sciences in the last few decades. Node embeddings, i.e., Euclidean-space representations of nodes in a network, make it possible to apply to network data the tools and algorithms from multivariate statistics and machine learning that were developed for Euclidean-space data. Random walk node embeddings are a class of recently developed node embedding techniques in which the vector representations are learned by optimizing objective functions involving skip-bigram statistics computed from random walks on the network. They have been applied to many supervised learning problems, such as link prediction and node classification, and have demonstrated state-of-the-art performance. Yet their properties remain poorly understood. This dissertation studies random walk based node embeddings in an unsupervised setting, in the context of capturing hidden block structure in the network, i.e., learning node representations that reflect their patterns of adjacencies to other nodes. This doctoral research (i) develops VEC, a random walk based unsupervised node embedding algorithm, and a series of relaxations, and experimentally validates their performance for the community detection problem under the Stochastic Block Model (SBM); (ii) characterizes the ergodic limits of the embedding objectives to create non-randomized versions; and (iii) analyzes the embeddings for expected SBM networks and establishes certain concentration properties of the limiting ergodic objective in the large network asymptotic regime. Comprehensive experimental results on real-world and SBM random networks are presented to illustrate and compare the distributional and block-structure properties of node embeddings generated by VEC and related algorithms.
As a step towards theoretical understanding, it is proved that for the variants of VEC with ergodic limits and convex relaxations, the embedding Grammian of the expected network of a two-community SBM has rank at most 2. Further experiments reveal that these extensions yield embeddings whose distribution is Gaussian-like, centered at the node embeddings of the expected network within each community, and concentrate in the linear degree-scaling regime as the number of nodes increases.

18

Sampietro, Samuele. "Timed Failure Logic Analysis in a Model-Driven Engineering approach". Doctoral thesis, 2021. http://hdl.handle.net/2158/1238685.

Abstract:

A complex System of Systems, integrating several hardware and software components in the holistic perspective of providing an emergent behaviour and operating within business-critical contexts, aims at affording contrasting requirements of reliability and complexity in delivered functions and quality of services by supporting system evolution and adaptation over time. This dissertation contributes to the area of Model-Driven Engineering (MDE), proposing a model-driven approach supporting timed failure logic analysis of complex Cyber-Physical Systems (CPS) in business-critical scenarios. The research defines a meta-model joining structural information about system architectures with their failure logic, decoupling representations of communication interfaces from those of failure propagation. The meta-model also supports runtime evolution (which can be very fast in the case of complex CPS) of concrete systems by enabling the configuration of product lines, capable of representing multiple variation points of a component, supporting continuous adaptation of offered products and services to business or customer needs. The meta-model enables a round-trip engineering process through the definition of a set of transformation rules, supporting the automated and correct-by-construction initialisation of meta-model instances starting from SysML Block Definition Diagrams for system specification and stochastic Fault Trees for timed failure logic, thus activating co-evolution mechanisms propagating external manual modifications, applied on meta-model instances, directly to the adopted structural and reliability artefacts. At the same time, a set of transformation rules has been defined so as to enable the automated generation of Stochastic Time Petri Nets (STPN) from meta-model instances, thus supporting quantitative evaluation of the timed failure logic.
The MDE approach is demonstrated on the case study of a CPS operating in a Smart City environment, evaluating at design time different configurations of the system with respect to the reliability of its cyber-side. The research also addresses the design and the prototypical implementation of a tool offered both as-a-service and as a Java API.

19

Qin, Juan. "A high-resolution hierarchical model for space-time rainfall". Thesis, 2011. http://hdl.handle.net/1959.13/808076.

Abstract:

Research Doctorate - Doctor of Philosophy (PhD)
The hydrologic response of urban catchments is sensitive to small-scale space-time rainfall variations. A stochastic space-time rainfall model used for design purposes must reproduce important statistics at these small scales. However, current rainfall models make simplifying assumptions about the temporal characteristics of rainfields and thus cannot be expected to reproduce important statistics over various space and time scales. In this study, an extensive investigation of radar rainfall data for the Sydney region motivated the development of a new phenomenological hierarchical stochastic model to robustly simulate rainfall fields consistent with 10-minute, 1-km² pixel radar images. The hierarchical framework consists of three levels. The development of the first two levels, which simulate the evolution of rainfall fields for a single storm, is the focus of this thesis. The third level, which is designed for simulation of storm sequences, is left for future research. The first level simulates a latent Gaussian random field conditioned on the previous time step, which is transformed to a rain field using a power transformation. A Toeplitz block circulant technique is used to achieve fast and accurate simulations of large Gaussian random fields (on a 256 by 256 lattice), and is shown to be far more efficient than the traditional Cholesky decomposition method. In the second level, first-order autoregressive (AR(1)) models are used to describe the within-storm variations of the level-one parameters that control the evolution of the rain fields. Calibration is performed using a generalized method-of-moments approach. The parametric bootstrap validation technique was used to evaluate the performance of the first two levels of the model by comparing the characteristics of interest for four observed storm events (typical of frontal and convective storms experienced in Sydney, Australia) and synthetic storms.
It is found that this two-level rainfall model produces realistic sequences of rain images which capture the physical hierarchical structure of clusters, patchiness of rain fields and the persistence exhibited during storm development. A variety of important statistics were adequately reproduced at both 10-min and 1-hr time scales over space scales ranging from 1 km up to 32 km. Finally, application of this model to short-term rainfall forecasting is presented.
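The efficiency gain of circulant techniques over Cholesky decomposition, mentioned in the abstract, can be illustrated in one dimension. The sketch below is a simplified analogue and not the thesis's implementation: the thesis works with block structures on a two-dimensional lattice, whereas this example embeds a one-dimensional Toeplitz covariance in a circulant matrix. The function name `circulant_embedding_samples` and the exponential covariance are assumptions made for illustration.

```python
import numpy as np

def circulant_embedding_samples(r, n_samples, rng):
    """Sample zero-mean stationary Gaussian vectors with autocovariance r[0..n-1]."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    # First row of the symmetric circulant matrix embedding the Toeplitz covariance.
    c = np.concatenate([r, r[-2:0:-1]])
    m = len(c)  # m = 2n - 2
    # The eigenvalues of a circulant matrix are the FFT of its first row.
    lam = np.fft.fft(c).real
    if lam.min() < -1e-8 * lam.max():
        raise ValueError("embedding is not nonnegative definite; a larger one is needed")
    lam = np.clip(lam, 0.0, None)
    # Complex white noise; the real part of the transform has circulant covariance.
    z = rng.standard_normal((n_samples, m)) + 1j * rng.standard_normal((n_samples, m))
    y = np.fft.fft(np.sqrt(lam / m) * z, axis=1)
    # The first n coordinates have exactly the requested Toeplitz covariance.
    return y.real[:, :n]

# Hypothetical exponential autocovariance with correlation length 10 on 64 points.
n = 64
r = np.exp(-np.arange(n) / 10.0)
rng = np.random.default_rng(0)
X = circulant_embedding_samples(r, 5000, rng)
```

Each draw costs one FFT of length `m`, that is O(m log m) operations, whereas a Cholesky factorization of the n by n covariance matrix costs O(n³); this gap widens quickly for large two-dimensional fields.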
