Dissertations / Theses on the topic 'Statistical graph analysis'

Author: Grafiati

Published: 13 February 2024

Consult the top 50 dissertations / theses for your research on the topic 'Statistical graph analysis.'

1

Fairbanks, James Paul. "Graph analysis combining numerical, statistical, and streaming techniques." Diss., Georgia Institute of Technology, 2016. http://hdl.handle.net/1853/54972.

Abstract:

Graph analysis uses graph data collected on physical, biological, or social phenomena to shed light on the underlying dynamics and behavior of the agents in that system. Many fields contribute to this topic, including graph theory, algorithms, statistics, machine learning, and linear algebra. This dissertation advances a novel framework for dynamic graph analysis that combines numerical, statistical, and streaming algorithms to provide deep insight into evolving networks. For example, one can be interested in how the influence structure changes over time. These disparate techniques each contribute a fragment to understanding the graph; however, their combination allows us to understand dynamic behavior and graph structure. Spectral partitioning methods rely on eigenvectors for solving data analysis problems such as clustering. Eigenvectors of large sparse systems must be approximated with iterative methods. This dissertation analyzes how data analysis accuracy depends on the numerical accuracy of the eigensolver. This leads to new bounds on the residual tolerance necessary to guarantee correct partitioning. We present a novel stopping criterion for spectral partitioning guaranteed to satisfy the Cheeger inequality, along with an empirical study of the performance on real-world networks such as web, social, and e-commerce networks. This work bridges the gap between numerical analysis and computational data analysis.
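To make the spectral partitioning idea in this abstract concrete, here is a minimal sketch of a generic Fiedler-vector sweep cut. It is not the dissertation's stopping criterion: the function name `fiedler_sweep_cut`, the plain power iteration, and the fixed iteration count are all assumptions chosen for illustration, and the graph is assumed to be connected, undirected, and free of isolated vertices.

```python
import math

def fiedler_sweep_cut(adj, iters=2000):
    """Return (best_set, conductance, lambda2_estimate).

    adj maps each vertex to the set of its neighbours (hypothetical input format).
    """
    nodes = sorted(adj)
    deg = {v: len(adj[v]) for v in nodes}
    vol = sum(deg.values())
    # Trivial eigenvector of the normalized Laplacian L: D^{1/2} 1, unit length.
    top = {v: math.sqrt(deg[v] / vol) for v in nodes}

    def apply_shifted(x):
        # y = (2I - L) x = x + D^{-1/2} A D^{-1/2} x
        return {v: x[v] + sum(x[u] / math.sqrt(deg[u] * deg[v]) for u in adj[v])
                for v in nodes}

    def deflate_and_normalize(x):
        dot = sum(x[v] * top[v] for v in nodes)
        x = {v: x[v] - dot * top[v] for v in nodes}
        norm = math.sqrt(sum(val * val for val in x.values()))
        return {v: val / norm for v, val in x.items()}

    # Power iteration on 2I - L with the trivial eigenvector projected out.
    x = deflate_and_normalize({v: float(i) for i, v in enumerate(nodes)})
    for _ in range(iters):
        x = deflate_and_normalize(apply_shifted(x))
    y = apply_shifted(x)
    lam2 = 2.0 - sum(x[v] * y[v] for v in nodes)  # Rayleigh quotient of L

    # Sweep over prefixes of the vertices sorted by the embedding x / sqrt(deg).
    order = sorted(nodes, key=lambda v: x[v] / math.sqrt(deg[v]))
    best, best_phi, inside, vol_s, cut = None, float("inf"), set(), 0, 0
    for v in order[:-1]:
        inside.add(v)
        vol_s += deg[v]
        internal = sum(1 for u in adj[v] if u in inside)
        cut += deg[v] - 2 * internal  # edges to `inside` stop being cut
        phi = cut / min(vol_s, vol - vol_s)
        if phi < best_phi:
            best, best_phi = set(inside), phi
    return best, best_phi, lam2

# Two triangles joined by one edge (a small "barbell").
barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part, phi, lam2 = fiedler_sweep_cut(barbell)
print(sorted(part), phi, lam2 / 2 <= phi <= math.sqrt(2 * lam2))
```

The last printed value checks the Cheeger inequality, λ₂/2 ≤ φ ≤ √(2λ₂), on the returned cut. A production eigensolver would stop on a residual tolerance rather than a fixed iteration count, which is the question the abstract addresses.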

2

Soriani, Nicola. "Topics in Statistical Models for Network Analysis." Doctoral thesis, Università degli studi di Padova, 2012. http://hdl.handle.net/11577/3422100.

Abstract:

Network Analysis is a set of statistical and mathematical techniques for the study of relational data arising from a system of connected entities. Most of the results for network data have been obtained in the field of Social Network Analysis (SNA), which mainly focuses on the relationships among a set of individual actors and organizations. The thesis considers some topics in statistical models for network data, with a particular focus on models used in SNA. The core of the thesis is represented by Chapters 3, 4 and 5. In Chapter 3, an alternative approach to estimating Exponential Random Graph Models (ERGMs) is discussed. In Chapter 4, ERGMs and Latent Space models are compared in terms of goodness of fit. In Chapter 5, alternative methods to estimate the p2 class of models are proposed.
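As a small illustration of what an ERGM is, the sketch below enumerates every undirected graph on a handful of labelled nodes and assigns each one the probability exp(θ·s(g))/Z. The choice of statistics (edge count and triangle count), the function name `ergm_distribution`, and the brute-force normalization are assumptions for illustration only. Real estimation methods, including those discussed in the thesis, exist precisely because this enumeration is infeasible beyond a few nodes.

```python
import itertools
import math

def ergm_distribution(n, theta_edges, theta_triangles):
    """Exact ERGM probabilities over all undirected graphs on n labelled nodes."""
    pairs = list(itertools.combinations(range(n), 2))
    triples = list(itertools.combinations(range(n), 3))
    weights = {}
    for mask in itertools.product((0, 1), repeat=len(pairs)):
        edges = {p for p, bit in zip(pairs, mask) if bit}
        # Sufficient statistics: number of edges and number of triangles.
        n_tri = sum(1 for a, b, c in triples
                    if {(a, b), (a, c), (b, c)} <= edges)
        score = theta_edges * len(edges) + theta_triangles * n_tri
        weights[frozenset(edges)] = math.exp(score)
    z = sum(weights.values())  # normalizing constant
    return {g: w / z for g, w in weights.items()}

# With both parameters at zero, every graph is equally likely.
uniform = ergm_distribution(3, 0.0, 0.0)
print(len(uniform), set(round(p, 6) for p in uniform.values()))
```

A negative edge parameter with a positive triangle parameter makes sparse but clustered graphs relatively more likely, which is the kind of structural effect these models are used to capture.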

3

GRASSI, FRANCESCO. "Statistical and Graph-Based Signal Processing: Fundamental Results and Application to Cardiac Electrophysiology." Doctoral thesis, Politecnico di Torino, 2018. http://hdl.handle.net/11583/2710580.

Abstract:

The goal of cardiac electrophysiology is to obtain information about the mechanism, function, and performance of the electrical activities of the heart, to identify deviations from the normal pattern, and to design treatments. By offering better insight into the comprehension and management of cardiac arrhythmias, signal processing can help the physician enhance treatment strategies, in particular in the case of atrial fibrillation (AF), a very common atrial arrhythmia that is associated with significant morbidities, such as increased risk of mortality, heart failure, and thromboembolic events. Catheter ablation of AF is a therapeutic technique that uses radiofrequency energy to destroy atrial tissue involved in sustaining the arrhythmia, typically aiming at the electrical disconnection of the pulmonary vein triggers. However, the recurrence rate is still very high, showing that the very complex and heterogeneous nature of AF still represents a challenging problem. Leveraging the tools of non-stationary and statistical signal processing, the first part of our work has a twofold focus. Firstly, we compare the performance of two different ablation technologies, based on contact force sensing or remote magnetic control, using signal-based criteria as surrogates for lesion assessment. Furthermore, we investigate the role of ablation parameters in lesion formation using late-gadolinium-enhanced magnetic resonance imaging. Secondly, we hypothesize that in human atria the frequency content of the bipolar signal is directly related to the local conduction velocity (CV), a key parameter characterizing the substrate abnormality and influencing atrial arrhythmias. Comparing the degree of spectral compression among signals recorded at different points of the endocardial surface in response to decreasing pacing rate, our experimental data demonstrate a significant correlation between CV and the corresponding spectral centroids.
However, the complex spatio-temporal propagation patterns characterizing AF spurred the need for new signal acquisition and processing methods. Multi-electrode catheters allow whole-chamber panoramic mapping of electrical activity but produce an amount of data that needs to be preprocessed and analyzed to provide clinically relevant support to the physician. Graph signal processing (GSP) has shown its potential in a variety of applications involving high-dimensional data on irregular domains and complex networks. Nevertheless, though state-of-the-art graph-based methods have been successful for many tasks, so far they predominantly ignore the time dimension of data. To address this shortcoming, in the second part of this dissertation, we put forth a Time-Vertex Signal Processing Framework, as a particular case of multi-dimensional graph signal processing. Linking together time-domain signal processing techniques with the tools of GSP, Time-Vertex Signal Processing facilitates the analysis of graph-structured data that also evolve in time. We motivate our framework by leveraging the notion of partial differential equations on graphs. We introduce joint operators, such as time-vertex localization, and we present a novel approach to significantly improve the accuracy of fast joint filtering. We also illustrate how to build time-vertex dictionaries, providing conditions for efficient invertibility and examples of constructions. The experimental results on a variety of datasets suggest that the proposed tools can bring significant benefits in various signal processing and learning tasks involving time series on graphs. We close the gap between the two parts by illustrating the application of graph and time-vertex signal processing to the challenging case of multi-channel intracardiac signals.

4

Meinhardt, Llopis Enric. "Morphological and statistical techniques for the analysis of 3D images." Doctoral thesis, Universitat Pompeu Fabra, 2011. http://hdl.handle.net/10803/22719.

Abstract:

This thesis proposes a tree data structure to encode the connected components of level sets of 3D images. This data structure is applied as the main tool in several proposed applications: 3D morphological operators, medical image visualization, analysis of color histograms, object tracking in videos, and edge detection. Motivated by the problem of edge linking, the thesis also contains a study of anisotropic total variation denoising as a tool for computing anisotropic Cheeger sets. These anisotropic Cheeger sets can be used to find global optima of a class of edge linking functionals. They are also related to some affine invariant descriptors that are used in object recognition, and this relationship is laid out explicitly.

5

Tavernari, Daniele. "Statistical and network-based methods for the analysis of chromatin accessibility maps in single cells." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12297/.

Abstract:

In this work, methods from physics, statistics, and graph theory were used to characterize and analyze chromatin opening and accessibility profiles obtained with the ATAC-seq technique in single cells, specifically B lymphocytes from three patients with chronic lymphocytic leukemia. A bioinformatics pipeline was developed to process the sequencing data and obtain the accessible positions of the genome for each cell. The number of open regions and their spatial distribution along the DNA were characterized. Finally, the simultaneous opening of regulatory regions in the same single cells was used as a metric to assess functional relationships; in this way, graphs between enhancers and promoters were built and their properties analyzed. The spatial distribution along the genome of consecutive open regions recapitulates structural properties such as nucleosome arrays and chromatin loop structures. Moreover, the accessibility profiles of regulatory regions are significantly conserved across single cells. The enhancer-promoter networks provide a way to characterize the relevance of each regulatory region in terms of centrality. Statistics on enhancer-promoter connectivity confirm the one-to-one relationship model as the most frequent, in which a promoter is regulated by its nearest enhancer. Finally, the functioning of super-enhancers was also investigated. In conclusion, ATAC-seq proves to be an effective technique for investigating chromatin opening in single cells, whose accessibility profiles recapitulate structural and functional features of chromatin. To investigate the mechanisms of the disease, the accessibility landscape of tumor lymphocytes can be compared with that of healthy cells and of cells treated with epigenetic drugs.

6

Valba, Olga. "Statistical analysis of networks and biophysical systems of complex architecture." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00919606.

Abstract:

Complex organization is found in many biological systems. For example, biopolymers can possess a highly hierarchical structure, which underlies their functional specificity. Understanding such complex organization allows us to describe biological phenomena and predict molecular functions. Moreover, we can try to characterize a specific phenomenon by probabilistic quantities (variances, means, etc.), assuming the primary biopolymer structure to be randomly formed according to some statistical distribution. Such a formulation is oriented toward evolutionary problems. Artificially constructed biological networks are another common object of statistical physics with rich functional properties. The behavior of a cell is a consequence of complex interactions between its numerous components, such as DNA, RNA, proteins, and small molecules. Cells use signaling pathways and regulatory mechanisms to coordinate multiple processes, allowing them to respond and adapt to a changing environment. Recent theoretical advances allow us to describe cellular network structure using graph concepts to reveal the principal organizational features shared with numerous non-biological networks. The aim of this thesis is to develop a set of methods for studying statistical and dynamic objects of complex architecture and, in particular, scale-free structures, which have no characteristic spatial and/or time scale. For such systems, the use of standard mathematical methods, relying on the average behavior of the whole system, is often incorrect or useless, while a detailed many-body description is almost hopeless because of the combinatorial complexity of the problem. Here we focus on two problems. The first part addresses the statistical analysis of random biopolymers. Apart from the evolutionary context, our studies cover more general problems of planar topology that appear in the description of various systems, ranging from gauge theory to biophysics.
We investigate analytically and numerically a phase transition of a generic planar matching problem, from the regime where almost all the vertices are paired to the situation where a finite fraction of them remains unmatched. The second part of this work focuses on statistical properties of networks. We demonstrate the possibility of defining co-expression gene clusters within a network context from their specific motif distribution signatures. We also show how a method based on the shortest path function (SPF) can be applied to gene interaction sub-networks of co-expression gene clusters to efficiently predict novel regulatory transcription factors (TFs). The biological significance of this method is demonstrated by applying it to groups of genes with a shared regulatory locus, found by genetic genomics. Finally, we discuss the formation of stable patterns of motifs in networks under selective evolution in the context of the creation of islands of "superfamilies".

7

Kamal, Tariq. "Computational Cost Analysis of Large-Scale Agent-Based Epidemic Simulations." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/82507.

Abstract:

Agent-based epidemic simulation (ABES) is a powerful and realistic approach for studying the impacts of disease dynamics and complex interventions on the spread of an infection in the population. Among many ABES systems, EpiSimdemics comes closest to the popular agent-based epidemic simulation systems developed by Eubank, Longini, Ferguson, and Parker. EpiSimdemics is a general framework that can model many reaction-diffusion processes besides the Susceptible-Exposed-Infectious-Recovered (SEIR) models. This model allows the study of complex systems as they interact, thus enabling researchers to model and observe the socio-technical trends and forces. Pandemic planning at the world level requires simulation of over 6 billion agents, where each agent has a unique set of demographics, daily activities, and behaviors. Moreover, the stochastic nature of epidemic models, the uncertainty in the initial conditions, and the variability of reactions require the computation of several replicates of a simulation for a meaningful study. Given the hard timelines to respond, running many replicates (15-25) of several configurations (10-100) (of these compute-heavy simulations) can only be possible on high-performance clusters (HPC). These agent-based epidemic simulations are irregular and show poor execution performance on high-performance clusters due to the evolutionary nature of their workload, large irregular communication and load imbalance. For increased utilization of HPC clusters, the simulation needs to be scalable. Many challenges arise when improving the performance of agent-based epidemic simulations on high-performance clusters. Firstly, large-scale graph-structured computation is central to the processing of these simulations, where the star-motif quality nodes (natural graphs) create large computational imbalances and communication hotspots. Secondly, the computation is performed by classes of tasks that are separated by global synchronization. 
The non-overlapping computations cause idle times, which introduce load balancing and cost estimation challenges. Thirdly, the computation is overlapped with communication, which is difficult to measure using simple methods, thus making the cost estimation very challenging. Finally, the simulations are iterative and the workload (computation and communication) may change through iterations, as a result introducing load imbalances. This dissertation focuses on developing a cost estimation model and load balancing schemes to increase the runtime efficiency of agent-based epidemic simulations on high-performance clusters. While developing the cost model and load balancing schemes, we perform static and dynamic load analysis of such simulations. We also statically quantify the computational and communication workloads in EpiSimdemics. We designed, developed, and evaluated a cost model for estimating the execution cost of large-scale parallel agent-based epidemic simulations (and more generally for all constrained producer-consumer parallel algorithms). This cost model uses computational imbalances and communication latencies, and enables the cost estimation of those applications where the computation is performed by classes of tasks, separated by synchronization. It enables the performance analysis of parallel applications by computing their execution times on a number of partitions. Our evaluations show that the model is helpful in performance prediction, resource allocation, and evaluation of load balancing schemes. As part of the load balancing algorithms, we adopted the Metis library for partitioning bipartite graphs. We have also developed lower-overhead custom schemes called Colocation and MetColoc. We performed an evaluation of Metis, Colocation, and MetColoc. Our analysis showed that the MetColoc scheme gives a performance similar to Metis, but with half the partitioning overhead (runtime and memory).
On the other hand, the Colocation scheme achieves a performance similar to Metis on a larger number of partitions, but at a much lower partitioning overhead. Moreover, the memory requirements of the Colocation scheme do not increase as we create more partitions. We have also performed dynamic load analysis of agent-based epidemic simulations. For this, we studied the individual and joint effects of three disease parameters (transmissibility, infection period, and incubation period). We quantified the effects using an analytical equation with separate constants for the SIS, SIR, and SI disease models. The metric that we have developed in this work is useful for cost estimation of constrained producer-consumer algorithms; however, it has some limitations. The applicability of the metric is application-, machine-, and data-specific. In the future, we plan to extend the metric to increase its applicability to a larger set of machine architectures, applications, and datasets.
Ph. D.
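The abstract describes computation done by classes of tasks separated by global synchronization. A minimal sketch of why load imbalance matters in that setting follows. It uses a generic bulk-synchronous estimate (each phase takes as long as its slowest partition) and is not the dissertation's cost model, which also accounts for communication latencies. The names `estimate_cost`, `imbalance`, and the `sync_cost` parameter are hypothetical.

```python
def estimate_cost(phase_loads, sync_cost=0.0):
    """Estimate run time when phases are separated by global barriers.

    phase_loads[k][p] is the work of partition p in phase k (hypothetical units).
    Every partition waits at the barrier, so a phase costs its maximum load.
    """
    return sum(max(loads) + sync_cost for loads in phase_loads)

def imbalance(loads):
    """Ratio of the heaviest partition to the mean load (1.0 means balanced)."""
    return max(loads) / (sum(loads) / len(loads))

# Two phases on two partitions: the slow partition dominates each phase.
loads = [[3, 5], [4, 1]]
print(estimate_cost(loads))                  # → 9.0
print(estimate_cost(loads, sync_cost=1.0))   # → 11.0
print(imbalance(loads[0]))                   # → 1.25
```

Under this simplified view, a partitioner that lowers the per-phase maximum reduces the estimate directly, which is one way to compare partitioning schemes before running the full simulation.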

8

Jiang, Shan. "Statistical Modeling of Multi-Dimensional Knowledge Diffusion Networks: An ERGM-Based Framework." Diss., The University of Arizona, 2015. http://hdl.handle.net/10150/555946.

Abstract:

Knowledge diffusion networks consist of individuals who exchange knowledge and the knowledge flows connecting them. Studying knowledge diffusion from a network perspective helps us understand how the connections between individuals affect knowledge diffusion processes. Existing research on knowledge diffusion networks mostly adopts a uni-dimensional perspective, where all the individuals in the networks are assumed to be of the same type. It also assumes that there is only one type of knowledge flow in the network. This dissertation proposes a multi-dimensional perspective of knowledge diffusion networks and examines the patterns of knowledge diffusion with Exponential Random Graph Model (ERGM) based approaches. The objective of this dissertation is to propose a framework that effectively addresses the multi-dimensionality of knowledge diffusion networks, to enable researchers and practitioners to conceptualize multi-dimensional knowledge diffusion networks in various domains, and to provide implications on how to stimulate and control the knowledge diffusion process. The dissertation consists of three essays, all of which examine multi-dimensional knowledge diffusion networks in a specific context, but each focuses on a different aspect of knowledge diffusion. Chapter 2 focuses on how structural properties of networks affect various types of knowledge diffusion processes in the domain of commercial technology. The study uses ERGM to simultaneously model multiple types of knowledge flows and examine their interactions. The objective is to understand the impacts of network structures on knowledge diffusion processes. Chapter 3 focuses on examining the impact of individual attributes and the attributes of knowledge on knowledge diffusion in the context of scientific innovation.
Based on social capital theory, the study also utilizes ERGM to examine how knowledge transfer and knowledge co-creation can be affected by the attributes of individual researchers and the attributes of scientific knowledge. Chapter 4 considers the dynamic aspect of knowledge diffusion and proposes a novel network model extending ERGM to identify dynamic patterns of knowledge diffusion in social media. In the proposed model, dynamic patterns in social media networks are modeled based on the nodal attributes of individuals and the temporal information of network ties.

9

Lamont, Morné Michael Connell. "Binary classification trees : a comparison with popular classification methods in statistics using different software." Thesis, Stellenbosch : Stellenbosch University, 2002. http://hdl.handle.net/10019.1/52718.

Abstract:

Thesis (MComm) -- Stellenbosch University, 2002.
ENGLISH ABSTRACT: Consider a data set with a categorical response variable and a set of explanatory variables. The response variable can have two or more categories and the explanatory variables can be numerical or categorical. This is a typical setup for a classification analysis, where we want to model the response based on the explanatory variables. Traditional statistical methods have been developed under certain assumptions, such as: the explanatory variables are numeric only and/or the data follow a multivariate normal distribution. In practice such assumptions are not always met. Different research fields generate data that have a mixed structure (categorical and numeric) and researchers are often interested in using all these data in the analysis. In recent years robust methods such as classification trees have become the substitute for traditional statistical methods when the above assumptions are violated. Classification trees are not only an effective classification method, but offer many other advantages. The aim of this thesis is to highlight the advantages of classification trees. In the chapters that follow, the theory of and further developments on classification trees are discussed. This forms the foundation for the CART software, which is discussed in Chapter 5, as well as other software in which classification tree modeling is possible. We compare classification trees to parametric, kernel, and k-nearest-neighbour discriminant analyses. A neural network is also compared to classification trees, and finally we draw some conclusions on classification trees and how they compare with other methods.
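For readers unfamiliar with how a binary classification tree chooses a split, the following sketch scores candidate thresholds on one numeric variable by weighted Gini impurity, the criterion commonly associated with CART. It is an illustrative assumption rather than the thesis's software: the function names `gini` and `best_threshold` are hypothetical, and a real tree would repeat this search over all variables and recurse on each side.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(xs, ys):
    """Return (threshold, weighted_gini) of the best split x <= t on one feature."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(xs, ys))  # → (6.5, 0.0)
```

Because the split depends only on the ordering of values, no distributional assumption on the explanatory variable is needed, which is the robustness the abstract refers to.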

10

Noel, Jonathan A. "Extremal combinatorics, graph limits and computational complexity." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:8743ff27-b5e9-403a-a52a-3d6299792c7b.

Abstract:

This thesis is primarily focused on problems in extremal combinatorics, although we will also consider some questions of analytic and algorithmic nature. The d-dimensional hypercube Q_d is the graph with vertex set {0,1}^d where two vertices are adjacent if they differ in exactly one coordinate. In Chapter 2 we obtain an upper bound on the 'saturation number' of Q_m in Q_d. Specifically, we show that for m ≥ 2 fixed and d large there exists a subgraph G of Q_d of bounded average degree such that G does not contain a copy of Q_m but, for every G' such that G ⊊ G' ⊆ Q_d, the graph G' contains a copy of Q_m. This result answers a question of Johnson and Pinto and is best possible up to a factor of O(m). In Chapter 3, we show that there exists ε > 0 such that for all k and for n sufficiently large there is a collection of at most 2^{(1-ε)k} subsets of [n] which does not contain a chain of length k+1 under inclusion and is maximal subject to this property. This disproves a conjecture of Gerbner, Keszegh, Lemons, Palmer, Pálvölgyi and Patkós. We also prove that there exists a constant c ∈ (0,1) such that the smallest such collection is of cardinality 2^{(1+o(1))ck} for all k. In Chapter 4, we obtain an exact expression for the 'weak saturation number' of Q_m in Q_d. That is, we determine the minimum number of edges in a spanning subgraph G of Q_d such that the edges of E(Q_d)\E(G) can be added to G, one edge at a time, such that each new edge completes a copy of Q_m. This answers another question of Johnson and Pinto. We also obtain a more general result for the weak saturation of 'axis aligned' copies of a multidimensional grid in a larger grid. In the r-neighbour bootstrap process, one begins with a set A₀ of 'infected' vertices in a graph G and, at each step, a 'healthy' vertex becomes infected if it has at least r infected neighbours. If every vertex of G is eventually infected, then we say that A₀ percolates.
In Chapter 5, we apply ideas from weak saturation to prove that, for fixed r ≥ 2, every percolating set in Q_d has cardinality at least (1+o(1))(d choose r-1)/r. This confirms a conjecture of Balogh and Bollobás and is asymptotically best possible. In addition, we determine the minimum cardinality exactly in the case r=3 (the minimum cardinality in the case r=2 was already known). In Chapter 6, we provide a framework for proving lower bounds on the number of comparable pairs in a subset S of a partially ordered set (poset) of prescribed size. We apply this framework to obtain an explicit bound of this type for the poset 𝒱(q,n) consisting of all subspaces of 𝔽_q^n ordered by inclusion, which is best possible when S is not too large. In Chapter 7, we apply the result from Chapter 6, along with the recently developed 'container method', to obtain an upper bound on the number of antichains in 𝒱(q,n) and a bound on the size of the largest antichain in a p-random subset of 𝒱(q,n) which holds with high probability for p in a certain range. In Chapter 8, we construct a 'finitely forcible graphon' W for which there exists a sequence (ε_i)_{i=1}^∞ tending to zero such that, for all i ≥ 1, every weak ε_i-regular partition of W has at least exp(ε_i^{-2}/2^{5 log* ε_i^{-2}}) parts. This result shows that the structure of a finitely forcible graphon can be much more complex than was anticipated in a paper of Lovász and Szegedy. For positive integers p,q with p/q ≥ 2, a circular (p,q)-colouring of a graph G is a mapping V(G) → ℤ_p such that any two adjacent vertices are mapped to elements of ℤ_p at distance at least q from one another. The reconfiguration problem for circular colourings asks, given two (p,q)-colourings f and g of G, is it possible to transform f into g by recolouring one vertex at a time so that every intermediate mapping is a (p,q)-colouring? In Chapter 9, we show that this question can be answered in polynomial time for 2 ≤ p/q < 4 and is PSPACE-complete for p/q ≥ 4.
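The r-neighbour bootstrap process on the hypercube defined in this abstract is easy to simulate for small d. The sketch below encodes each vertex of Q_d as an integer whose bits are its coordinates, so flipping one bit gives a neighbour. The function names are hypothetical, and the brute-force loop is only meant to make the definition tangible, not to reflect the proof techniques of the thesis.

```python
def hypercube_neighbours(v, d):
    """Neighbours of vertex v in Q_d: flip exactly one of the d coordinate bits."""
    return [v ^ (1 << i) for i in range(d)]

def bootstrap_closure(d, r, seed):
    """Final infected set of the r-neighbour bootstrap process started from seed."""
    infected = set(seed)
    changed = True
    while changed:
        changed = False
        for v in range(2 ** d):
            if v in infected:
                continue
            if sum(u in infected for u in hypercube_neighbours(v, d)) >= r:
                infected.add(v)
                changed = True
    return infected

def percolates(d, r, seed):
    """True if every vertex of Q_d is eventually infected."""
    return len(bootstrap_closure(d, r, seed)) == 2 ** d

print(percolates(3, 2, {0b000, 0b011, 0b101}))  # → True
print(percolates(3, 2, {0b000, 0b011}))         # → False
```

Vertices are infected one at a time here rather than in synchronized rounds. Because infection is monotone (infected vertices never recover), the final set is the same either way.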

11

Aragonès, Martín Àngels. "Graph theory applied to transmission path problems in vibroacoustics." Doctoral thesis, Universitat Ramon Llull, 2015. http://hdl.handle.net/10803/299378.

Full text

Abstract:

A fundamental aspect of solving a vibroacoustic problem in a mechanical system is finding out how energy flows from a given source to any part of the system. This helps in deciding which actions to take to reduce, for example, the noise or vibration levels in a given area of the system. The dynamic behavior of a mechanical system can be estimated using different numerical methods, each of them targeting a certain frequency range. Whereas at low frequencies deterministic methods such as the Finite Element Method (FEM) or the Boundary Element Method (BEM) can be applied, statistical methods like Statistical Energy Analysis (SEA) become unavoidable at high frequencies. In addition, a large variety of approaches such as the hybrid FE-SEA, the Energy Distribution (ED) models or the Statistical modal Energy distribution Analysis (SmEdA), among many others, have recently been proposed to tackle the so-called mid-frequency problem. However, although numerical methods can predict the pointwise or averaged vibroacoustic response of a system, they do not directly provide information on how energy flows throughout the system. Therefore, some kind of post-processing is required to determine energy transmission paths. The energy transmitted through a particular path linking a source subsystem, where external energy is being input, and a target subsystem, can be computed numerically. Yet, identifying which paths dominate the whole energy transmission from source to target usually relies on the engineer's expertise and judgement. Thus, an approach for the automatic identification of those paths would prove very useful. Graph theory provides a solution to this problem, since powerful path algorithms for graphs are available. In this thesis, a link between vibroacoustic models and graph theory is proposed, which allows one to address energy transmission path problems in a straightforward manner. The dissertation starts by focusing on SEA models. 
It is first shown that performing a transmission path analysis (TPA) in SEA makes sense. Then a graph that accurately represents the SEA model is defined. Given that the energy transmission between sources and targets is justified by the contribution of a limited group of dominant paths in many cases of practical interest, an algorithm to find them is presented. Thereafter, an enhanced algorithm is devised to include the stochastic nature of SEA loss factors in the ranking of paths. Next, it is discussed how transmission path analysis can be extended to the mid-frequency range. The graph approach for path computation is adapted for some ED models, as well as for SmEdA. Finally, we outline another possible application of graph theory to vibroacoustics. A graph cut algorithm strategy is implemented to achieve energy reduction at a target subsystem with the sole modification of a reduced set of loss factors. The set is found by computing cuts in the graph separating source and receiver subsystems.
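The idea of finding a dominant transmission path with a graph algorithm can be illustrated as follows. This is a rough sketch under stated assumptions, not the algorithm of the thesis: it assumes each directed edge carries a transmission factor in (0, 1] and that a path's contribution is the product of its edge factors, so that maximising the product is equivalent to a shortest-path search on weights −log(factor). The graph, the subsystem names and the function `dominant_path` are hypothetical.

```python
import heapq
import math


def dominant_path(graph, source, target):
    """Return (path, factor) for the path with the largest product of
    edge factors, using Dijkstra's algorithm on -log(factor) weights.

    graph: dict mapping node -> dict of neighbour -> factor in (0, 1].
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, factor in graph.get(u, {}).items():
            nd = d - math.log(factor)  # non-negative since factor <= 1
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, 0.0
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])


# Hypothetical four-subsystem model: S is the source, T the target.
model = {
    "S": {"A": 0.5, "B": 0.3},
    "A": {"T": 0.2},
    "B": {"T": 0.3},
}
path, factor = dominant_path(model, "S", "T")
print(path, factor)
```

Ranking several dominant paths, as the abstract describes, would need a k-shortest-paths method in place of the single search shown here.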

APA, Harvard, Vancouver, ISO, and other styles

12

Maus, Aaron. "Formulation of Hybrid Knowledge-Based/Molecular Mechanics Potentials for Protein Structure Refinement and a Novel Graph Theoretical Protein Structure Comparison and Analysis Technique." ScholarWorks@UNO, 2019. https://scholarworks.uno.edu/td/2673.

Full text

Abstract:

Proteins are the fundamental machinery that enables the functions of life. It is critical to understand them not just for basic biology, but also to enable medical advances. The field of protein structure prediction is concerned with developing computational techniques to predict protein structure and function from a protein’s amino acid sequence alone, which is encoded directly in DNA. Despite much progress since the first computational models in the late 1960s, techniques for the prediction of protein structure still cannot reliably produce structures of high enough accuracy to enable desired applications such as rational drug design. Protein structure refinement is the process of modifying a predicted model of a protein to bring it closer to its native state. In this dissertation, a protein structure refinement technique, that of potential energy minimization using hybrid molecular mechanics/knowledge-based potential energy functions, is examined in detail. The generation of the knowledge-based component is critically analyzed, and in the end, a potential that is a modest improvement over the original is presented. This dissertation also examines the task of protein structure comparison. In evaluating various protein structure prediction techniques, it is crucial to be able to compare produced models against known structures to understand how well the technique performs. A novel technique is proposed that allows an in-depth yet intuitive evaluation of the local similarities between protein structures. Based on a graph analysis of pairwise atomic distance similarities, multiple regions of structural similarity can be identified between structures independently of relative orientation. Multidomain structures can be evaluated and this technique can be combined with global measures of similarity such as the global distance test. 
This method of comparison is expected to have broad applications in rational drug design, the evolutionary study of protein structures, and in the analysis of the protein structure prediction effort.

APA, Harvard, Vancouver, ISO, and other styles

13

Maroušek, Vít. "Vizualizace vícerozměrných statistických dat." Master's thesis, Vysoká škola ekonomická v Praze, 2011. http://www.nusl.cz/ntk/nusl-85161.

Full text

Abstract:

The thesis deals with the possibilities of visualizing multivariate statistical data. Since this is a very broad area, the thesis is divided into four sections, two of which are theoretical and two practical. The first section is devoted to theoretical aspects of data visualization. It contains information about the building blocks of graphs and about how the brain processes graphs in various stages of perception. The second section surveys the available chart types that can be used to display data. Selected types of graphs for continuous and discontinuous multidimensional data are described in detail. The third section focuses on available software tools for creating graphs. The section describes several programs, with a focus on STATISTICA, R and MS Excel. The knowledge gained in the previous sections provides the basis for the last section, which performs a graphical analysis of multidimensional continuous and discrete data using advanced analytical methods. This analysis is performed separately on a data file with continuous variables and on a data file with discontinuous (categorical) variables.

APA, Harvard, Vancouver, ISO, and other styles

14

Holmgren, Åke J. "Quantitative vulnerability analysis of electric power networks." Doctoral thesis, KTH, Transporter och samhällsekonomi, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3969.

Full text

Abstract:

Disturbances in the supply of electric power can have serious implications for everyday life as well as for national (homeland) security. A power outage can be initiated by natural disasters, adverse weather, technical failures, human errors, sabotage, terrorism, and acts of war. The vulnerability of a system is described as a sensitivity to threats and hazards, and is measured by P (Q(t) > q), i.e. the probability of at least one disturbance with negative societal consequences Q larger than some critical value q, during a given period of time (0,t]. The aim of the thesis is to present methods for quantitative vulnerability analysis of electric power delivery networks to enable effective strategies for prevention, mitigation, response, and recovery to be developed. Paper I provides a framework for vulnerability assessment of infrastructure systems. The paper discusses concepts and perspectives for developing a methodology for vulnerability analysis, and gives examples related to power systems. Paper II analyzes the vulnerability of power delivery systems by means of statistical analysis of Swedish disturbance data. It is demonstrated that the size of large disturbances follows a power law, and that the occurrence of disturbances can be modeled as a Poisson process. Paper III models electric power delivery systems as graphs. Statistical measures for characterizing the structure of two empirical transmission systems are calculated, and a structural vulnerability analysis is performed, i.e. a study of the connectivity of the graph when vertices and edges are disabled. Paper IV discusses the origin of power laws in complex systems in terms of their structure and the dynamics of disturbance propagation. A branching process is used to model the structure of a power distribution system, and it is shown that the disturbance size in this analytical network model follows a power law. 
Paper V shows how the interaction between an antagonist and the defender of a power system can be modeled as a game. A numerical example is presented and used to study whether a dominant defense strategy exists, and whether there is an optimal allocation of resources between protection of components and recovery.
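The vulnerability measure P(Q(t) > q) in the abstract above can be worked through numerically under simplifying assumptions. This sketch is not taken from the thesis. It assumes disturbances arrive as a homogeneous Poisson process with rate `rate`, and that their sizes are independent and follow a Pareto (power-law) tail with hypothetical parameters `x_min` and `alpha`. Disturbances larger than q then form a thinned Poisson process, so the probability of at least one of them in (0, t] is 1 − exp(−rate · t · P(X > q)).

```python
import math


def pareto_tail(q: float, x_min: float, alpha: float) -> float:
    """P(X > q) for a Pareto-distributed disturbance size X."""
    if q <= x_min:
        return 1.0
    return (q / x_min) ** (-alpha)


def vulnerability(rate: float, t: float, q: float,
                  x_min: float, alpha: float) -> float:
    """Probability of at least one disturbance larger than q in (0, t],
    assuming Poisson arrivals and independent Pareto sizes."""
    expected_exceedances = rate * t * pareto_tail(q, x_min, alpha)
    return -math.expm1(-expected_exceedances)  # 1 - exp(-x), computed stably


# Illustrative numbers only: 2 disturbances per year, one-year horizon.
print(vulnerability(rate=2.0, t=1.0, q=10.0, x_min=1.0, alpha=1.0))
```

All parameter values here are illustrative and are not estimates from the Swedish disturbance data.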

APA, Harvard, Vancouver, ISO, and other styles

15

Karlström, Daniel. "Implementation of data-collection tools using NetFlow for statistical analysis at the ISP level." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-16140.

Full text

Abstract:

Defending against DoS and DDoS attacks is difficult; finding and filtering out illegitimate traffic from the legitimate flow is nearly impossible. Steps to mitigate or block the traffic can only be taken once the IP addresses of the attackers are known. This is achievable by monitoring the flows to and from the target and identifying the attackers' IP addresses, allowing the company or its ISP to block the addresses by blackholing them (also known as null routing). Using the IP accounting and monitoring tool “pmacct”, this thesis investigates whether the pmacct suite is suited for larger installations, such as at an Internet Service Provider (ISP), when tracking and mitigating DDoS attacks. Potential problems are the amount of traffic that needs to be analyzed and the computational power required to do it. This thesis also provides information about the pmacct suite at large. The conclusions are positive, indicating that it does scale up to handle larger installations when given careful consideration and planning.

APA, Harvard, Vancouver, ISO, and other styles

16

Phadnis, Miti. "Statistical Analysis of Linear Analog Circuits Using Gaussian Message Passing in Factor Graphs." DigitalCommons@USU, 2009. https://digitalcommons.usu.edu/etd/504.

Full text

Abstract:

This thesis introduces a novel application of factor graphs to the domain of analog circuits. It proposes a technique that uses factor graphs to perform statistical yield analysis of analog circuits much faster than the standard Monte Carlo/Simulation Program With Integrated Circuit Emphasis (SPICE) simulation techniques. We have designed a tool chain to model an analog circuit and its corresponding factor graph and then use a Gaussian message passing approach along the edges of the graph for yield calculation. The tool is also capable of estimating unknown parameters of the circuit given known output statistics through backward message propagation in the factor graph. The tool builds upon the concept of domain-specific modeling, which is used for modeling and interpreting different kinds of analog circuits. The Generic Modeling Environment (GME) is used to design a modeling environment for analog circuits. It is a configurable tool set that supports the creation of domain-specific design environments for different applications. This research has developed a generalized methodology that could be applied towards design automation of different kinds of analog circuits, both linear and nonlinear. The tool has been successfully used to model linear amplifier circuits and a nonlinear Metal Oxide Semiconductor Field Effect Transistor (MOSFET) circuit. The results obtained by Monte Carlo simulations performed on these circuits are used as a reference against which the tool's results are compared. The tool is tested for its efficiency in terms of time and accuracy against the standard results.

APA, Harvard, Vancouver, ISO, and other styles

17

Vohra, Neeru Rani. "Three dimensional statistical graphs, visual cues and clustering." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ56213.pdf.

Full text

APA, Harvard, Vancouver, ISO, and other styles

18

Zoffoli, Violetta. "Multiple Graph Structure Learning: a comparative analysis." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amsdottorato.unibo.it/9400/1/tesi_finale.pdf.

Full text

Abstract:

In the context of analysing multivariate Gaussian distributions under different experimental conditions, recent studies have focused on retrieving the patterns of the conditional independences between pairs of variables for each condition. Given the representation of non-zero partial correlations as edges in a graph, we refer to this domain as Multiple Graph Structure Learning. In application problems that assume some similarity between the graph structures, it has been suggested in the literature that learning the graphs jointly would be advantageous with respect to learning them separately. As an alternative, the graphs can be learnt directly from the difference of the concentration matrices. The aim of this thesis is to understand the advantages and limitations of such learning methods. In order to do so, we compare these strategies by constructing a comprehensive and detailed simulation study analysis that includes different graph structures, different sample sizes, different dimensions and different levels of similarity between the experimental conditions. We evaluate the performance of the methods using the precision and recall indexes. From the results of our simulation, it is evident that the underlying limitation of all the graph structure learning methods resides in the model selection, which corresponds to the choice of l1-norm penalty terms. This leads to the identification of graphs with highly variable densities, which hinders the method comparison. We then impose that the models reproduce the true graph densities and we explore how different the resulting graphs are with respect to each learning method and simulation scenario.
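The precision and recall indexes mentioned in the abstract above can be computed for an estimated graph as in the following sketch. It is an illustration only, not the evaluation code of the thesis: edges are treated as undirected, the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def edge_set(edges):
    """Normalise undirected edges so that (u, v) and (v, u) coincide."""
    return {frozenset(edge) for edge in edges}


def precision_recall(true_edges, estimated_edges):
    """Return (precision, recall) of an estimated undirected edge set."""
    truth = edge_set(true_edges)
    estimate = edge_set(estimated_edges)
    true_positives = len(truth & estimate)
    # Convention assumed here: 0.0 when the denominator is zero.
    precision = true_positives / len(estimate) if estimate else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: two of the three estimated edges are correct.
true_graph = [(1, 2), (2, 3), (3, 4)]
learned_graph = [(2, 1), (3, 4), (1, 4)]
print(precision_recall(true_graph, learned_graph))
```

A denser estimated graph tends to raise recall and lower precision, which is one reason graphs with very different densities are hard to compare, as the abstract notes.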

APA, Harvard, Vancouver, ISO, and other styles

19

Albà, Xènia. "Automated cardiac MR image analysis for population imaging." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/403063.

Full text

Abstract:

Clinical practice and research routinely generate large amounts of medical records, including medical images. However, valuable knowledge that could impact healthcare delivery currently remains locked in these population cohorts. New tools are therefore necessary to process and exploit such large-scale data, taking into account in particular the unprecedented variability in anatomy and pathophysiology. In this thesis, we present new approaches for the automatic and robust processing of large-scale medical image data, focusing on the challenging segmentation of cardiac magnetic resonance imaging (MRI) studies. The main contributions of this thesis allow automatic segmentation (i) across multiple MRI sequences without the need for sequence-specific parameter tuning, (ii) across highly variable cases without a priori knowledge of the involved pathology, and (iii) incorporating automatic detection and quality control without the need for any user interaction. All of these techniques are demonstrated over multiple large-scale cohorts from different clinical centers and public databases.

APA, Harvard, Vancouver, ISO, and other styles

20

Wang, Kaijun. "Graph-based Modern Nonparametrics For High-dimensional Data." Diss., Temple University Libraries, 2019. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/578840.

Full text

Abstract:

Statistics
Ph.D.
Developing nonparametric statistical methods and inference procedures for high-dimensional large data has been a challenging frontier problem of statistics. To attack this problem, a clear rising trend has been observed in recent years with a radically different viewpoint, "Graph-based Nonparametrics," which is the main research focus of this dissertation. The basic idea consists of two steps: (i) representation step: code the given data using graphs, (ii) analysis step: apply statistical methods on the graph-transformed problem to systematically tackle various types of data structures. Under this general framework, this dissertation develops two major research directions. Chapter 2, based on Mukhopadhyay and Wang (2019a), introduces a new nonparametric method for the high-dimensional k-sample comparison problem that is distribution-free, robust, and continues to work even when the dimension of the data is larger than the sample size. The proposed theory is based on modern LP-nonparametrics tools and unexplored connections with spectral graph theory. The key is to construct a specially designed weighted graph from the data and to reformulate the k-sample problem as a community detection problem. The procedure is shown to possess various desirable properties along with a characteristic exploratory flavor that has practical consequences. The numerical examples show surprisingly good performance of our method under a broad range of realistic situations. Chapter 3, based on Mukhopadhyay and Wang (2019b), revisits some foundational questions about network modeling that are still unsolved. In particular, we present a unified statistical theory of the fundamental spectral graph methods (e.g., Laplacian, Modularity, Diffusion map, regularized Laplacian, Google PageRank model), which are often viewed as spectral heuristic-based empirical mystery facts. 
Despite half a century of research, this question has been one of the most formidable open issues, if not the core problem in modern network science. Our approach integrates modern nonparametric statistics, mathematical approximation theory (of integral equations), and computational harmonic analysis in a novel way to develop a theory that unifies and generalizes the existing paradigm. From a practical standpoint, it is shown that this perspective can provide adequate guidance for designing next-generation computational tools for large-scale problems. As an example, we have described the high-dimensional change-point detection problem. Chapter 4 discusses some further extensions and applications of our methodologies to regularized spectral clustering and spatial graph regression problems. The dissertation concludes with a discussion of two important areas of future study.
Temple University--Theses

APA, Harvard, Vancouver, ISO, and other styles

21

Blignaut, Rennette Julia. "Discriminant analysis : a review of its application to the classification of grape cultivars." Master's thesis, University of Cape Town, 1989. http://hdl.handle.net/11427/14298.

Full text

Abstract:

The aim of this study was to calculate a classification function for discriminating between five grape cultivars with a view to determining the cultivar of an unknown grape juice. In order to discriminate between the five grape cultivars, various multivariate statistical techniques, such as principal component analysis, cluster analysis, correspondence analysis and discriminant analysis, were applied. Discriminant analysis proved to be the most appropriate technique for the problem at hand and therefore an in-depth study of this technique was undertaken. Discriminant analysis was the most appropriate technique for classifying these grape samples into distinct cultivars because this technique utilized prior information of population membership. This thesis is divided into two main sections. The first section (chapters 1 to 5) is a review of discriminant analysis, describing various aspects of this technique and matters related thereto. In the second section (chapter 6) the theories discussed in the first section are applied to the problem at hand. The results obtained when discriminating between the different grape cultivars are given. Chapter 1 gives a general introduction to the subject of discriminant analysis, including certain basic derivations used in this study. Two approaches to discriminant analysis are discussed in Chapter 2, namely the parametrical and non-parametrical approaches. In this review the emphasis is placed on the classical approach to discriminant analysis. Non-parametrical approaches such as the K-nearest neighbour technique, the kernel method and ranking are briefly discussed. Chapter 3 deals with estimating the probability of misclassification. In Chapter 4 variable selection techniques are discussed. Chapter 5 briefly deals with sequential and logistical discrimination techniques. The estimation of missing values is also discussed in this chapter. A final summary and conclusion is given in Chapter 7. 
Appendices A to D illustrate some of the obtained results from the practical analyses.

APA, Harvard, Vancouver, ISO, and other styles

22

Mei, Jonathan B. "Principal Network Analysis." Research Showcase @ CMU, 2018. http://repository.cmu.edu/dissertations/1175.

Full text

Abstract:

Many applications collect a large number of time series, for example, temperature continuously monitored by weather stations across the US or neural activity recorded by an array of electrical probes. These data are often referred to as unstructured. A first task in their analytics is often to derive a low dimensional representation (a graph or discrete manifold) that describes the interrelations among the time series and their intrarelations across time. In general, the underlying graphs can be directed and weighted, possibly capturing the strengths of causal relations, not just the binary existence of reciprocal correlations. Furthermore, the processes generating the data may be non-linear and observed in the presence of unmodeled phenomena or unmeasured agents in a complex networked system. Finally, the networks describing the processes may themselves vary through time. In many scenarios, there may be good reasons to believe that the graphs are only able to vary as linear combinations of a set of "principal graphs" that are fundamental to the system. We would then be able to characterize each principal network individually to make sense of the ensemble and analyze the behaviors of the interacting entities. This thesis acts as a roadmap of computationally tractable approaches for learning graphs that provide structure to data. It culminates in a framework that addresses these challenges when estimating time-varying graphs from collections of time series. Analyses are carried out to justify the various models proposed along the way and to characterize their performance. Experiments are performed on synthetic and real datasets to highlight their effectiveness and to illustrate their limitations.

23

Young, Stephen J. "Random dot product graphs a flexible model for complex networks." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26548.

Abstract:

Thesis (Ph.D)--Mathematics, Georgia Institute of Technology, 2009.
Committee Chair: Mihail, Milena; Committee Member: Lu, Linyuan; Committee Member: Sokol, Joel; Committee Member: Tetali, Prasad; Committee Member: Trotter, Tom; Committee Member: Yu, Xingxing. Part of the SMARTech Electronic Thesis and Dissertation Collection.

24

ARTARIA, ANDREA. "Objective Bayesian Analysis for Differential Gaussian Directed Acyclic Graphs." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2014. http://hdl.handle.net/10281/55327.

Abstract:

Often we are confronted with heterogeneous multivariate data, i.e., data coming from several categories, and interest may center on how the structure of stochastic dependence among the variables differs between the groups. This work focuses on the two-group problem, which is addressed by modeling the system through a pair of Gaussian directed acyclic graphs (DAGs) linked so as to obtain a joint estimate that exploits similarities between the graphs whenever they exist. The model can be viewed as a set of separate regressions, and the proposal consists in assigning a non-local prior to the regression coefficients with the objective of enforcing stronger sparsity constraints on model selection. Model selection is based on the Moment Fractional Bayes Factor and is performed through a stochastic search algorithm over the space of DAG models.

25

Fadrný, Tomáš. "Statistické zhodnocení dat." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2009. http://www.nusl.cz/ntk/nusl-228740.

Abstract:

This diploma thesis evaluates and processes data from final device checks. All the devices are similar types of thermal overcurrent relays by the ABB company. For appropriate statistical data processing, the Minitab 14 statistical software was used and various statistical methods were applied. Results are always listed for each device type and each method used. The diploma thesis is divided into two parts. The first one analyzes the methods used and the second part states the method results. There is also an overall evaluation of the processed data.

26

Kim, Sungmin. "Community Detection in Directed Networks and its Application to Analysis of Social Networks." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397571499.

27

Psorakis, Ioannis. "Probabilistic inference in ecological networks : graph discovery, community detection and modelling dynamic sociality." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:84741d8b-31ea-4eee-ae44-a0b7b5491700.

Abstract:

This thesis proposes a collection of analytical and computational methods for inferring the underlying social structure of a given population, observed only via timestamped occurrences of its members across a range of locations. It shows that such data streams have a modular and temporally focused structure, neither fully ordered nor completely random, with individuals appearing in "gathering events". By exploiting this structure, the thesis proposes a mapping of those spatio-temporal data streams to a social network, based on the co-occurrences of agents across gathering events, while capturing the uncertainty over social ties via probability distributions. Given the extracted graphs, an approach is proposed for studying their community organisation. The method considers communities as explanatory variables for the observed interactions, producing overlapping partitions and scores for the membership of nodes in groups. These models are motivated by a large ongoing experiment at Wytham Woods, Oxford, where a population of Parus major wild birds is tagged with RFID devices and a grid of feeding locations generates thousands of spatio-temporal records each year. The proposed methods are applied to this data set to demonstrate how they can be used to explore wild bird sociality, reveal its internal organisation across a variety of scales and provide insights into important biological processes relating to mating pair formation.
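
To illustrate the mapping from gathering events to a social network described in this abstract, here is a minimal Python sketch. It is not the thesis's method: it only counts co-occurrences, whereas the thesis places probability distributions over social ties. The function name `cooccurrence_weights`, the bird identifiers and the example events are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(events):
    """Count, for every pair of individuals, the events both attended."""
    weights = Counter()
    for members in events:
        # Sort so that each unordered pair has a single canonical key.
        for a, b in combinations(sorted(set(members)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical gathering events: each is the set of tagged birds seen together.
events = [{"b1", "b2", "b3"}, {"b1", "b2"}, {"b3", "b4"}]
weights = cooccurrence_weights(events)
```

Each key of `weights` is an edge of the inferred network and its count is a crude tie strength; a probabilistic treatment would replace the raw count with a distribution over that strength.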

28

Maignant, Elodie. "Plongements barycentriques pour l'apprentissage géométrique de variétés : application aux formes et graphes." Electronic Thesis or Diss., Université Côte d'Azur, 2023. http://www.theses.fr/2023COAZ4096.

Abstract:

An MRI image has over 60,000 pixels. The largest known human protein consists of around 30,000 amino acids. We call such data high-dimensional. In practice, most high-dimensional data is only apparently high-dimensional. For example, of all the images that could be randomly generated by coloring 256 x 256 pixels, only a very small subset would resemble an MRI image of a human brain. This is known as the intrinsic dimension of such data. Therefore, learning from high-dimensional data is often synonymous with dimensionality reduction. There are numerous methods for reducing the dimension of a dataset, the most recent of which can be classified according to two approaches. A first approach, known as manifold learning or non-linear dimensionality reduction, is based on the observation that some of the physical laws behind the data we observe are non-linear. In this case, trying to explain the intrinsic dimension of a dataset with a linear model is sometimes unrealistic. Instead, manifold learning methods assume a locally linear model. Moreover, with the emergence of statistical shape analysis, there has been a growing awareness that many types of data are naturally invariant to certain symmetries (rotations, permutations, reparametrizations...). Such invariances are directly mirrored in the intrinsic dimension of the data, and Euclidean geometry cannot transcribe them faithfully. There is therefore a growing interest in modeling such data using finer structures such as Riemannian manifolds. A second recent approach to dimension reduction thus consists in generalizing existing methods to non-Euclidean data. This is known as geometric learning. So far, most work in geometric learning has focused on principal component analysis. In order to combine geometric learning and manifold learning, we investigated the method called locally linear embedding, which is distinctive in being based on the notion of barycenter, a notion a priori defined in Euclidean spaces but which generalizes to Riemannian manifolds. In fact, the method called barycentric subspace analysis, which is one of those generalizing principal component analysis to Riemannian manifolds, is based on this notion as well. Here we bring both methods together under the new notion of barycentric embeddings. Essentially, barycentric embeddings inherit the structure of most linear and non-linear dimension reduction methods, but rely on a (locally) barycentric, that is, affine, model rather than a linear one. The core of our work lies in the analysis of these methods, both on a theoretical and a practical level. In particular, we address the application of barycentric embeddings to two important examples in geometric learning: shapes and graphs. In addition to practical implementation issues, each of these examples raises its own theoretical questions, mostly related to the geometry of quotient spaces. We show that, compared to standard dimension reduction methods in statistical graph analysis, barycentric embeddings stand out for their better interpretability. In parallel with these examples, we characterize the geometry of locally barycentric embeddings, which generalize the projection computed by locally linear embedding. Finally, algorithms for geometric manifold learning, novel in their approach, complete this work.

29

Araújo, Eduardo Barbosa. "Scientific Collaboration Networks from Lattes Database: Topology, Dynamics and Gender Statistics." reponame:Repositório Institucional da UFC, 2016. http://www.repositorio.ufc.br/handle/riufc/18489.

Abstract:

ARAÚJO, Eduardo Barbosa. Scientific Collaboration Networks from Lattes Database: Topology, Dynamics and Gender Statistics. 2016. 88 f. Tese (Doutorado em Física) - Programa de Pós-Graduação em Física, Departamento de Física, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, 2016.
Understanding the dynamics of research production and collaboration may reveal better strategies for scientific careers, academic institutions and funding agencies. Here we propose the use of a large and multidisciplinary database of scientific curricula in Brazil, namely, the Lattes Platform, to study patterns of scientific production and collaboration. Detailed information about publications and researchers is available in this database. Individual curricula are submitted by the researchers themselves so that co-authorship is unambiguous. Researchers can be classified by scientific productivity, geographical location and field of expertise. Our results show that the collaboration network has been growing exponentially over the last three decades, with a distribution of the number of collaborators per researcher that approaches a power law as the network gets older. Moreover, both the distribution of the number of collaborators and that of production per researcher obey power-law behaviors, regardless of geographical location or field, suggesting that the same universal mechanism might be responsible for network growth and productivity. We also show that the collaboration network under investigation displays typical assortative mixing behavior, where highly connected researchers (i.e., those with high degree) tend to collaborate with others alike. Moreover, we find that on average men prefer collaborating with other men rather than with women, while women are more egalitarian. This is consistently observed across all fields and is essentially independent of the number of collaborators of the researcher. The sole exception is engineering, where this gender bias is clearly less pronounced as the number of collaborators increases. We also find that the distribution of the number of collaborators follows a power law, with a cut-off that is gender dependent. This reflects the fact that on average men produce more papers and have more collaborators than women. We also find that both genders display the same tendency towards interdisciplinary collaborations, except for Exact and Earth Sciences, where women having many collaborators are more open to interdisciplinary research.

30

Araújo, Eduardo Barbosa. "Scientific Collaboration Networks from Lattes Database: Topology, Dynamics and Gender Statistics." Universidade Federal do Ceará, 2016. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=17184.

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico
Understanding the dynamics of research production and collaboration may reveal better strategies for scientific careers, academic institutions and funding agencies. Here we propose the use of a large and multidisciplinary database of scientific curricula in Brazil, namely, the Lattes Platform, to study patterns of scientific production and collaboration. Detailed information about publications and researchers is available in this database. Individual curricula are submitted by the researchers themselves so that co-authorship is unambiguous. Researchers can be classified by scientific productivity, geographical location and field of expertise. Our results show that the collaboration network has been growing exponentially over the last three decades, with a distribution of the number of collaborators per researcher that approaches a power law as the network gets older. Moreover, both the distribution of the number of collaborators and that of production per researcher obey power-law behaviors, regardless of geographical location or field, suggesting that the same universal mechanism might be responsible for network growth and productivity. We also show that the collaboration network under investigation displays typical assortative mixing behavior, where highly connected researchers (i.e., those with high degree) tend to collaborate with others alike. Moreover, we find that on average men prefer collaborating with other men rather than with women, while women are more egalitarian. This is consistently observed across all fields and is essentially independent of the number of collaborators of the researcher. The sole exception is engineering, where this gender bias is clearly less pronounced as the number of collaborators increases. We also find that the distribution of the number of collaborators follows a power law, with a cut-off that is gender dependent. This reflects the fact that on average men produce more papers and have more collaborators than women. We also find that both genders display the same tendency towards interdisciplinary collaborations, except for Exact and Earth Sciences, where women having many collaborators are more open to interdisciplinary research.

31

Li, Xiaohu. "Security Analysis on Network Systems Based on Some Stochastic Models." ScholarWorks@UNO, 2014. http://scholarworks.uno.edu/td/1931.

Abstract:

Thanks to great effort from mathematicians, physicists and computer scientists, network science has developed rapidly during the past decades. However, because of its complexity, most research in this area is based only on experiments and simulations; theoretical results are needed to gain more insight into how the structure of a network affects its security. This dissertation introduces some stochastic and statistical models on certain networks and uses a k-out-of-n tolerant structure to characterize, both logically and physically, the behavior of nodes. Based upon these models, we draw several illuminating results in the following two aspects, which are consistent with what computer scientists have observed in either practical situations or experimental studies. First, suppose that a node in a P2P network loses its designed function or service when some of its neighbors are disconnected. By studying the isolation probability and the durable time of a single user, we prove that a network in which the user's lifetime has more NWUE-ness is more resilient, in the sense of having a smaller probability of being isolated by neighbors and a longer time online without interruption. Meanwhile, some preservation properties are also studied for the durable time of a network. Additionally, in order to apply the model in practice, both graphical and nonparametric statistical methods are developed and applied to a real data set. Second, a stochastic model is introduced to investigate the security of network systems based on their vulnerability graph abstractions. A node loses its designed function when a certain number of its neighbors are compromised, in the sense of being taken over by malicious code or a hacker. The attack compromises some nodes, and the victimized nodes become accomplices. We derive an equation for the probability that a node in a network is compromised. Since this equation has no explicit solution, we also establish new lower and upper bounds for the probability. The two models proposed here generalize existing models in the literature, and the corresponding theoretical results improve on known results and hence offer insight into designing a more secure system and enhancing the security of an existing one.
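
The k-out-of-n structure mentioned in this abstract can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not taken from the dissertation: each of a node's n neighbors is compromised independently with the same probability p, and the node loses its function once at least k of them are compromised. The dissertation's own equation has no explicit solution, so this is only the textbook binomial special case; the name `failure_probability` is hypothetical.

```python
from math import comb


def failure_probability(n, k, p):
    """P(at least k of n neighbors are compromised), assuming each neighbor
    is compromised independently with probability p (an assumption of this
    sketch, not of the dissertation)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical node with 3 neighbors that fails when at least 2 are compromised.
prob = failure_probability(3, 2, 0.5)
```

Raising k (more tolerance) or lowering p (better protected neighbors) both reduce the result, which is the qualitative trade-off a k-out-of-n design captures.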

32

Herman, Joseph L. "Multiple sequence analysis in the presence of alignment uncertainty." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:88a56d9f-a96e-48e3-b8dc-a73f3efc8472.

Abstract:

Sequence alignment is one of the most intensely studied problems in bioinformatics, and is an important step in a wide range of analyses. An issue that has gained much attention in recent years is the fact that downstream analyses are often highly sensitive to the specific choice of alignment. One way to address this is to jointly sample alignments along with other parameters of interest. In order to extend the range of applicability of this approach, the first chapter of this thesis introduces a probabilistic evolutionary model for protein structures on a phylogenetic tree; since protein structures typically diverge much more slowly than sequences, this allows for more reliable detection of remote homologies, improving the accuracy of the resulting alignments and trees, and reducing sensitivity of the results to the choice of dataset. In order to carry out inference under such a model, a number of new Markov chain Monte Carlo approaches are developed, allowing for more efficient convergence and mixing on the high-dimensional parameter space. The second part of the thesis presents a directed acyclic graph (DAG)-based approach for representing a collection of sampled alignments. This DAG representation allows the initial collection of samples to be used to generate a larger set of alignments under the same approximate distribution, enabling posterior alignment probabilities to be estimated reliably from a reasonable number of samples. If desired, summary alignments can then be generated as maximum-weight paths through the DAG, under various types of loss or scoring functions. The acyclic nature of the graph also permits various other types of algorithms to be easily adapted to operate on the entire set of alignments in the DAG. In the final part of this work, methodology is introduced for alignment-DAG-based sequence annotation using hidden Markov models, and RNA secondary structure prediction using stochastic context-free grammars. 
Results on test datasets indicate that the additional information contained within the DAG allows for improved predictions, resulting in substantial gains over simply analysing a set of alignments one by one.
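
The idea of extracting a summary as a maximum-weight path through a DAG can be sketched in Python as follows. This is a generic dynamic program over a topological order, not the thesis's implementation; the node names, edge weights and the function `max_weight_path` are hypothetical, and in the thesis the nodes and weights would come from sampled alignments and a chosen scoring function.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def max_weight_path(edges, source, sink):
    """Return (total weight, path) of a maximum-weight source-to-sink path.

    edges maps (u, v) -> weight and must describe an acyclic graph in which
    sink is reachable from source (otherwise a KeyError is raised).
    """
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
    best = {source: (0.0, None)}  # node -> (best weight so far, predecessor)
    # Visit nodes so that every predecessor is processed before its successors.
    for v in TopologicalSorter(preds).static_order():
        for u in preds[v]:
            if u in best:
                candidate = best[u][0] + edges[(u, v)]
                if v not in best or candidate > best[v][0]:
                    best[v] = (candidate, u)
    # Walk back from the sink along the stored predecessors.
    path, node = [], sink
    while node is not None:
        path.append(node)
        node = best[node][1]
    return best[sink][0], path[::-1]


# Hypothetical toy DAG with made-up weights.
edges = {("s", "a"): 1.0, ("s", "b"): 2.0, ("a", "t"): 5.0,
         ("b", "t"): 1.0, ("a", "b"): 0.5}
total, path = max_weight_path(edges, "s", "t")
```

Because the graph is acyclic, each node is finalized after a single pass over its incoming edges, so the running time is linear in the number of edges once the topological order is known.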

33

Fockstedt, Jonas, and Ema Krcic. "Unsupervised anomaly detection for structured data - Finding similarities between retail products." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-44756.

Abstract:

Data is one of the most important factors in modern business operations. Bad data can therefore lead to tremendous losses, both financially and in customer experience. This thesis seeks to find anomalies in real-world, complex, structured data that cause an international enterprise to miss out on income and potentially lose customers. Using graph theory and similarity analysis, the findings suggest that certain countries contribute to the discrepancies more than others. This is believed to be an effect of countries customizing their products to match the market's needs. This thesis only scratches the surface of the analysis of the data, and there are therefore many opportunities for future work.

34

Koskinen, Johan. "Essays on Bayesian Inference for Social Networks." Doctoral thesis, Stockholm : Department of Statistics [Statistiska institutionen], Univ, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-128.

35

Martinet, Lucie. "Réseaux dynamiques de terrain : caractérisation et propriétés de diffusion en milieu hospitalier." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1010/document.

Abstract:

In this thesis, we focus on tools for extracting the structural and temporal properties of dynamic networks, as well as the characteristics of diffusion scenarios that can occur on these networks. We work on a specific data set from the European MOSAR project, which includes, among other things, the proximity network of individuals recorded over time during 6 months at the Berck-sur-Mer hospital. This network is notable for its three dimensions: a temporal one, a structural one induced by the distribution of individuals into hospital services, and a functional one, since each individual belongs to a socio-professional category. For each dimension, we used existing tools from statistical physics and graph theory to extract information describing properties of the network. These methods underline the highly structured distribution of contacts, which follows the distribution of individuals into services, and highlight affinities between certain professional categories. Regarding the temporal dimension, we bring out circadian and weekly periodic patterns, as well as fundamental differences between the evolution of patients' interactions and that of staff. We also present tools to compare network activity between two given periods and to quantify the similarity of these periods. Finally, we use simulation to extract diffusion properties of the network and provide some clues for establishing a prevention policy.

36

Lumbreras, Alberto. "Automatic role detection in online forums." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE2111/document.

Abstract:

This thesis addresses the problem of detecting user roles in online discussion forums. A role may be defined as the set of behaviors characteristic of a person or a position. In discussion forums, behaviors are primarily observed through conversations. Hence, we focus on how users discuss. We propose three methods to detect groups of users with similar conversational behaviors. Our first method is based on the structure of the conversations in which users participate. We apply different notions of neighborhood (radius-based, order-based, and time-based) to posts, which are represented as nodes of a tree, and compare the conversational patterns that they detect as well as the clusters of users with similar conversational patterns. Our second method is based on stochastic growth models for conversation threads. Building on these models, we propose a method to find groups of users that tend to reply to the same type of posts. We show that, while there are clusters of users with similar replying patterns, there is no strong evidence that these behaviors are predictive of future behaviors, except for some groups of users with extreme behaviors. In our last method, we integrate the types of data used in the two previous methods (feature-based and behavioral or functional-based) and show that we can find clusters using fewer observations. The model exploits the idea that users with similar features have similar behaviors.
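
To make the radius-based notion of neighborhood concrete, here is a minimal Python sketch. It assumes a thread is stored as a mapping from each post to its parent (a hypothetical representation, not taken from the thesis) and collects every post within a given number of reply hops of a target post. The function name and the sample thread are illustrative only.

```python
from collections import deque

def radius_neighborhood(parents, post, radius):
    """Return the set of posts within `radius` reply hops of `post`.

    `parents` maps each post id to its parent id (None for the root).
    This data layout is an illustrative assumption.
    """
    # Build an undirected adjacency list from the parent pointers.
    adjacency = {node: set() for node in parents}
    for child, parent in parents.items():
        if parent is not None:
            adjacency[child].add(parent)
            adjacency[parent].add(child)

    # Breadth-first search that stops expanding at the hop limit.
    seen = {post}
    queue = deque([(post, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen

# Hypothetical thread: post 0 is the root, 1 and 2 reply to 0,
# 3 replies to 1, and 4 replies to 3.
thread = {0: None, 1: 0, 2: 0, 3: 1, 4: 3}
print(sorted(radius_neighborhood(thread, 1, 1)))
```

Order-based and time-based neighborhoods would need posting order or timestamps in addition to the parent pointers, so they are not covered by this sketch.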

37

Zaylaa, Amira. "Analyse et extraction de paramètres de complexité de signaux biomédicaux." Thesis, Tours, 2014. http://www.theses.fr/2014TOUR3315/document.

Abstract:

The analysis of biomedical time series derived from nonlinear dynamic systems is challenging because of their chaotic nature. Clinicians can detect only a few classical parameters to assess the state of patients and fetuses. Although valuable complexity invariants exist, such as multifractal parameters, entropies, and recurrence plots, they prove unsatisfactory in certain cases. To overcome this limitation, this dissertation proposes new entropy invariants, contributes to multifractal analysis, and develops signal-based (unbiased) recurrence plots based on the dynamic transitions of time series. The main aim is to improve the discrimination between healthy and distressed biomedical systems, particularly fetuses, by processing the time series with these techniques. The techniques were validated on the Lorenz system, logistic maps, and fractional Brownian motions, which model chaotic and random time series. They were then applied to real fetal heart rate signals recorded in the third trimester of pregnancy. Statistical measures including relative error, standard deviation, sensitivity, specificity, precision, and accuracy were used to evaluate detection performance. The high-order entropy invariants achieved high discrimination performance. Multifractal analysis using a structure function enhanced the detection of fetal medical states, and the unbiased cross-determinism invariant improved the discrimination process. The significance of these techniques lies in their post-processing code, which could be built into portable devices offering advanced discrimination and detection of intrauterine growth restriction before fetal death. This work was devoted to fetal heart rate signals, but time series generated by other nonlinear dynamic systems should be considered in future work.

38

Bělohlávek, Jiří. "Agent pro kurzové sázení." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235980.

Abstract:

This master's thesis deals with the design and implementation of a betting agent. It covers the theoretical background of online betting, probability, and statistics. The first part focuses on data mining and explains the principle of mining knowledge from data warehouses, along with methods suitable for different types of tasks. The second part is concerned with neural networks and the back-propagation algorithm. All findings are demonstrated on and supported by graphs and histograms from data analysis performed with the SAS Enterprise Miner program. In conclusion, the thesis summarizes the results and proposes specific ways of extending the agent.

39

SORIANI, NICOLA. "Topics in Statistical Models for Network Analysis." Doctoral thesis, 2012. http://hdl.handle.net/11577/2697077.

Abstract:

Network analysis is a set of statistical and mathematical techniques for studying relational data on a system of interconnected entities. Many results for network data come from Social Network Analysis (SNA), which focuses mainly on the study of relationships among a set of individuals and organizations. The thesis addresses several topics in statistical modeling of network data, with particular attention to the models used in SNA. The core of the thesis consists of Chapters 3, 4, and 5. Chapter 3 proposes an alternative approach to estimating Exponential Random Graph Models (ERGMs). Chapter 4 compares the ERGM and latent space modeling approaches in terms of goodness of fit. Chapter 5 proposes alternative methods for estimating the class of p2 models.

40

Jalali, Ali 1982. "Dirty statistical models." Thesis, 2012. http://hdl.handle.net/2152/ETD-UT-2012-05-5088.

Abstract:

In fields across science and engineering, we are increasingly faced with problems where the number of variables or features we need to estimate is much larger than the number of observations. Under such high-dimensional scaling, for any hope of statistically consistent estimation, it becomes vital to leverage any potential structure in the problem, such as sparsity, low-rank structure or block sparsity. However, data may deviate significantly from any one such statistical model. The motivating question of this thesis is: can we simultaneously leverage more than one such structural model to obtain consistency in a larger number of problems, and with fewer samples, than can be obtained with single models? Our approach combines such models via simple linear superposition, a technique we term dirty models. The idea is very simple: while any one structure might not capture the data, a superposition of structural classes might. A dirty model thus searches for a parameter that can be decomposed into a number of simpler structures such as (a) sparse plus block-sparse, (b) sparse plus low-rank and (c) low-rank plus block-sparse. In this thesis, we propose dirty-model-based algorithms for different problems such as multi-task learning, graph clustering and time-series analysis with latent factors. We analyze these algorithms in terms of the number of observations needed to estimate the variables. These algorithms are based on convex optimization and can be relatively slow. We therefore provide a class of low-complexity greedy algorithms that not only solve these optimizations faster but also come with guarantees on the solution. Beyond the theoretical results, in each case we provide experimental results to illustrate the power of dirty models.

41

"Generalized Statistical Tolerance Analysis and Three Dimensional Model for Manufacturing Tolerance Transfer in Manufacturing Process Planning." Doctoral diss., 2011. http://hdl.handle.net/2286/R.I.9125.

Abstract:

Manufacturing tolerance charts are the most common tool for manufacturing tolerance transfer today, but they are limited to one dimension. Some research has been undertaken on three-dimensional geometric tolerances, but it is too theoretical and not yet ready for operator-level use. In this research, a new three-dimensional model for tolerance transfer in manufacturing process planning is presented. It is user friendly in the sense that it is built upon Coordinate Measuring Machine (CMM) readings, which are readily available in any decent manufacturing facility. The model can handle datum reference changes between non-orthogonal datums (squeezed datums), non-linearly oriented datums (twisted datums), etc. A graph-theoretic approach based upon ACIS, C++ and MFC is laid out to facilitate its implementation for automation of the model. A totally new approach to determining dimensions and tolerances for the manufacturing process plan is also presented. Secondly, a new statistical model for statistical tolerance analysis, based upon the joint probability distribution of trivariate normally distributed variables, is presented. 4-D probability maps have been developed in which the probability value of a point in space is represented by the size of the marker and the associated color. Points inside the part map represent the pass percentage for parts manufactured. The effect of refinement with form and orientation tolerance is highlighted by calculating the change in pass percentage relative to the pass percentage for size tolerance only. Delaunay triangulation and ray tracing algorithms have been used to automate the process of identifying the points inside and outside the part map. Proof-of-concept software has been implemented to demonstrate this model and to determine pass percentages for various cases. The model is further extended to assemblies by employing convolution algorithms on two trivariate statistical distributions to arrive at the statistical distribution of the assembly. A map generated by using Minkowski sum techniques on the individual part maps is superimposed on the probability point cloud resulting from convolution. Delaunay triangulation and ray tracing algorithms are employed to determine the assembleability percentages for the assembly.
Dissertation/Thesis
Ph.D. Mechanical Engineering 2011

42

Krištof, Radim. "Žákovská interpretace grafických výstupů statistických šetření." Master's thesis, 2016. http://www.nusl.cz/ntk/nusl-354104.

Abstract:

This thesis discusses charts and how they are taught. I study how graphs are introduced in Czech schools, i.e. how the graphical outputs of statistical surveys (graph types, their relevance, their description, etc.) are presented in textbooks. I then analyze graph-related items from the international TIMSS and PISA studies in light of the success rates of Czech pupils. It turns out that pupils have no problem reading values from a graph, whereas creating graphs or solving non-standard exercises causes them great difficulty. Last but not least, I test pupils' ability to interpret graphs correctly and critically, i.e. whether they can judge if a graph presents actual data or has been deliberately distorted and modified. For this purpose I created a questionnaire with three exercises and administered it to pupils in the final-year classes of a grammar school and a vocational school, and to pupils in programmes leading to a vocational certificate. The results of the final-year pupils from both schools were comparable. Grammar school pupils succeeded especially in complicated or complex tasks, while vocational school pupils did better on tasks that require only orientation in a graph and reading of values. Pupils in the vocational certificate programmes achieved results roughly half as good.

43

Che, Xuan. "Spatial graphical models with discrete and continuous components." Thesis, 2012. http://hdl.handle.net/1957/33644.

Abstract:

Graphical models use Markov properties to establish associations among dependent variables. To estimate spatial correlation and other parameters in graphical models, the conditional independences and joint probability distribution of the graph need to be specified. We can rely on Gaussian multivariate models to derive the joint distribution when all the nodes of the graph are assumed to be normally distributed. However, when some of the nodes are discrete, the Gaussian model no longer affords an appropriate joint distribution function. We develop methods specifying the joint distribution of a chain graph with both discrete and continuous components, with spatial dependencies assumed among all variables on the graph. We propose a new group of chain graphs known as the generalized tree networks. Constructing the chain graph as a generalized tree network, we partition its joint distributions according to the maximal cliques. Copula models help us to model correlation among discrete variables in the cliques. We examine the method by analyzing datasets with simulated Gaussian and Bernoulli Markov random fields, as well as with a real dataset involving household income and election results. Estimates from the graphical models are compared with those from spatial random effects models and multivariate regression models.
Graduation date: 2013

44

"An application of cox hazard model and CART model in analyzing the mortality data of elderly in Hong Kong." 2002. http://library.cuhk.edu.hk/record=b5891190.

Abstract:

Pang Suet-Yee.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2002.
Includes bibliographical references (leaves 85-87).
Abstracts in English and Chinese.
Chapter 1 --- Introduction
Chapter 1.1 --- Overview
Chapter 1.1.1 --- Survival Analysis
Chapter 1.1.2 --- Tree-structured Statistical Method
Chapter 1.1.3 --- Mortality Study
Chapter 1.2 --- Motivation
Chapter 1.3 --- Background Information
Chapter 1.4 --- Data Content
Chapter 1.5 --- Thesis Outline
Chapter 2 --- Imputation and File Splitting
Chapter 2.1 --- Imputation of Missing Values
Chapter 2.1.1 --- Purpose of Imputation
Chapter 2.1.2 --- Procedure of Hot Deck Imputation
Chapter 2.1.3 --- List of Variables for Imputation
Chapter 2.2 --- File Splitting
Chapter 2.2.1 --- Splitting by Gender
Chapter 2.3 --- Splitting for Validation Check
Chapter 3 --- Cox Hazard Model
Chapter 3.1 --- Basic Idea
Chapter 3.1.1 --- Survival Analysis
Chapter 3.1.2 --- Survivor Function
Chapter 3.1.3 --- Hazard Function
Chapter 3.2 --- The Cox Proportional Hazards Model
Chapter 3.2.1 --- Kaplan-Meier Estimate and Log-Rank Test
Chapter 3.2.2 --- Hazard Ratio
Chapter 3.2.3 --- Partial Likelihood
Chapter 3.3 --- Extension of the Cox Proportional Hazards Model for Time-dependent Variables
Chapter 3.3.1 --- Modification of the Cox's Model
Chapter 3.4 --- Results of Model Fitting
Chapter 3.4.1 --- Extract the Significant Covariates from the Models
Chapter 3.5 --- Model Interpretation
Chapter 4 --- CART
Chapter 4.1 --- CART Procedure
Chapter 4.2 --- Selection of the Splits
Chapter 4.2.1 --- Goodness of Split
Chapter 4.2.2 --- Type of Variables
Chapter 4.2.3 --- Estimation
Chapter 4.3 --- Pruning the Tree
Chapter 4.3.1 --- Misclassification Cost
Chapter 4.3.2 --- Class Assignment Rule
Chapter 4.3.3 --- Minimal Cost Complexity Pruning
Chapter 4.4 --- Cross Validation
Chapter 4.4.1 --- V-fold Cross-validation
Chapter 4.4.2 --- Selecting the right sized tree
Chapter 4.5 --- Missing Value
Chapter 4.6 --- Results of CART program
Chapter 4.7 --- Model Interpretation
Chapter 5 --- Model Prediction
Chapter 5.1 --- Application to Test Sample
Chapter 5.1.1 --- Fitting test sample to Cox's Model
Chapter 5.1.2 --- Fitting test sample to CART model
Chapter 5.2 --- Comparison of Model Prediction
Chapter 5.2.1 --- Misclassification Rate
Chapter 5.2.2 --- Misclassification Rate of Cox's model
Chapter 5.2.3 --- Misclassification Rate of CART model
Chapter 5.2.4 --- Prediction Result
Chapter 6 --- Conclusion
Chapter 6.1 --- Comparison of Results
Chapter 6.2 --- Comparison of the Two Statistical Techniques
Chapter 6.3 --- Limitation
Appendix A: Coding Description for the Health Factors
Appendix B: Log-rank Test
Appendix C: Longitudinal Plot of Time Dependent Variables
Appendix D: Hypothesis Testing of Suspected Covariates
Appendix E: Terminal node report for both gender
Appendix F: Calculation of Critical Values
Appendix G: Distribution of Missing Value in Learning sample and Test Sample
Bibliography

45

KANG, TSAI FU, and 蔡阜鋼. "The study on item analysis – a research on elementary school students’ conceptualization of statistical gragh." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/66413671381554190760.

Abstract:

Master's
Taichung Teachers College
In-service Master's Program in Teaching, Department of Mathematics Education
93
In this study, the researcher devises a test on statistical graphs and analyzes the results through the relational structure diagram produced by Item Relational Structure (IRS) analysis, with the aim of exploring how elementary school students cognitively construct their concepts of statistical graphs. Drawing on the current elementary school mathematics curriculum and on domestic and foreign studies, the researcher designs the test around seven sub-concepts of statistical graphs: (1) basic statistical concepts of data, (2) concepts of symbolic apposition, (3) concepts of graphing a data set, (4) concepts of statistical graph presentation, (5) concepts of comparison among subclasses on a statistical graph, (6) statistical concepts of statistical graphs of a single data set, and (7) statistical concepts of statistical graphs of multiple data sets. The study was conducted with a class of sixth graders at an elementary school in Tai-Chung County. After the students took the test, the results were analyzed with IRSP, a program based on IRS, in order to obtain information from the item relational structure diagram. From the paper-and-pencil test and the analysis of the relational structure diagram, the researcher found that the tested students' cognition of statistical graphs develops in the order of the seven sub-concepts listed above, from (1) basic statistical concepts of data to (7) statistical concepts of statistical graphs of multiple data sets. The test also shows that students performed better on bar charts than on line charts, and better on line charts than on pie charts. However, the sub-concepts of pie charts have no significant hierarchical relation with the sub-concepts of bar charts and line charts, although they do influence the development of students' conceptualization of statistical graphs. Based on these findings and conclusions, the researcher offers suggestions for teachers and for future studies.

46

Gupta, Shubham. "Statistical Network Analysis: Community Structure, Fairness Constraints, and Emergent Behavior." Thesis, 2021. https://etd.iisc.ac.in/handle/2005/5513.

Abstract:

Networks or graphs provide mathematical tools for describing and analyzing relational data. They are used in biology to model interactions between proteins, in economics to identify trade alliances among countries, in epidemiology to study the spread of diseases, and in computer science to rank webpages on a search engine, to name a few. Each application domain in this wide assortment encounters networks with diverse properties and imposes various constraints. For example, networks may be dynamic, heterogeneous, or attributed, and an application domain may require a fairness constraint on the communities (e.g. requiring communities in a social network to be balanced with respect to genders). However, most existing research is concerned with the simplest type of networks with a fixed set of nodes and edges and focuses on the canonical forms of tasks like community detection and link prediction. This thesis aims at bridging this gap between the simplistic problem settings considered in the literature and the complex requirements of real-world applications by proposing community detection and link prediction methods to analyze different types of networks from various perspectives. Our first contribution includes two spectral algorithms for finding 'fair' communities in a given network $\mathcal{G}$. We define what it means for communities to be fair from the perspective of each node (a.k.a. individual fairness). This is done via an auxiliary 'representation graph' $\mathcal{R}$ that connects nodes if they can represent each other's interests in various communities. Informally speaking, a node finds communities fair if its neighbors from $\mathcal{R}$ are proportionally distributed across all communities in $\mathcal{G}$. The goal is to find communities that are considered fair by all nodes. We show that this fairness criterion (i) generalizes the well-explored idea of statistical fairness and (ii) is also applicable in cases where sensitive node attributes (like gender and race) are not observable but instead manifest themselves as intrinsic or latent features in $\mathcal{R}$. We develop fair spectral clustering algorithms and prove that they are weakly consistent ($\#\text{mistakes} = o(N)$ with probability $1 - o(1)$) under a proposed variant of the stochastic block model. Second, we propose a community-based statistical model for dynamic networks where edges appear and disappear over time. Many networks like social networks, citation networks, contact networks, etc., are dynamic in nature. Our model embeds the nodes and communities in a $d$-dimensional latent space and specifies a procedure for updating these embeddings over time to model the network's evolution. Given an observed dynamic network, we infer these latent quantities using variational inference and use them for link forecasting and community detection. Unlike existing approaches, our model supports the birth and death of communities. It also allows us to use powerful neural networks during inference. Experiments demonstrate that our model is better at link forecasting and community detection as compared to existing approaches. Moreover, it discovers stable communities, as quantified by the normalized mutual information (NMI) score between communities discovered at successive time steps. This desirable quality is absent in methods that ignore the network dynamics. Third, we propose a statistical model for heterogeneous dynamic networks where the nodes and relations additionally have a type associated with them (e.g., knowledge graphs). Besides the latent node attributes, this model also encodes a set of interaction matrices for each type of relation. These matrices specify the affinity between nodes based on their attribute values and can represent both homophilic (like attracts like) and heterophilic relationships (opposites attract). We develop a scalable neural network-based inference procedure for this model and demonstrate that it outperforms existing state-of-the-art approaches on several homogeneous and heterogeneous dynamic network datasets, particularly the temporal knowledge graphs. Fourth, we develop a model for networks with node covariates to bring explainability to community detection. This model integrates node covariates into a stochastic block model using restricted Boltzmann machines. We subscribe to the view that a community can be explained by identifying the defining covariates of its member nodes. Our model provides the relative importance of various covariates in each community, thereby explaining its decision to group the members. Existing approaches for modeling networks with covariates lack this property, especially the ones that are based on deep neural networks. We also derive an efficient inference procedure that runs in linear time in the number of nodes and edges. Experiments confirm that our model's community detection performance is comparable with recent deep neural network-based approaches. However, it additionally offers the advantage of explainability. The discussion till this point views communities as passive structures arising out of interactions between nodes. However, just like existing links in a network determine future links, communities also play a functional role in shaping the behavior of the nodes (for example, preference for a clothing brand). Our final contribution explores this functional view of communities and shows that they affect emergent communication in a networked multi-agent reinforcement learning setting.
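
The abstract measures community stability with the normalized mutual information (NMI) between community assignments at successive time steps. As a rough illustration of that score, the sketch below computes NMI for two label lists over the same nodes using only the Python standard library. It assumes the arithmetic-mean normalisation, $2I(A;B)/(H(A)+H(B))$; several normalisations are in common use and the abstract does not say which one the thesis adopts. The function name and sample labels are hypothetical.

```python
import math
from collections import Counter

def normalized_mutual_information(labels_a, labels_b):
    """NMI between two community assignments over the same nodes.

    Assumes the normalisation 2 * I(A;B) / (H(A) + H(B)); this choice
    is an illustrative assumption, not taken from the thesis.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        # Shannon entropy of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    # I(A;B) = sum over joint cells of p(a,b) * log(p(a,b) / (p(a) p(b))).
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both assignments place every node in one community
    return 2 * mutual_info / (h_a + h_b)

# Hypothetical assignments at two successive time steps.
communities_t0 = [0, 0, 1, 1]
communities_t1 = [1, 1, 0, 0]  # same grouping, labels swapped
print(normalized_mutual_information(communities_t0, communities_t1))
```

Because NMI depends only on the grouping and not on the label names, relabelled but otherwise identical communities score as fully stable.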

47

Humphries, Peter J. "Combinatorial aspects of leaf-labelled trees : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Mathematics, University of Canterbury Department of Mathematics and Statistics /." 2008. http://hdl.handle.net/10092/1801.

48

Wu, Su-Heng, and 吳素亨. "A Content Analysis of the Statistical Graphs Materials in the Elementary Mathematic Textbooks of Taiwan and Finland." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/ps96p5.

Abstract:

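Master's
National Taipei University of Education
Department of Mathematics and Information Education (including the Master's Program in Mathematics Education)
104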
The study explores the differences between the Hanlin edition of mathematics textbooks in Taiwan and the WSOY edition in Finland in terms of when and in what order statistical charts are introduced and how the teaching materials are presented. Using content analysis and concept mapping, the study finds that both the Hanlin and the WSOY editions devote a high proportion of their material to "reading, comparing, and interpreting statistical charts" and "reading and applying tables," showing that both countries emphasize reading, comparing, and understanding statistical charts and tables. The teaching materials of both countries ask first graders to classify and arrange data and turn records into numbers. As for question design, the questions in both countries' materials are mostly comparison and calculation. In the arrangement of topics on statistical tables, however, the two-way tables in the Hanlin edition ask students to find answers through cross-correspondence, whereas those in the WSOY edition involve both cross-correspondence and logical judgment. For statistical charts, the Hanlin edition contains bar charts, line charts, and pie charts (with one exception: the fourth-grade second-semester workbook of the 2003 Hanlin edition contains population pyramids), while the WSOY edition is more diverse, including not only bar charts, line charts, and pie charts but also graphic statistical charts, climate diagrams, population pyramids, and histograms. In the drawing of statistical charts, the Finnish materials do not cover drawing pie charts but do cover drawing range bars. The study suggests that statistical chart materials for the early grades should again stress the importance of establishing classification standards in order to develop classification concepts, rather than emphasizing records only. In this way, the first step of organizing data can be completed.

49

Rea, William S. "The application of atheoretical regression trees to problems in time series analysis : a thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Mathematics, Department of Mathematics and Statistics, University of Canterbury /." 2008. http://hdl.handle.net/10092/1715.

50

Kulmatitskiy, Nikolay. "Modeling Dynamic Network with Centrality-based Logistic Regression." Thesis, 2011. http://hdl.handle.net/10012/6290.

Abstract:

Statistical analysis of network data is an active field of study, in which researchers investigate graph-theoretic concepts and various probability models that explain the behaviour of real networks. This thesis attempts to combine two of these concepts: an exponential random graph and a centrality index. Exponential random graphs comprise the most useful class of probability models for network data. These models often require the assumption of a complex dependence structure, which creates certain difficulties in the estimation of unknown model parameters. However, in the context of dynamic networks the exponential random graph model provides the opportunity to incorporate a complex network structure such as centrality without the usual drawbacks associated with parameter estimation. The thesis employs this idea by proposing probability models that are equivalent to the logistic regression models and that can be used to explain behaviour of both static and dynamic networks.
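
As a loose illustration of how a centrality index can enter a logistic model for a dynamic network, the sketch below scores a candidate edge at the next time step from the degree centralities of its endpoints in the previous snapshot. Everything here is an assumption made for illustration: the choice of degree centrality, the covariate (the sum of the two endpoint centralities), the coefficient values, and the function names. The abstract does not specify the thesis's actual covariates or estimation procedure.

```python
import math

def degree_centrality(adjacency):
    """Degree of each node divided by (n - 1), for an undirected graph
    given as a dict mapping node -> set of neighbours (n >= 2)."""
    n = len(adjacency)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adjacency.items()}

def edge_probability(adjacency_prev, i, j, intercept, slope):
    """Logistic probability that edge (i, j) is present at the next step.

    The covariate (sum of the endpoints' degree centralities in the
    previous snapshot) and the coefficients are hypothetical choices,
    not the specification used in the thesis.
    """
    centrality = degree_centrality(adjacency_prev)
    linear = intercept + slope * (centrality[i] + centrality[j])
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical snapshot at time t-1: a star centred on node "a".
snapshot = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
p_hub = edge_probability(snapshot, "a", "b", intercept=-2.0, slope=1.5)
p_leaf = edge_probability(snapshot, "b", "c", intercept=-2.0, slope=1.5)
print(p_hub > p_leaf)
```

With a positive slope, edges touching the central node receive a higher probability than edges between peripheral nodes. In practice the intercept and slope would be estimated from observed snapshots with a standard logistic regression fit rather than fixed by hand.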
