Dissertations / Theses: 'Clustering coefficient'

1

Parikh, Nidhi Kiranbhai. "Generating Random Graphs with Tunable Clustering Coefficient." Thesis, Virginia Tech, 2011. http://hdl.handle.net/10919/31591.


Abstract:

Most real-world networks exhibit a high clustering coefficient: the probability that two neighbors of a node are also neighbors of each other. We propose four algorithms, CONF-1, CONF-2, THROW-1, and THROW-2, which are based on the configuration model. They take a triangle degree sequence (the number of triangles/corners at each node) and a single-edge degree sequence (the number of single edges/stubs at each node) as input and generate a random graph with a tunable clustering coefficient. We analyze them theoretically and empirically for the case of a regular graph. CONF-1 and CONF-2 generate a random graph with the degree sequence and the clustering coefficient anticipated from the input triangle and single-edge degree sequences. At each time step, CONF-1 chooses each node for creating triangles or single edges with the same probability, while CONF-2 chooses a node for creating triangles or single edges with a probability proportional to its number of unconnected corners or unconnected stubs, respectively. Experimental results match the anticipated clustering coefficient quite well except for highly dense graphs, in which case the experimental clustering coefficient is higher than the anticipated value. THROW-2 chooses three distinct nodes for creating triangles and two distinct nodes for creating single edges, while they need not be distinct for THROW-1. For THROW-1 and THROW-2, the degree sequence and the clustering coefficient of the generated graph vary from the input. However, the expected degree distribution and the clustering coefficient of the generated graph can also be predicted using analytical results. Experiments show that, for THROW-1 and THROW-2, the results match the analytical results quite well. Typically, only information about the degree sequence or degree distribution is available. We also propose an algorithm, DEG, that takes a degree sequence and a clustering coefficient as input and generates a graph with the same properties. Experiments show results for DEG that are quite similar to those for CONF-1 and CONF-2.
Master of Science
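
As an illustration of the definition quoted in the abstract above (not code from the thesis), the following minimal Python sketch computes the local clustering coefficient as the fraction of neighbor pairs of a node that are themselves adjacent. The function names and the adjacency-dictionary representation are assumptions made for this example.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves adjacent.

    `adj` maps each node to the set of its neighbors (undirected, no self-loops).
    Nodes with fewer than two neighbors get 0.0 by convention.
    """
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count neighbor pairs that are joined by an edge (closed triangles at `node`).
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical example graph: triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(adj, 2))   # one of three neighbor pairs is connected
print(average_clustering(adj))
```

Other conventions exist (for example, the global ratio of closed to connected triples), so the value reported by a given paper may differ from this per-node average.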


2

Jäger, Simon [Verfasser]. "Exponential domination, exponential independence, and the clustering coefficient / Simon Jäger." Ulm : Universität Ulm, 2017. http://d-nb.info/114748449X/34.


3

Heinrich, Irene [Verfasser]. "On Graph Decomposition: Hajós' Conjecture, the Clustering Coefficient and Dominating Sets / Irene Heinrich." München : Verlag Dr. Hut, 2020. http://d-nb.info/1219606197/34.


4

Oppong, Augustine. "Clustering Mixed Data: An Extension of the Gower Coefficient with Weighted L2 Distance." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/etd/3463.


Abstract:

Sorting data into partitions is becoming increasingly complex as the variety of data grows every day. Mixed data comprise continuous, categorical, directional, functional, and other types of variables. Clustering mixed data is based on special dissimilarities of the variables. Some data types may influence the clustering solution. Assigning an appropriate weight to the functional data may improve the performance of the clustering algorithm. In this paper we use the extension of the Gower coefficient with a judiciously chosen weight for the L2 distance to cluster mixed data. The benefits of weighting are demonstrated both in applications to the Buoy data set and in simulation studies. Our studies show that clustering algorithms with a proper weight give a superior recovery level when a data set with mixed continuous, categorical, directional, and functional attributes is clustered. We discuss open problems for future research in clustering mixed data.
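
To make the idea of a weighted mixed-data dissimilarity concrete, here is a minimal Python sketch of a basic Gower-style dissimilarity for numeric and categorical variables only. It is not the extension studied in the thesis (which also handles directional and functional variables); the function name, the `kinds`/`ranges` arguments, and the example weights are all assumptions for illustration.

```python
def gower_dissimilarity(x, y, kinds, ranges, weights=None):
    """Weighted Gower-style dissimilarity between two mixed-type records.

    kinds[i]   -- "num" for a numeric variable, "cat" for a categorical one
    ranges[i]  -- observed range (max - min) of numeric variable i; ignored for "cat"
    weights[i] -- non-negative weight of variable i (defaults to equal weights)
    """
    if weights is None:
        weights = [1.0] * len(x)
    total = 0.0
    for xi, yi, kind, r, w in zip(x, y, kinds, ranges, weights):
        if kind == "num":
            # Range-normalised absolute difference, in [0, 1].
            d = abs(xi - yi) / r if r else 0.0
        else:
            # Simple mismatch indicator for categorical variables.
            d = 0.0 if xi == yi else 1.0
        total += w * d
    return total / sum(weights)


# Hypothetical records: one numeric variable (range 4.0) and one categorical variable.
a = (1.0, "a")
b = (3.0, "b")
print(gower_dissimilarity(a, b, ("num", "cat"), (4.0, None)))
print(gower_dissimilarity(a, b, ("num", "cat"), (4.0, None), weights=(3.0, 1.0)))
```

Raising the weight of one variable type pulls the overall dissimilarity toward that variable's contribution, which is the mechanism the abstract refers to when it discusses weighting.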


5

Nascimento, Mariá Cristina Vasconcelos. "Metaheurísticas para o problema de agrupamento de dados em grafo." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-17052010-155334/.


Abstract:

Graph clustering aims to identify clusters of nodes in a given graph, that is, subgraphs with high connectivity. The problem goes by other names, including graph partitioning and community detection. Many mathematical formulations model this problem, each with advantages and disadvantages. Most of them have the disadvantage of requiring the number of clusters to be defined in advance. However, this information is not available in data for clustering, that is, in unlabeled data. This is one reason why the measure known as modularity, which is maximized to find graph partitions, has become popular in recent decades. Besides not requiring the number of clusters in advance, this formulation stands out for the quality of the partitions it produces. This thesis proposes Greedy Randomized Search Procedures metaheuristics for two existing graph clustering formulations: one for modularity maximization and the other for maximization of intra-cluster similarity. The results obtained by these metaheuristics were better than those of other heuristics found in the literature. However, their computational cost was high, especially for the modularity maximization metaheuristic. Over the years, studies have revealed that the modularity maximization formulation has some limitations. To offer a competitive alternative to the modularity maximization model, this thesis proposes new mathematical formulations for clustering weighted and unweighted graphs that aim to find partitions whose clusters have high connectivity. Furthermore, the proposed formulations can provide partitions without a prior definition of the number of clusters. Tests with hundreds of weighted graphs confirmed the efficiency of the proposed models. Comparing the partitions from all models studied in this thesis, the best results were observed for one of the new formulations, which found highly satisfactory partitions, superior to those of the existing models, including modularity maximization. The results showed high correlation with the true classification of the simulated and real data, the latter being mostly of biological origin.
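
For readers unfamiliar with the modularity measure mentioned in the abstract above, the sketch below evaluates the standard (Newman-Girvan) modularity of a given partition of an unweighted, undirected graph. It only scores a partition; it is not the GRASP metaheuristic or any of the new formulations proposed in the thesis. The function name and the edge-list/dictionary inputs are assumptions for this example.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    community -- dict mapping each node to its community label

    Q = sum over communities c of  e_c / m - (d_c / (2 m))**2,
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = {}   # e_c
    degree = {}     # d_c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}
print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

The two-triangle split scores higher than the trivial single-cluster partition, without the number of clusters having been fixed in advance.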


6

Koomson, Obed. "Performance Assessment of The Extended Gower Coefficient on Mixed Data with Varying Types of Functional Data." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/etd/3512.


Abstract:

Clustering is a widely used technique in data mining applications to source, manage, analyze and extract vital information from large amounts of data. Most clustering procedures are limited in their performance when it comes to data with mixed attributes. In recent times, mixed data have evolved to include directional and functional data. In this study, we will give an introduction to clustering with an eye towards the application of the extended Gower coefficient by Hendrickson (2014). We will conduct a simulation study to assess the performance of this coefficient on mixed data whose functional component has strictly-decreasing signal curves and also those whose functional component has a mixture of strictly-decreasing signal curves and periodic tendencies. We will assess how four different hierarchical clustering algorithms perform on mixed data simulated under varying conditions with and without weights. The comparison of the various clustering solutions will be done using the Rand Index.
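
The Rand Index named at the end of the abstract above compares two clusterings by counting the pairs of items on which they agree. The following is a minimal Python sketch of the unadjusted index; the function name and the label-list inputs are assumptions for illustration, and the study itself may use a different implementation.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Unadjusted Rand Index between two clusterings of the same items.

    A pair of items counts as an agreement when both clusterings put the
    pair in the same cluster, or both put it in different clusters.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        1
        for i, j in pairs
        if (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
    )
    return agree / len(pairs)


# Hypothetical labelings of four items.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, labels renamed
print(rand_index([0, 0, 1, 1], [0, 1, 1, 1]))
```

Because only co-membership of pairs matters, renaming the cluster labels does not change the score.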


7

Li, Han. "Statistical Modeling and Analysis of Bivariate Spatial-Temporal Data with the Application to Stream Temperature Study." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/70862.


Abstract:

Water temperature is a critical factor for the quality and biological condition of streams. Among various factors affecting stream water temperature, air temperature is one of the most important factors related to water temperature. To appropriately quantify the relationship between water and air temperatures over a large geographic region, it is important to accommodate the spatial and temporal information of the stream temperature. In this dissertation, I devote effort to several statistical modeling techniques for analyzing bivariate spatial-temporal data in a stream temperature study. In the first part, I focus my analysis on individual streams. A time varying coefficient model (VCM) is used to study the relationship between air temperature and water temperature for each stream. The time varying coefficient model enables dynamic modeling of the relationship, and therefore can be used to enhance the understanding of water and air temperature relationships. The proposed model is applied to 10 streams in Maryland, West Virginia, Virginia, North Carolina and Georgia using daily maximum temperatures. The VCM approach increases the prediction accuracy by more than 50% compared to the simple linear regression model and the nonlinear logistic model. The VCM that describes the relationship between water and air temperatures for each stream is represented by slope and intercept curves from the fitted model. In the second part, I consider water and air temperatures for different streams that are spatially correlated. I focus on clustering multiple streams by using intercept and slope curves estimated from the VCM. Spatial information is incorporated to make clustering results geographically meaningful. I further propose a weighted distance as a dissimilarity measure for streams, which provides a flexible framework to interpret the clustering results under different weights. Real data analysis shows that streams in the same cluster share similar geographic features such as solar radiation, percent forest and elevation. In the third part, I develop a spatial-temporal VCM (STVCM) to deal with missing data. The STVCM takes both spatial and temporal variation of water temperature into account. I develop a novel estimation method that emphasizes the time effect and treats the space effect as a varying coefficient for the time effect. A simulation study shows that the performance of the STVCM on missing data imputation is better than several existing methods such as the neural network and the Gaussian process. The STVCM is also applied to all 156 streams in this study to obtain a complete data record.
Ph. D.


8

Stephens, Skylar Nicholas. "Analytical and Computational Micromechanics Analysis of the Effects of Interphase Regions, Orientation, and Clustering on the Effective Coefficient of Thermal Expansion of Carbon Nanotube-Polymer Nanocomposites." Thesis, Virginia Tech, 2013. http://hdl.handle.net/10919/23216.


Abstract:

Analytic and computational micromechanics techniques based on the composite cylinders method and the finite element method, respectively, have been used to determine the effective coefficient of thermal expansion (CTE) of carbon nanotube-epoxy nanocomposites containing aligned nanotubes. Both techniques have been used in a parametric study of the influence of interphase stiffness and interphase CTE on the effective CTE of the nanocomposites. For both the axial and transverse CTE of aligned nanotube nanocomposites with and without interphase regions, the computational and analytic micromechanics techniques were shown to give similar results. The Mori-Tanaka method has been used to account for the effect of randomly oriented fibers. Analytic and computational micromechanics techniques have also been used to assess the effects of clustering and clustering with interphase on the effective CTE components. Clustering is observed to have a minimal impact on the effective axial CTE of the nanocomposite and a 3-10%. However, there is a combined effect with clustering and one of the interphase layers.
Master of Science


9

Dhanasetty, Abhishek. "Enumerating Approximate Maximal Cliques in a Distributed Framework." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1617104719399743.


10

Lee, James H. "A pollination network of Cornus florida." VCU Scholars Compass, 2014. http://scholarscompass.vcu.edu/etd/3615.


Abstract:

From the agent-based, correlated random walk model presented, we observe the effects of varying the parameter values of maximum insect turning area, 𝛿max, density of trees, ω, maximum pollen carryover, 𝜅max, and probability of fertilization, P𝜅, on the distribution of pollen within a population of Cornus florida (flowering dogwood). We see that varying 𝛿max and 𝜅max changes the dispersal distance of pollen, which greatly affects many measures of connectivity. The clustering coefficient of fathers is maximized when 𝛿max is between 60° and 90°. Varying ω does not have a major effect on the clustering coefficient of fathers, but it does have a greater effect on other measures of genetic diversity. Lastly, we compare our simulations with randomly placed trees to those with the actual tree placement of C. florida at the VCU Rice Center, concluding that in order to truly understand how pollen is distributed within a specific ecosystem, specificity in describing tree locations is necessary.


11

Stewart, Craig R. "An Evolutionary Analysis of the Internet Autonomous System Network." Kent State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=kent1276550359.


12

Brooks, Josh Daniel. "Nested (2,r)-regular graphs and their network properties." Digital Commons @ East Tennessee State University, 2012. https://dc.etsu.edu/etd/1471.


Abstract:

A graph G is a (t, r)-regular graph if every collection of t independent vertices is collectively adjacent to exactly r vertices. If a graph G of order n is (2, r)-regular and n is sufficiently large, then G is isomorphic to K_s + mK_p, where p, s, and m are positive integers, m ≥ 2, and 2(p-1) + s = r. A nested (2,r)-regular graph is constructed by replacing selected cliques with a (2,r)-regular graph and joining the vertices of the peripheral cliques. For example, in a nested 's' graph when n = s + mp, we obtain n = s_1 + m_1 p_1 + mp. The nested 's' graph is now of the form G_s = K_{s_1} + m_1 K_{p_1} + mK_p. We examine network properties of these nested graphs, such as the average path length, the clustering coefficient, and the spectrum.


13

Er, Chiangkai. "Speech recognition by clustering wavelet and PLP coefficients." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/42742.


14

Ribeiro, Filho Napoleão Póvoa. "Melhorando o desempenho da técnica de clusterização hierárquica single linkage utilizando a metaheurística GRASP." Universidade Federal do Tocantins, 2016. http://hdl.handle.net/11612/974.


Abstract:

The clustering problem consists of grouping the elements of a data set so that the most similar elements fall in the same cluster (group) and the less similar elements fall in different clusters. There are several ways to perform such groupings. One of the most popular is hierarchical clustering, in which a hierarchy of relationships among the elements is created. There are several methods for assessing the similarity between elements in the clustering problem. The most widely used is the single linkage method, which groups the elements that have the smallest distance between them. The input to this technique is a distance matrix. The grouping process ultimately produces an inverted tree known as a dendrogram. The cophenetic correlation coefficient (ccc), obtained after the dendrogram is built, is used to evaluate the consistency of the resulting clusters and indicates how faithful the dendrogram is to the original data. Thus, a dendrogram gives more consistent clusters the closer its ccc is to one (1). The clustering problem in all its variants, including hierarchical clustering (the object of study in this work), belongs to the class of NP-complete problems. It is therefore common to use heuristics to obtain solutions to this problem efficiently. To generate dendrograms with a better ccc, this work proposes a new algorithm that uses the concepts of the GRASP metaheuristic. A further objective is to implement this solution with parallel computing on a computer cluster, making it possible to work with larger matrices. Tests were carried out to confirm the performance of the proposed algorithm, comparing its results with those generated by the R software.
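
To show what the cophenetic correlation coefficient measures, here is a small pure-Python sketch for the plain single linkage case. It is not the GRASP-based algorithm proposed in the work. It relies on a known property of single linkage: the cophenetic distance between two elements equals the smallest possible "largest step" over all paths joining them in the distance matrix. The function names and the example matrices are assumptions for illustration.

```python
from math import sqrt


def single_linkage_cophenetic(dist):
    """Cophenetic distances of the single linkage dendrogram.

    For single linkage, the height at which i and j are first merged is the
    minimax path distance between them, computed here with a
    Floyd-Warshall-style update (O(n^3), fine for small matrices).
    """
    n = len(dist)
    coph = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(coph[i][k], coph[k][j])
                if via_k < coph[i][j]:
                    coph[i][j] = via_k
    return coph


def cophenetic_correlation(dist):
    """Pearson correlation between original and cophenetic distances."""
    coph = single_linkage_cophenetic(dist)
    n = len(dist)
    x = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    y = [coph[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Hypothetical input: three points on a line at positions 0, 1 and 3.
line = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
# Hypothetical input that is already tree-like (ultrametric).
tree_like = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
print(cophenetic_correlation(line))
print(cophenetic_correlation(tree_like))
```

The tree-like matrix is reproduced exactly by its dendrogram, so its coefficient is 1, while the points on a line lose some information when forced into a hierarchy.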


15

Zhuang, Yuwen. "Metric Based Automatic Event Segmentation and Network Properties Of Experience Graphs." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1337372416.


16

Lin, Kaisheng. "Motif counts, clustering coefficients and vertex degrees in models of random networks." Thesis, University of Oxford, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.497038.


17

Luo, Hongwei. "Modelling and simulation of large-scale complex networks." RMIT University. Mathematical and Geospatial Sciences, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080506.142224.


Abstract:

Real-world large-scale complex networks such as the Internet, social networks and biological networks have increasingly attracted the interest of researchers from many areas. Accurate modelling of the statistical regularities of these large-scale networks is critical to understand their global evolving structures and local dynamical patterns. Traditionally, the Erdos and Renyi random graph model has helped the investigation of various homogeneous networks. During the past decade, a special computational methodology has emerged to study complex networks, the outcome of which is identified by two models: the Watts and Strogatz small-world model and the Barabasi-Albert scale-free model. At the core of the complex network modelling process is the extraction of characteristics of real-world networks. I have developed computer simulation algorithms for study of the properties of current theoretical models as well as for the measurement of two real-world complex networks, which lead to the isolation of three complex network modelling essentials. The main contribution of the thesis is the introduction and study of a new General Two-Stage growth model (GTS Model), which aims to describe and analyze many common-featured real-world complex networks. The tools we use to create the model and later perform many measurements on it consist of computer simulations, numerical analysis and mathematical derivations. In particular, two major cases of this GTS model have been studied. One is named the U-P model, which employs a new functional form of the network growth rule: a linear combination of preferential attachment and uniform attachment. The degree distribution of the model is first studied by computer simulation, while the exact solution is also obtained analytically. 
Two other important properties of complex networks, the characteristic path length and the clustering coefficient, are also extensively investigated, yielding either analytically derived solutions or numerical results from computer simulations. Furthermore, I demonstrate that the hub-hub interaction acts in effect as the link between a network's topology and its resilience. The other case is called the Hybrid model, which incorporates two stages of growth and studies the transition behaviour between the Erdos and Renyi random graph model and the Barabasi-Albert scale-free model. The Hybrid model is measured by extensive numerical simulations focusing on its degree distribution, characteristic path length and clustering coefficient. Although each of the two cases serves as a new approach to modelling real-world large-scale complex networks, perhaps more importantly, the general two-stage model provides a new theoretical framework for complex network modelling, which can be extended in many ways besides the two studied in this thesis.
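
The growth rule described for the U-P model, a linear combination of preferential and uniform attachment, can be illustrated with a short simulation. The sketch below is one plausible reading, not the thesis's actual model: each new node draws every target preferentially with probability `p` and uniformly otherwise. The function name, the parameters `n`, `m`, `p`, the seeding scheme, and the complete-graph starting point are all assumptions.

```python
import random


def grow_mixed_attachment(n, m, p, seed=0):
    """Grow a graph to n nodes; each new node attaches to m distinct old nodes.

    With probability p a target is drawn proportionally to degree
    (preferential attachment), otherwise uniformly among existing nodes.
    Returns a list of (new_node, old_node) edges.
    """
    rng = random.Random(seed)
    # Start from a complete graph on m + 1 nodes so every node has degree >= 1.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            if rng.random() < p:
                targets.add(rng.choice(stubs))      # preferential
            else:
                targets.add(rng.randrange(new))     # uniform over old nodes
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


# Hypothetical run: 50 nodes, 2 links per new node, equal mix of both rules.
edges = grow_mixed_attachment(50, 2, 0.5, seed=1)
print(len(edges))
```

Setting `p = 1` gives a purely preferential (Barabasi-Albert-like) rule and `p = 0` a purely uniform one, so intermediate values interpolate between heavy-tailed and more homogeneous degree distributions.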


18

Montilla, Michaela. "Vliv parcelačního atlasu na kvalitu klasifikace pacientů s neurodegenerativním onemocněním." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-378150.


Abstract:

The aim of the thesis is to determine how the classification of patients affected by neurodegenerative diseases depends on the choice of the parcellation atlas. Part of this thesis is the application of functional connectivity analysis and the calculation of graph metrics according to the method published by Olaf Sporns and Mikail Rubinov [1] on fMRI data measured at CEITEC MU. The application is preceded by theoretical research into parcellation atlases for brain segmentation from fMRI frames and into mathematical methods for classification as well as classifiers of neurodegenerative diseases. The first chapters of the thesis provide a theoretical basis from the field of magnetic and functional magnetic resonance imaging. The physical principles of the method, the conditions and the course of acquisition of image data are defined. The third chapter summarizes the graph metrics used in the diploma thesis for analyzing and classifying graphs. The thesis presents a brief overview of brain segmentation methods, with a focus on atlas-based segmentation. After theoretical research into functional connectivity methods and mathematical classification methods, the findings were used for segmentation, calculation of graph metrics and classification of fMRI images obtained from 96 subjects into one of two classes using binary classification by support vector machines and linear discriminant analysis. The data classified in this study were measured on patients with Parkinson’s disease (PD), Alzheimer’s disease (AD), mild cognitive impairment (MCI), a combination of PD and MCI, and subjects belonging to the control group of healthy individuals. For pre-processing and analysis, the MATLAB environment, the SPM12 toolbox and The Brain Connectivity Toolbox were used.


19

Martinho, Maria. "Spatial analysis of exposure coefficients with applications to stomach cancer." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:427fe13e-39b1-4bfd-a3a8-be957120cf44.


Abstract:

Earlier ecological studies on the relation between H. pylori infection and stomach cancer have considered that the relation between these two variables, as estimated by the exposure coefficient, is constant. However, there is evidence to suggest that this relation changes geographically due to differences in strains of H. pylori. Since the prevalence of H. pylori varies with socio-economic status, the association between the latter and stomach cancer mortality may also vary geographically. This thesis studies stomach cancer by taking into account the geographical variability of the exposure coefficients. The study proposes the use of regression mixtures, clustering models and spatially varying regressions for the study of varying exposure coefficients. The effect of transformations of variables in these models appears to have been little considered. We provide new necessary conditions for invariance under transformations of variables for mixed effect models in general, and for the proposed models in particular. In addition, we show that varying exposure coefficients may induce a varying baseline risk. The regression mixtures and the clustering model are applied to a data set on stomach cancer incidence and H. pylori prevalence in 57 countries worldwide. We extend the clustering model to reflect any distance measure between the geographical units, including the Euclidean distance, in the formation of clusters. We also show that the clustering model performs better than the regression mixture model when the aim is to identify connected clusters and the observations present large variance. The results obtained with the clustering model supported the existence of three clusters where the interaction between the human and H. pylori populations has similar characteristics. Spatially varying regressions are applied to a data set of areal death counts of stomach cancer and spending power in 275 counties in continental Portugal.
We provide an original strategy for implementing multivectorial intrinsic autoregressions as the distribution for the random effects. The results obtained with the application of this methodology were consistent with a varying exposure coefficient of spending power.


20

Costantini, Giulio. "Network analysis: a new perspective on personality psychology." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2015. http://hdl.handle.net/10281/75269.


Abstract:

A new conception of personality based on network analysis has been recently proposed to overcome some of the limitations of the latent-variable theory of personality. While the latent-variable theory assumes that the covariation among the thoughts, feelings, and behaviors characterizing a personality domain can be explained by the effect of an unobservable latent variable (e.g., extraversion), the network approach conceives personality as emerging from direct interactions among these thoughts, feelings and behaviors. The network perspective motivates new ways of analyzing personality data and it can be especially important for investigating the mechanisms underlying personality. In this work, we present the basic network concepts and discuss several alternative ways to define networks from the data that are typically collected in personality psychology. The most important network indices, such as indices of centrality and of clustering coefficient, are described: we examine the properties of each index and explain why some of them, especially some indices of clustering coefficient, should not be applied to personality psychology data sets. Three new indices of clustering coefficient are proposed that are compatible with personality networks: their properties are tested both on simulated networks and networks based on actual personality psychology data. We present two applications of network analysis. The first application considers a network of 24 personality facets: we show how these facets relate to each other, and discuss both the local and the global properties of the network. The second application focuses on the dimension conscientiousness: we show that while some mechanisms underlying conscientiousness are common to many facets, other mechanisms may specifically characterize some facets and not others. By means of network analysis, we draw a comprehensive map of conscientiousness that can serve as guidance for future studies. The application of network analysis to the field of personality psychology is recent and its potential has not yet been fully explored: in the final part of this work, we discuss the limitations of our investigation and propose future developments of our research that can contribute to overcoming its limits.

21

Shiping, Liu. "Synthetic notions of curvature and applications in graph theory." Doctoral thesis, Universitätsbibliothek Leipzig, 2013. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-102197.

Abstract:

The interaction between the study of the geometric and analytic aspects of Riemannian manifolds and that of graphs is a fascinating subject. The study of synthetic curvature notions on graphs adds new contributions to this topic. In this thesis, we mainly study two kinds of synthetic curvature notions: the Ollivier-Ricci curvature on locally finite graphs and the combinatorial curvature on infinite semiplanar graphs. In the first part, we study the Ollivier-Ricci curvature. As is known in Riemannian geometry, a lower Ricci curvature bound prevents geodesics from diverging too fast on average. We translate this Riemannian idea into a combinatorial setting using the Ollivier-Ricci curvature notion. Note that on a graph, the analogue of geodesics starting in different directions, but eventually approaching each other again, would be a triangle. We derive lower and upper Ollivier-Ricci curvature bounds on graphs in terms of the number of triangles, which are sharp, for instance, for complete graphs. We then describe the relation between the Ollivier-Ricci curvature and the local clustering coefficient, an important concept in network analysis introduced by Watts and Strogatz. Furthermore, positive lower boundedness of the Ollivier-Ricci curvature for neighboring vertices implies the existence of at least one triangle. It turns out that the existence of triangles can also improve Lin-Yau's curvature dimension inequality on graphs and then produce an implication from Ollivier-Ricci curvature lower boundedness to the curvature dimension inequality. The existence of triangles prevents a graph from being bipartite. A finite graph is bipartite if and only if its largest eigenvalue equals 2. Therefore it is natural that the Ollivier-Ricci curvature is closely related to estimates of the largest eigenvalue. We combine the Ollivier-Ricci curvature notion with the neighborhood graph method developed by Bauer and Jost to study spectrum estimates of a finite graph. We can always obtain nontrivial estimates on a non-bipartite graph even if its curvature is nonpositive. This answers one of Ollivier's open problems in the finite graph setting. In the second part of this thesis, we systematically study infinite semiplanar graphs with nonnegative combinatorial curvature. Unlike the previous Gauss-Bonnet formula approach, we explore an Alexandrov approach based on the observation that nonnegative combinatorial curvature on a semiplanar graph is equivalent to nonnegative Alexandrov curvature on the surface obtained by replacing each face by a regular polygon of side length one with the same facial degree and gluing the polygons along common edges. Applying the Cheeger-Gromoll splitting theorem on the surface, we give a metric classification of infinite semiplanar graphs with nonnegative curvature. We also construct the graphs embedded into the projective plane minus one point. Those constructions answer a question proposed by Chen. We further prove the volume doubling property and the Poincaré inequality, which make it possible to run the Nash-Moser iteration. In particular, we explore the volume growth behavior of Archimedean tilings of the plane and prove that they satisfy a weak version of relative volume comparison with constant 1. With the above two basic inequalities in hand, we study the geometric function theory of infinite semiplanar graphs with nonnegative curvature. We obtain the Liouville-type theorem for positive harmonic functions and the parabolicity. We also prove a dimension estimate for polynomial growth harmonic functions, which extends Colding and Minicozzi's solution of a conjecture of Yau in Riemannian geometry.
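
The abstract ties curvature bounds to triangle counts and to the Watts-Strogatz local clustering coefficient. As a minimal sketch of that coefficient only (not of the thesis's curvature bounds), the snippet below counts the triangles through a vertex of an undirected simple graph stored as adjacency sets. The function name, the graph encoding, and the convention that vertices of degree below 2 get coefficient 0 are assumptions made for illustration.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Watts-Strogatz local clustering coefficient of vertex v.

    adj maps each vertex to the set of its neighbours
    (undirected simple graph, no self-loops).
    """
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        # Assumed convention: with fewer than two neighbours there is
        # no neighbour pair, so the coefficient is taken to be 0.
        return 0.0
    # Every linked pair of neighbours closes one triangle through v.
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * triangles / (k * (k - 1))


# Complete graph K4: every pair of neighbours is linked.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}

# 4-cycle: bipartite, hence triangle-free.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}

print(local_clustering(k4, 0), local_clustering(c4, 0))
```

The two toy graphs mirror the extremes mentioned in the abstract: a complete graph, where every neighbour pair forms a triangle, and a bipartite graph, where no triangle can exist.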

22

Jaskowiak, Pablo Andretta. "Estudo de coeficientes de correlação para medidas de proximidade em dados de expressão gênica." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05052011-143134/.

Abstract:

The development of microarray technology has made it possible to measure the expression levels of hundreds or even thousands of genes simultaneously under various experimental conditions. The large amount of available data has created a demand for computational methods that allow it to be analyzed in an efficient and automated way. Many of the computational methods employed in gene expression data analysis require the choice of an appropriate proximity measure between genes or samples. Among the proximity measures available, correlation coefficients have been widely employed because of their ability to capture similarities between the trends of the numeric sequences being compared (genes or samples). The present work aims to compare different correlation measures for the three major tasks involved in the analysis of gene expression data: clustering, feature selection, and classification. To this end, this dissertation presents an overview of gene expression data analysis and of the different correlation measures considered in the comparison. It also presents empirical results obtained from comparing the correlation coefficients for gene clustering, sample clustering, gene selection for sample classification, and sample classification.
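
To illustrate how a correlation coefficient can serve as a proximity measure between expression profiles, here is a minimal sketch using the Pearson coefficient and the distance 1 - r. The abstract does not say which coefficients the dissertation compares, so the choice of Pearson, the function names, and the toy profiles are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def correlation_distance(x, y):
    """Dissimilarity in [0, 2]: near 0 for profiles with the same trend."""
    return 1.0 - pearson(x, y)


# Hypothetical expression profiles over four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]   # same trend as gene_a, different scale
gene_c = [4.0, 3.0, 2.0, 1.0]       # opposite trend

print(correlation_distance(gene_a, gene_b))  # close to 0
print(correlation_distance(gene_a, gene_c))  # close to 2
```

Because the distance depends only on the trend, `gene_a` and `gene_b` are treated as near-identical despite their tenfold difference in magnitude, which is the property the abstract gives as the reason for using correlation coefficients.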

23

Blini, Elvio A. "Biases in Visuo-Spatial Attention: from Assessment to Experimental Induction." Doctoral thesis, Università degli studi di Padova, 2016. http://hdl.handle.net/11577/3424480.

Abstract:

In this work I present several studies, which might appear rather heterogeneous in both their experimental questions and their methodological approaches, and yet are linked by a common leitmotiv: spatial attention. I will address issues related to the assessment of attentional asymmetries, in healthy individuals as well as in patients with neurological disorders, their role in various aspects of human cognition, and their neural underpinnings, driven by the deep belief that spatial attention plays an important role in various mental processes that are not necessarily confined to perception. What follows is organized into two distinct sections. In the first I will focus on the evaluation of visuospatial asymmetries, starting from the description of a new paradigm particularly suitable for this purpose. In the first chapter I will describe the effects of multitasking in a spatial monitoring test; the main result shows a striking decrease in detection performance as a function of the introduced memory load. In the second chapter I will apply the same paradigm to a clinical population characterized by a brain lesion affecting the left hemisphere. Although a standard neuropsychological battery failed to highlight any lateralized attentional deficit, I will show that exploiting concurrent demands might enhance the sensitivity of diagnostic tests, with consequent positive effects on patients' diagnostic and therapeutic management. Finally, in the third chapter I will suggest, in light of preliminary data, that attentional asymmetries also occur along the sagittal axis; I will argue, in particular, that more attentional resources appear to be allocated around peripersonal space, with the resulting benefits extending to various tasks (e.g., discrimination tasks). Then, in the second section, I will follow a complementary approach: I will seek to induce attentional shifts in order to evaluate their role in different cognitive tasks. In the fourth and fifth chapters this will be pursued by exploiting sensory stimulation: visual optokinetic stimulation and galvanic vestibular stimulation, respectively. In the fourth chapter I will show that spatial attention is highly involved in numerical cognition, and that this relationship is bidirectional. Specifically, I will show that optokinetic stimulation modulates the occurrence of procedural errors during mental arithmetic, and that calculation itself in turn affects oculomotor behaviour. In the fifth chapter I will examine the effects of galvanic vestibular stimulation, a particularly promising technique for the rehabilitation of lateralized attention disorders, on spatial representations. I will critically discuss a recent account of unilateral spatial neglect, suggesting that vestibular stimulation or disorders might indeed affect the metric representation of space, but without necessarily resulting in spatial unawareness. Finally, in the sixth chapter I will describe an attentional capture phenomenon produced by intrinsically rewarding distractors. I will seek, in particular, to predict the degree of attentional capture from resting-state functional magnetic resonance imaging data and the related brain connectivity pattern; I will report preliminary data focused on the importance of the cingulo-opercular network, and discuss the results through a parallel with clinical populations characterized by behavioural addictions.

24

Chou, Chin-Hou, and 周金侯. "The Relationship between Clustering Coefficient and Network Congestion." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/42881159196246437382.

Abstract:

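Master's
Tamkang University
Department of Computer Science and Information Engineering, Master's Program in Networking and Communication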
98
The Internet has grown vigorously in recent years. The number of users and the amount of data transmitted over the network increase every year, but network transmission technology and resources are not sufficient to handle so much data, which triggers network congestion. Reducing network congestion has therefore become an important topic. In this paper, we use a local link switching technique, which raises the network clustering coefficient while maintaining the size of the network, to adjust the network structure. We then study how the clustering coefficient influences network congestion. We observe that a network with a higher clustering coefficient can execute more transmission tasks at the same time. In our results, scale-free and small-world networks can increase their packet density by increasing the clustering coefficient, but random networks do not have this property. Additionally, the network model also influences network congestion. With our method of increasing the network clustering coefficient, the random network and the scale-free network achieve the best results in transmission efficiency, whereas the small-world network loses transmission efficiency as its clustering coefficient increases.

25

Zhi-Hao, Zhong, and 鍾志豪. "An Analysis of Library Readers Using Clustering Coefficient." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/21454621440635594698.

Abstract:

Master's
Tungnan University
Graduate Institute of Electrical Engineering
101
In recent years, the quantity of data has exploded. Under these circumstances, automated library systems record readers' information in a database, and a great deal of information is hidden in the data. Data mining explores such data and turns it into usable information. Data mining has been used in many areas, such as marketing and customer relationship management (CRM). Based on data mining results, business owners can change merchandise displays and understand customers' lifestyles and habits in order to increase sales volume. Using data mining in libraries allows us to relocate books and to recommend book lists to readers. In the long term, book circulation can be improved. Many Internet services provide different types of personalization and customization. These technologies allow service providers to offer more specific services by understanding customer behavior. However, this technology has been neglected in libraries. This thesis therefore focuses on readers' borrowing histories at a university and tries to discover patterns among clusters. The research methodology is data analysis and data mining, especially clustering analysis. In this thesis, the clustering coefficient is used to analyze reader behavior, and link weights are used to cluster readers into groups. We believe that the non-exclusive hierarchical clustering method can also be applied to Internet analysis. For example, it can be used to analyze the effect of different nodes in the Internet and to determine whether a node is qualified to be a core node. Moreover, it is also useful for analyzing relationships in social networks.

26

Chen, Kuan-Chi, and 陳冠奇. "Combine Fuzzy Clustering and Correlation Coefficient for Medical Image Analysis." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/20086993754837383229.

Abstract:

Master's
Lunghwa University of Science and Technology
Department of Information Management, Master's Program
102
Automatic visual detection techniques have been widely applied in the medical field in recent years. Advances in image analysis technology have allowed medical images to provide more accurate references for physicians to use when making diagnoses. However, despite the rapid development of image analysis technology, body structure, organ size, and organ position differ among patients, so image information may lead to misjudgments caused by human negligence and noise. This study applied image analysis and detection to CT images of patients with heart disease. In the proposed analytical framework, the correlation coefficient was used for detection. The results show that the correlation coefficient can be applied both to enhanced grayscale images and to color CT images, and that it analyzes both effectively. Finally, the neighborhood intuitionistic fuzzy clustering algorithm was integrated for comparison, in order to identify which image types are suitable for which kinds of image analysis. This study is expected to provide a more accurate reference for physicians to use in making diagnoses.

27

Yao, Chen-Han, and 姚成翰. "Reliable Local Recovery Routing Protocol with Clustering Coefficient for Ad Hoc Networks." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/21964992404323705393.

Abstract:

Doctoral
Tamkang University
Department of Computer Science and Information Engineering, Doctoral Program
100
Nodes in a mobile ad hoc network communicate with each other through wireless multi-hop links. When a node wants to send data to another node, it uses a routing protocol to find a path. In on-demand routing protocols, the source starts a route discovery to find a route leading to the destination. Route discovery is typically performed via flooding, which consumes many control packets. Because of node mobility, the network topology changes frequently, which causes routes to break. Traditional routing protocols restart route discovery when a link fails. In this thesis, we propose two on-demand local recovery routing protocols based on the clustering coefficient: (I) "Local Path Recovery Routing Protocol based on Clustering Coefficient" (LPRCC) and (II) "Reliable Local Recovery Routing Protocol based on Clustering Coefficient" (RLRCC). Our first protocol, LPRCC, uses the route clustering coefficient to choose the routing path. When a link failure occurs, nodes can quickly salvage the data without starting another route discovery. Our second protocol, RLRCC, chooses the route with the higher route score, which is calculated from the link stable value and the node triangle value. RLRCC can decrease the number of route failures and also reduce the number of route discoveries. Simulation results show that both protocols decrease the number of control packets and increase the route delivery ratio.

28

Hwang, Yuan Shiou, and 黃圓修. "Applications of fuzzy clustering method in robust correlation coefficient and robust regression analysis." Thesis, 1994. http://ndltd.ncl.edu.tw/handle/89598475804190416513.

29

Tsai, Kai-Siang, and 蔡凱翔. "Using local link switching algorithm to control directed and weight network clustering coefficient." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/94797731947635120347.

Abstract:

Master's
Tamkang University
Department of Computer Science and Information Engineering, Master's Program
98
Complex networks have been extensively analyzed and studied over the past decade. The clustering coefficient is an important concept in this analysis: it characterizes the relative tightness of a network and is a defining network statistic that appears in many "real-world" network data sets. This paper proposes a local link switching algorithm that effectively increases the clustering coefficient of a directed weighted network while preserving the network's node degree distributions. The link switching algorithm is based on local neighborhood information. Link switching is widely used to produce similar networks with the same degree distribution, that is, to "sample" networks from the same network pool. How to apply this algorithm to directed and weighted networks is the major subject of this paper.
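
The idea of link switching that preserves degrees while raising clustering can be sketched as follows. This is a deliberately simplified illustration for an undirected, unweighted graph, not the thesis's algorithm for directed weighted networks: it proposes random double-edge swaps and keeps only those that do not reduce the triangle count. All names (`switch_links`, `_rewire`, `petersen`) and the acceptance rule are assumptions.

```python
import random
from itertools import combinations


def count_triangles(adj):
    """Number of triangles in an undirected simple graph (adjacency sets)."""
    corners = 0
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                corners += 1
    return corners // 3  # each triangle is seen once from each corner


def _rewire(adj, a, b, c, d):
    """Replace edges a-b and c-d with a-d and c-b."""
    adj[a].remove(b); adj[b].remove(a)
    adj[c].remove(d); adj[d].remove(c)
    adj[a].add(d); adj[d].add(a)
    adj[c].add(b); adj[b].add(c)


def switch_links(adj, attempts, seed=0):
    """Try degree-preserving swaps, keeping those that do not lose triangles."""
    rng = random.Random(seed)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    triangles = count_triangles(adj)
    for _ in range(attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Skip swaps that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        _rewire(adj, a, b, c, d)
        new_triangles = count_triangles(adj)
        if new_triangles >= triangles:
            triangles = new_triangles
            edges[i], edges[j] = (a, d), (c, b)
        else:
            _rewire(adj, a, d, c, b)  # undo the swap
    return adj


def petersen():
    """Petersen graph: 3-regular and triangle-free, used as a toy input."""
    edges = [(i, (i + 1) % 5) for i in range(5)]           # outer 5-cycle
    edges += [(i, i + 5) for i in range(5)]                # spokes
    edges += [(5 + i, 5 + (i + 2) % 5) for i in range(5)]  # inner pentagram
    adj = {v: set() for v in range(10)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


graph = switch_links(petersen(), attempts=500)
print(sorted(len(n) for n in graph.values()), count_triangles(graph))
```

Every accepted swap removes one edge and adds one edge at each of the four endpoints, so the degree sequence never changes; only the triangle count, and with it the clustering coefficient, can move. Recounting all triangles after each swap is wasteful, and a local variant would inspect only the neighbourhoods of the four affected nodes.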

30

Arraz, Carlos Fernando da Silva. "Algoritmos de clustering para identificação de subtipos de cancro do estômago, tiroide e pele." Master's thesis, 2021. http://hdl.handle.net/1822/73401.

Abstract:

Master's dissertation in Mathematics and Computation
The analysis of gene expression is fundamental for identifying the most relevant genes during cellular interactions in an organism, especially when these genes are related to diseases. To carry out a large-scale study of changes in gene expression, it is necessary to find a method that minimizes the error and deviation rates in a continuous learning process. One of the greatest scientific achievements of recent decades in bioinformatics was the introduction of high-performance genetic sequencing methods, which make it possible to visualize cell dynamics at the molecular level, as if they were sensors capable of providing precious information about the functioning of a living system. In 2020, sequencing is already relatively inexpensive, which boosts research into the presence and amount of RNA (or rather, DNA marks) in a biological sample within a given time frame. In addition, the introduction of new analytical techniques has brought insights into biological and medical research. In this way, many treatments may soon be customized according to the genetic signature of each individual, with much more efficiency and fewer side effects. The data mining process consists of the automatic extraction of patterns that represent some knowledge inherent to a phenomenon. In particular, clustering analysis, applied in this dissertation to identify cancer subtypes at the early stage (primary tumor), seeks to recognize previously unknown patterns through the application of machine learning. The work used data from The Cancer Genome Atlas (TCGA) project. The datasets were reduced (from thousands of genes to just a few dozen in some cases) and the genes were combined to assess the quality of cluster formation or the accuracy of supervised classification in various scenarios, revealing promising results that are consistent with the literature in this area of research. The main objective of this work was to obtain results that corroborate the current molecular classifications and/or to discover new subtypes of cancer, especially where there is still some difficulty or indecision in identifying these subtypes, as with stomach, thyroid, and skin cancers. Through feature selection techniques and supervised and unsupervised classification, it was possible to assess the existence of significantly different groups and to characterize them in some cases.

31

Shafi, Shanjeeda. "Machine learning and mixture clustering methods for molecular drug discovery: prediction and characterisation of drugs and druggable targets." Thesis, 2021. http://hdl.handle.net/1959.13/1431097.

Abstract:

Research Doctorate - Doctor of Philosophy (PhD)
In the drug discovery process, approximately five to ten thousand compounds are initially screened but only 1% of these enter the preclinical testing stage that determines whether the compound is safe, efficacious, and feasible to use for a disease state. Owing to regulatory, toxicity, resistance and human health concerns, demand is increasing for refinement of and intensive use of molecular physicochemical properties via effective and robust mathematical methods for drug discovery. Chemoinformatics is now a well-recognised discipline focused on searching, identifying and extracting meaningful information from chemical sequences and structures of compounds. A candidate drug is usually a small molecule (~50 atoms) that acts by many different mechanisms of protein. Every year, several drugs are discarded from the market owing to poor pharmacodynamic and pharmacokinetic properties, which motivates this study that attempts to clarify the factors that facilitate compounds to be drug-like. The druglikeness of a molecule is characterised in part by its satisfying Lipinski’s rule-of-five (Ro5) regarding its molecular properties, such as mass and hydrophobicity, which play an important role in oral absorption, distribution, metabolism and excretion. A debate has existed for some time and now accelerated in the industry as to what constitutes a good ‘hit’. Increasing evidence suggests that relying completely on Lipinski’s Ro5 for potential drug synthesis may increase the likelihood of future drug failures. Retrospective analysis of failed drug discovery projects and incorporation of beyond Ro5 rules may provide useful information in innovating drugs for difficult targets. There is an urgent need to develop reliable computational methods for predicting drug-likeness of candidate molecules to identify those unlikely to survive the later stages of discovery and development. 
Visualisation and machine learning methods are two common approaches to uncover underlying patterns in the pharmacological property space, so called chemo-space, for drug design. Thus far, drug-likeness has been studied from several viewpoints, and in this thesis, we use proposed druggability rules (Hudson et al. 2012, 2014, 2017) to determine cut points for each molecular predictor based on non-Bayesian mixture model-based clustering with discriminant analysis, MC/DA (MclustDA R package). we also used decision tree for choosing cut-off ranges of molecular descriptors. To date, Hudson et al.’s (2014, 2017) results have established an improved scoring function, beyond the cut points of the Ro5. In this thesis, mixture-based modeling (Bayesian and two non-Bayesian) tools are applied via different ‘R’ packages (Rmixmod, depmixS4 and mixAK), to identify good and poor drug candidates using a combination of 9 and 10 molecular physicochemical and structural properties and scoring functions of violations (Hudson et al. 2014, 2017). The non-Bayesian Gaussian mixture method (GMM) is shown to be optimal at classifying true good and poor molecules correctly in terms of Ro5, oral_Ro5 drug-like (Divide into two parts: oral_Ro5 drug-like status1 and oral_Ro5 drug-like status2), eRo5 (extended rule of 5) and bRo5 (beyond rule of five) drugs classification, as suggested recently by Lipinski (2014, 2016) and Doak et al. (2014, 2016). In the thesis, the GMM approach, and the optimal 10 descriptors (whether continuous and categorical) set model (based on the following molecular parameters- MW, logP, logD, Hydrogen bond donors and acceptors, polar surface area, number of atoms and rings, Halogen), shows good predictive performance, with Matthews correlation coefficient (C) values in the range of 0.41–0.58, compared with other descriptors set models using Bayesian (mixAK) and non-Bayesian (HMM) methods in terms of computational time and higher sensitivity, specificity and C values. 
The GMM classification identified 1013 drug-like molecules, of which 4% were in bRo5 space, and 266 non-drug-like molecules, of which 38% were in bRo5 space, supporting recent trends to move outside the Ro5 region. These mixture models form the basis for identifying molecules and disease targets in the chemo-space using visualisation methods such as principal component analysis (PCA), factor analysis for mixed data (FAMD) and correspondence analysis (CA). These three visualisation and data reduction methods successfully identify a group of molecules and specific disease targets with a prescribed range of ADME properties in different quadrants of the chemo-space. This work also demonstrates that the PCA, MCA and FAMD methods can be powerful techniques for exploring complex datasets in drug discovery studies to identify outliers. It is shown that both lipophilicity descriptors, logP and logD, have a significant influence on the facilitation of compounds and DC’s segregations. Two non-Bayesian mixture clustering approaches, the Gaussian mixture method (GMM via Rmixmod) and the hidden Markov model (HMM via depmixS4), as applied in this thesis, permit capture of the global properties of molecules with related targets. Based on these mixture approaches, this study identifies disease targets using the score function and the molecular physicochemical properties of drugs towards targets. All mixture clustering models identify 9 poor/non-druggable and 26 good/druggable targets, with the anti-bacterial and adrenergic targets identified as the topmost poor and good druggable targets, respectively. Furthermore, three popular machine learning (ML) methods, (1) recursive partitioning, (2) naïve Bayesian and (3) the support vector machine (SVM), were also used to discriminate drug-like and non-drug-like molecules based on molecular descriptors.
Among these ML techniques, the SVM model is superior across the different rule-based drug classifications, achieving a sensitivity range of 94% to 99% and a specificity range of 84% to 100%, as well as higher C values of 0.68 to 0.99. The results of the three mixture-based clustering and classification analyses, which use both logD and logP, offer an excellent opportunity to consider these lipophilicity descriptors (logP and logD) in conjunction with other descriptors to help predict the permeability and solubility of active compounds in drug discovery. This study has the potential to significantly reduce the false classification of drugs and suggests an appropriate predictor set to help identify new drug innovations.
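
The abstract above reports sensitivity, specificity and the Matthews correlation coefficient (written C there). As a minimal sketch of how these three quantities follow from a binary confusion matrix, the Python below uses hypothetical counts and function names that are not taken from the thesis, which used R packages.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def sensitivity(tp, fn):
    """Fraction of truly drug-like molecules that are classified as drug-like."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly non-drug-like molecules that are classified as such."""
    return tn / (tn + fp)


# Hypothetical counts, chosen only for illustration.
tp, tn, fp, fn = 90, 80, 20, 10
print(sensitivity(tp, fn), specificity(tn, fp), matthews_corrcoef(tp, tn, fp, fn))
```

Unlike sensitivity or specificity alone, the coefficient uses all four cells of the matrix, which is one reason it is often preferred when the two classes differ greatly in size, as with the 1013 versus 266 molecules reported above.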

32

Γεωργιάδης, Γιώργος. "Η παράμετρος της κεντρικότητας σε ανεξάρτητα κλίμακας μεγάλα δίκτυα" [The centrality parameter in large scale-free networks]. 2006. http://nemertes.lis.upatras.gr/jspui/handle/10889/137.

Abstract:

A development of recent years is the study of large networks that exhibit a hierarchical structure independent of scale (large scale-free networks). A traditional way of modelling networks is to use graphs and results from graph theory. In the classical models that have been studied, however, every pair of vertices in a graph is equally likely to be connected. This kind of modelling fails to describe many everyday networks, such as acquaintance networks, where the vertices represent people and are joined if they know each other directly. In such a network, two friends of a given person are expected to be more likely to know each other than two randomly chosen strangers. This phenomenon is called clustering and is characteristic of the networks in question. Many networks found in nature, as well as a great many man-made networks, fall into this category. Examples include protein networks, food-web networks, networks of epidemic disease spread, power grids, computer networks, networks of World Wide Web pages, acquaintance networks, citation networks, and others. Although they touch on many sciences, such as physics, biology, sociology and computer science, they have not been widely studied, because until recently truly large networks were not available for experimental study (a gap that was filled by the growth of the World Wide Web). Not all of the features and quantities that characterise these networks, and on which research should focus, have yet been clarified, although some progress has been made. One such concept, which can be expressed through many quantities, is the centrality of a node in the network.
The usefulness of such a quantity, if it can be defined, is obvious, for example in the area of deliberate "attacks" on such a network (e.g. a computer network). The exact relationship between centrality and the other characteristic quantities of the network, such as clustering, is not known, however. The aim of this work is to examine the concept of centrality in depth, using deliberate attacks on scale-free networks as its experimental testbed. In this context, it gives a brief overview of the network models proposed to date and analyses the concept of centrality through its traditional definitions from sociology. It then proposes a series of definitions of centrality that link it to network quantities such as the clustering coefficient. The suitability of these definitions is assessed in practice by experimentally simulating attacks on large scale-free networks with attack strategies based on them.

33

Liang, Chieh-Hsiang, and 梁捷翔. "Extreme Clustering Coefficients In High Edge Density Networks." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/31972955354085060082.

Abstract:

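Master's
Tamkang University
Department of Computer Science and Information Engineering, Master's Program in Networking and Communication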
98
This paper proposes two models with extreme average clustering coefficients and small path lengths for high-edge-density networks. High-density networks are common in the analysis of social and biological networks. This paper studies networks with extreme statistical properties, that is, maximal or minimal clustering coefficients and short average distances. The proposed models also indicate that, in addition to the existing small-world and random network models, there are other network models that may produce clustering coefficients filling the gap between those two models and the maximal achievable clustering coefficient.

34

Lee, Che-Chun, and 李哲均. "Finding Overlapping Communities by Local Clustering Coefficients of Seed Nodes." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/m37pq3.

35

Huang, Shi-Yu, and 黃士育. "Overlapping Community Discovery by Combining Local Clustering Coefficients and Neighbor Relationship Measurements." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/r4tqjq.

Abstract:

Master's
Shu-Te University
Department of Computer Science and Information Engineering, Master's Program
105
Most users of online social networks play different roles at different times due to the diversity of their interests. Overlapping community discovery studies the complexity involved in interpersonal social networks, using various techniques of Social Network Analysis (SNA). SNA identifies seed nodes of social networks, based on which hidden overlapping communities could be found by gradually merging neighboring seeds to form large groups. In methods that select nodes of high degrees only, close-knit groups consisting of nodes of low degrees are often neglected. To overcome the problem, this study proposes to select nodes of high Local Clustering Coefficients (LCC) as seeds and then examine the relationship degrees between neighboring seeds to discover overlapping communities. The proposed method was compared with those adopting nodes of high degrees as seeds, as well as the famous Clique Percolation Method (CPM). The result showed effective improvement in grouping quality and graph efficiency.
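
The seed-selection idea in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the graph, the function names `local_clustering` and `pick_seeds`, and the tie-breaking rule are all assumptions, and the later step of merging neighbouring seeds is not shown.

```python
from itertools import combinations


def local_clustering(adj):
    """Return {node: LCC} for an undirected graph given as {node: set_of_neighbours}."""
    lcc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            lcc[v] = 0.0  # LCC is taken as 0 when fewer than two neighbours exist
            continue
        # Count the edges that actually exist among the neighbours of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        lcc[v] = 2.0 * links / (k * (k - 1))
    return lcc


def pick_seeds(adj, top_k):
    """Pick the top_k nodes by LCC (ties broken by node name, an arbitrary choice)."""
    lcc = local_clustering(adj)
    return sorted(adj, key=lambda v: (-lcc[v], v))[:top_k]


# Hypothetical toy graph: a triangle a-b-c plus a hub h with three leaf neighbours.
adj = {
    "a": {"b", "c", "h"}, "b": {"a", "c"}, "c": {"a", "b"},
    "h": {"a", "d", "e", "f"}, "d": {"h"}, "e": {"h"}, "f": {"h"},
}
print(pick_seeds(adj, 2))
```

In this toy graph the hub `h` has the highest degree but an LCC of 0, while the low-degree nodes `b` and `c` sit in a triangle and have an LCC of 1, so an LCC-based selection favours the close-knit group that a purely degree-based selection would overlook.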

36

Li, Zheng-Kuan, and 李政寬. "Applying Regression Coefficients Clustering in Multivariate Time Series Transforming for 3D Convolutional Neural Networks." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/cpv42m.

Abstract:

Master's
National Taiwan University of Science and Technology
Department of Industrial Management
107
Multivariate time series (MTS) data is very common in real life. Since in most problems the label is affected by multiple variables rather than a single one, how to solve multivariate time series classification effectively remains a major research problem. In recent years, with the rapid development of artificial intelligence (AI), deep learning frameworks have been applied to multivariate time series classification problems. This study proposes a method for MTS classification. Regression analysis is applied to the multivariate time series data to find a regression equation for each series. The regression coefficient and intercept are then used for clustering, so that time series with similar trends fall into the same cluster. The literature proposes four frameworks for encoding time series data as different types of images. According to the clustering results, time series with similar trends are encoded into images using the same method, and a variety of experiments are run to determine the encoding method for each cluster of time series. After the multivariate time series data has been encoded as images in this way, each sample is fed into a 3D convolutional neural network for feature extraction and image recognition, which can effectively solve the multivariate time series classification problem and find the best classification accuracy.
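
The clustering step described above can be sketched in a few lines. The abstract does not say which regression or clustering algorithm was used, so this sketch assumes a least-squares line fitted against the time index and plain k-means on the (slope, intercept) pairs; the function names and the toy series are hypothetical, and the image encoding and 3D CNN stages are not shown.

```python
def fit_line(series):
    """Least-squares slope and intercept of a series against time index 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def kmeans(points, centroids, iters=20):
    """Plain Lloyd's algorithm; the caller supplies the initial centroids."""
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Move each centroid to the mean of its members; keep it if it has none.
        updated = []
        for j, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    return labels


# Hypothetical toy data: two rising and two falling series.
series = [[1, 2, 3, 4], [2, 3, 4, 5], [5, 4, 3, 2], [4, 3, 2, 1]]
features = [fit_line(s) for s in series]
labels = kmeans(features, [features[0], features[2]])
print(features, labels)
```

Each cluster label would then decide which image encoding is applied to the series in that cluster, which is the part of the pipeline the abstract determines experimentally.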

37

Shiping, Liu. "Synthetic notions of curvature and applications in graph theory." Doctoral thesis, 2012. https://ul.qucosa.de/id/qucosa%3A11816.

Abstract:

The interaction between the study of geometric and analytic aspects of Riemannian manifolds and that of graphs is a fascinating subject. The study of synthetic curvature notions on graphs adds new contributions to this topic. In this thesis, we mainly study two kinds of synthetic curvature notions: the Ollivier-Ricci curvature on locally finite graphs and the combinatorial curvature on infinite semiplanar graphs. In the first part, we study the Ollivier-Ricci curvature. As is known in Riemannian geometry, a lower Ricci curvature bound prevents geodesics from diverging too fast on average. We translate this Riemannian idea into a combinatorial setting using the Ollivier-Ricci curvature notion. Note that on a graph, the analogue of geodesics starting in different directions, but eventually approaching each other again, would be a triangle. We derive lower and upper Ollivier-Ricci curvature bounds on graphs in terms of the number of triangles, which are sharp, for instance, for complete graphs. We then describe the relation between Ollivier-Ricci curvature and the local clustering coefficient, which is an important concept in network analysis introduced by Watts and Strogatz. Furthermore, positive lower boundedness of Ollivier-Ricci curvature for neighboring vertices implies the existence of at least one triangle. It turns out that the existence of triangles can also improve Lin-Yau's curvature dimension inequality on graphs and then produce an implication from Ollivier-Ricci curvature lower boundedness to the curvature dimension inequality. The existence of triangles prevents a graph from being bipartite. A finite graph is bipartite if and only if its largest eigenvalue equals 2. Therefore it is natural that Ollivier-Ricci curvature is closely related to the largest eigenvalue estimates. We combine the Ollivier-Ricci curvature notion with the neighborhood graph method developed by Bauer and Jost to study the spectrum estimates of a finite graph.
We can always obtain nontrivial estimates on a non-bipartite graph even if its curvature is nonpositive. This answers one of Ollivier's open problems in the finite graph setting. In the second part of this thesis, we systematically study infinite semiplanar graphs with nonnegative combinatorial curvature. Unlike the previous Gauss-Bonnet formula approach, we explore an Alexandrov approach based on the observation that nonnegative combinatorial curvature on a semiplanar graph is equivalent to nonnegative Alexandrov curvature on the surface obtained by replacing each face by a regular polygon of side length one with the same facial degree and gluing the polygons along common edges. Applying the Cheeger-Gromoll splitting theorem on the surface, we give a metric classification of infinite semiplanar graphs with nonnegative curvature. We also construct the graphs embedded into the projective plane minus one point. Those constructions answer a question proposed by Chen. We further prove the volume doubling property and the Poincaré inequality, which make it possible to run the Nash-Moser iteration. In particular, we explore the volume growth behavior of Archimedean tilings of the plane and prove that they satisfy a weak version of relative volume comparison with constant 1. With the above two basic inequalities in hand, we study the geometric function theory of infinite semiplanar graphs with nonnegative curvature. We obtain the Liouville-type theorem for positive harmonic functions and the parabolicity. We also prove a dimension estimate for polynomial growth harmonic functions, which extends Colding and Minicozzi's solution of a conjecture of Yau in Riemannian geometry.
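
For orientation, the two quantities that the first part of this abstract relates can be written down in their commonly used forms. This is a sketch under stated assumptions: the symbols $m_x$, $W_1$, $d_x$ and $c(x)$ are notation chosen here, and the thesis's own normalisation (for example the choice of random walk behind $m_x$) may differ.

```latex
% m_x : uniform probability measure on the neighbours of the vertex x (assumed choice)
% W_1 : L^1 Wasserstein (transportation) distance between two measures
% d(x,y) : graph distance;  d_x : degree of x
\kappa(x,y) \;=\; 1 - \frac{W_1(m_x, m_y)}{d(x,y)},
\qquad
c(x) \;=\; \frac{\#\{\text{edges between neighbours of } x\}}{\binom{d_x}{2}} .
```

Every edge between two neighbours of $x$ closes exactly one triangle through $x$, so $c(x)$ is the number of triangles containing $x$ divided by the number of neighbour pairs, which is why bounds on $\kappa$ stated in terms of triangle counts can be rephrased in terms of the local clustering coefficient.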

38

Δαγκλής, Οδυσσέας. "Ζητήματα μοντελοποίησης και προσέγγισης του χρωματικού αριθμού σε scale-free δίκτυα" [Issues in modelling and approximating the chromatic number in scale-free networks]. Thesis, 2009. http://nemertes.lis.upatras.gr/jspui/handle/10889/2104.

Abstract:

Networks that consistently exhibit a certain property irrespective of their size and density are called scale-free. In many real-life networks this property coincides with a power-law distribution of node degrees with an exponent in the range [2, 4]. This work presents three static models for constructing scale-free networks with this property, based on the dynamic Barabási-Albert model, and attempts to approximate their chromatic number experimentally.
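
As a rough illustration of the experimental setting, the sketch below grows a graph by Barabási-Albert preferential attachment and bounds its chromatic number from above with a greedy colouring. It is an assumption-laden stand-in: it implements the dynamic model, not the three static models of the thesis, and the largest-degree-first greedy heuristic, the function names and the parameters are chosen here for illustration only.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Preferential attachment: each new node links to m distinct existing nodes."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Start from a complete graph on m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from the list picks a node with probability proportional to its degree.
    stubs = [u for u in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj


def greedy_coloring(adj):
    """Colour nodes in order of decreasing degree with the smallest free colour."""
    color = {}
    for v in sorted(adj, key=lambda x: -len(adj[x])):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


adj = barabasi_albert(200, 3)
color = greedy_coloring(adj)
print("colours used (upper bound on the chromatic number):", max(color.values()) + 1)
```

The number of colours used is only an upper bound on the chromatic number; the size of any clique, here at least the initial complete graph on m + 1 nodes, gives a matching lower bound.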

39

Amstutz, James Michael. "Cluster-Based Analysis Of Retinitis Pigmentosa Candidate Modifiers Using Drosophila Eye Size And Gene Expression Data." Thesis, 2021.

Abstract:

The goal of this thesis is to algorithmically identify candidate modifiers for retinitis pigmentosa (RP) to help improve therapy and predictions for this genetic disorder, which may lead to a complete loss of vision. Recent research by Chow et al. (2016) focused on the genetic contributors to RP by trying to recognize a correlation between genetic modifiers and phenotypic variation in female Drosophila melanogaster, or fruit flies. In contrast to the genome-wide association analysis carried out in Chow et al.’s research, this study proposes using a K-Means clustering algorithm on RNA expression data to better understand which genes best exhibit characteristics of the RP degenerative model. Validating this algorithm’s effectiveness in identifying suspected genes takes priority over their classification.

This study investigates the linear relationship between Drosophila eye size and genetic expression to gather statistically significant, strongly correlated genes from the clusters with abnormally high or low eye sizes. The clustering algorithm is implemented in the R scripting language, and supplemental information details the steps of this computational process. Running the mean eye size and genetic expression data of 18,140 female Drosophila genes and 171 strains through the proposed algorithm in its four variations helped identify 140 suspected candidate modifiers for retinal degeneration. Although none of the top candidate genes found in this study matched Chow’s candidates, they were all statistically significant and strongly correlated, with several showing links to RP. These results may continue to improve as more of the 140 suspected genes are annotated using identical or comparative approaches.
