Dissertations / Theses on the topic 'Clustering coefficient'
Consult the top 39 dissertations / theses for your research on the topic 'Clustering coefficient.'
Parikh, Nidhi Kiranbhai. "Generating Random Graphs with Tunable Clustering Coefficient." Thesis, Virginia Tech, 2011. http://hdl.handle.net/10919/31591.
Master of Science
Jäger, Simon. "Exponential domination, exponential independence, and the clustering coefficient." Ulm : Universität Ulm, 2017. http://d-nb.info/114748449X/34.
Heinrich, Irene. "On Graph Decomposition: Hajós' Conjecture, the Clustering Coefficient and Dominating Sets." München : Verlag Dr. Hut, 2020. http://d-nb.info/1219606197/34.
Oppong, Augustine. "Clustering Mixed Data: An Extension of the Gower Coefficient with Weighted L2 Distance." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/etd/3463.
Nascimento, Mariá Cristina Vasconcelos. "Metaheurísticas para o problema de agrupamento de dados em grafo." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-17052010-155334/.
Graph clustering aims to identify highly connected groups, or clusters, of nodes in a graph. The problem goes by other names, such as graph partitioning and community detection. Many mathematical formulations model this problem, each with advantages and disadvantages. Most of them have the disadvantage of requiring the number of clusters in the final partition to be specified in advance. This information is not available for the graphs to be clustered, i.e., graphs whose data are unlabeled. This is one reason why the measure known as modularity, which is maximized to find graph partitions, has become popular in recent decades. This formulation does not require the number of clusters to be specified and produces high-quality partitions. In this thesis, Greedy Randomized Adaptive Search Procedure (GRASP) metaheuristics are proposed for two existing mathematical formulations of graph clustering: one maximizes the partition modularity and the other maximizes the intra-cluster similarity. The results obtained by the proposed metaheuristics outperformed those of other heuristics found in the literature. However, their computational cost was high, mainly for the metaheuristic that maximizes modularity. Over the years, research has revealed that the formulation that maximizes partition modularity has some limitations. To provide a good alternative to the modularity maximization model, this thesis proposes new mathematical formulations of graph clustering for weighted and unweighted graphs, aimed at finding partitions with highly connected clusters. Furthermore, the proposed formulations can produce partitions without prior definition of the true number of clusters. Computational tests with hundreds of weighted graphs confirmed the efficiency of the proposed models.
Comparing the partitions from all formulations studied in this thesis, the proposed formulations gave better results, even better than maximization of partition modularity. These results are characterized by satisfactory partitions that correlate highly with the true classification for the simulated and real (mostly biological) data.
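As a minimal illustration of the modularity measure mentioned in this abstract, the sketch below computes the standard Newman-Girvan modularity, Q = sum over communities c of [ m_c / m - (d_c / 2m)^2 ], for an unweighted, undirected graph, where m is the number of edges, m_c the number of edges inside community c, and d_c the sum of the degrees of its nodes. The function name, the edge-list format, and the toy graph are assumptions made for illustration; none of them comes from the thesis, and the thesis's own formulations and GRASP procedures are not reproduced here.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in the same community
    degree_sum = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(toy_edges, two_groups)   # the natural two-triangle split
q_merged = modularity(toy_edges, one_group)   # everything in one cluster
```

On this toy graph the two-triangle split scores higher than the single-cluster partition, which is the behaviour a modularity-maximizing heuristic exploits without being told the number of clusters.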
Koomson, Obed. "Performance Assessment of The Extended Gower Coefficient on Mixed Data with Varying Types of Functional Data." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/etd/3512.
Li, Han. "Statistical Modeling and Analysis of Bivariate Spatial-Temporal Data with the Application to Stream Temperature Study." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/70862.
Ph. D.
Stephens, Skylar Nicholas. "Analytical and Computational Micromechanics Analysis of the Effects of Interphase Regions, Orientation, and Clustering on the Effective Coefficient of Thermal Expansion of Carbon Nanotube-Polymer Nanocomposites." Thesis, Virginia Tech, 2013. http://hdl.handle.net/10919/23216.
Master of Science
Dhanasetty, Abhishek. "Enumerating Approximate Maximal Cliques in a Distributed Framework." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1617104719399743.
Lee, James H. "A pollination network of Cornus florida." VCU Scholars Compass, 2014. http://scholarscompass.vcu.edu/etd/3615.
Stewart, Craig R. "An Evolutionary Analysis of the Internet Autonomous System Network." Kent State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=kent1276550359.
Brooks, Josh Daniel. "Nested (2,r)-regular graphs and their network properties." Digital Commons @ East Tennessee State University, 2012. https://dc.etsu.edu/etd/1471.
Er, Chiangkai. "Speech recognition by clustering wavelet and PLP coefficients." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/42742.
Ribeiro, Filho Napoleão Póvoa. "Melhorando o desempenho da técnica de clusterização hierárquica single linkage utilizando a metaheurística GRASP." Universidade Federal do Tocantins, 2016. http://hdl.handle.net/11612/974.
The clustering (grouping) problem consists of grouping the elements of a database so that more similar elements fall in the same cluster (group) and less similar elements fall in different clusters. There are several ways to form these groupings. One of the most popular is hierarchical clustering, in which a hierarchical relationship between the elements is created. There are several methods for analyzing the similarity between elements in the clustering problem. The most common is the single linkage method, which merges the elements that are the smallest distance apart. The input to this technique is a distance matrix. At the end, the grouping process produces an inverted tree known as a dendrogram. The cophenetic correlation coefficient (ccc), obtained after the dendrogram is built, is a measure used to evaluate the consistency of the generated clusters; it indicates how faithful the dendrogram is to the original data. Thus, a dendrogram gives more consistent clusters when the ccc is closer to one (1). The clustering problem in all its aspects, including hierarchical clustering (the object of study in this work), belongs to the class of NP-complete problems. It is therefore common to use heuristics to obtain efficient solutions to this problem. To generate dendrograms with a better ccc, this work proposes a new algorithm based on the concepts of the GRASP metaheuristic. A further objective of this work is to implement the solution with parallel computing on a computer cluster, making it possible to work with larger matrices. Tests were conducted to confirm the performance of the proposed algorithm, comparing its results with those generated by the R software.
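The cophenetic correlation coefficient described in this abstract can be sketched in a few lines. The code below is a plain, unoptimized illustration, not the GRASP algorithm proposed in the thesis: it builds a single-linkage dendrogram by repeated merging, records the height at which each pair of elements is first joined (the cophenetic distance), and then takes the Pearson correlation between those heights and the original distances. The function names and the four-point example are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def single_linkage_cophenetic(dist):
    """Cophenetic distance matrix of a single-linkage dendrogram.

    dist: symmetric n x n matrix (list of lists) of pairwise distances.
    """
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Single linkage: cluster distance is the smallest point-to-point distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every pair split across the two clusters is first joined at height d.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = d
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances (pairs i < j)."""
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)  # undefined if all distances in x or y are equal


# Hypothetical example: four points on a line at positions 0, 1, 3 and 7.
points = [0, 1, 3, 7]
dist = [[abs(p - q) for q in points] for p in points]
coph = single_linkage_cophenetic(dist)
ccc = cophenetic_correlation(dist, coph)
```

A value of `ccc` close to 1 means the dendrogram heights track the original distances closely, which is the criterion the abstract uses to compare dendrograms.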
Zhuang, Yuwen. "Metric Based Automatic Event Segmentation and Network Properties Of Experience Graphs." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1337372416.
Lin, Kaisheng. "Motif counts, clustering coefficients and vertex degrees in models of random networks." Thesis, University of Oxford, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.497038.
Luo, Hongwei. "Modelling and simulation of large-scale complex networks." RMIT University. Mathematical and Geospatial Sciences, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080506.142224.
Montilla, Michaela. "Vliv parcelačního atlasu na kvalitu klasifikace pacientů s neurodegenerativním onemocněním." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-378150.
Martinho, Maria. "Spatial analysis of exposure coefficients with applications to stomach cancer." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:427fe13e-39b1-4bfd-a3a8-be957120cf44.
COSTANTINI, GIULIO. "Network analysis: a new perspective on personality psychology." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2015. http://hdl.handle.net/10281/75269.
Shiping, Liu. "Synthetic notions of curvature and applications in graph theory." Doctoral thesis, Universitätsbibliothek Leipzig, 2013. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-102197.
Jaskowiak, Pablo Andretta. "Estudo de coeficientes de correlação para medidas de proximidade em dados de expressão gênica." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05052011-143134/.
The development of microarray technology made it possible to measure the expression levels of hundreds or even thousands of genes simultaneously under various experimental conditions. The huge amount of available data created the need for computational methods that allow it to be analyzed in an efficient and automated way. Many of the computational methods employed in gene expression data analysis require the choice of a proximity measure. Among the proximity measures available, correlation coefficients have been widely employed because of their ability to capture similarity trends between the numeric sequences being compared (genes or samples). The objective of the present work is to compare different correlation measures on the three major tasks involved in the analysis of gene expression data: clustering, feature selection, and classification. To this end, the dissertation presents an overview of gene expression data analysis and of the correlation measures considered in the comparison. It also presents empirical results from comparing correlation coefficients for gene clustering, sample clustering, gene selection for sample classification, and sample classification.
Blini, Elvio A. "Biases in Visuo-Spatial Attention: from Assessment to Experimental Induction." Doctoral thesis, Università degli studi di Padova, 2016. http://hdl.handle.net/11577/3424480.
In this work I present a series of studies that may appear rather heterogeneous in their experimental questions and methodological approaches, but that are nonetheless linked by a common thread: the constructs of spatial reasoning and spatial attention. In particular, I address the assessment of attentional asymmetries, in healthy individuals as well as in patients with neurological disorders, their role in various aspects of human cognition, and their neural substrates, guided by the conviction that spatial attention plays an important role in a variety of mental processes not necessarily limited to perception. What follows is therefore organized into two distinct sections. In the first, I focus on the assessment of visuospatial asymmetries, starting with the description of a new paradigm particularly suited to this purpose. In the first chapter I describe the effects of dual-tasking and attentional load on a spatial monitoring test; the main result shows a marked deterioration in performance on the spatial detection task as a function of the memory load introduced. In the second chapter I apply the same paradigm to a clinical population characterized by left-hemisphere brain damage. Although a standard neuropsychological assessment revealed no lateralized attention deficit, I show that exploiting a concurrent task can make diagnostic tests markedly more sensitive, with clear benefits for patients' clinical and therapeutic pathways.
Finally, in the third chapter I suggest, on the basis of preliminary data, that attentional asymmetries can also be detected in healthy individuals along the sagittal axis; I argue, in particular, that more attentional resources generally appear to be concentrated around peripersonal space, and that the resulting benefits extend to tasks of various kinds (for example, discrimination tasks). I then move on to the second section, in which, following the reverse logic, I induce shifts in the attentional focus in order to assess its role in tasks of various kinds. In the fourth and fifth chapters I exploit sensory stimulation: optokinetic visual stimulation and galvanic vestibular stimulation, respectively. In the fourth chapter I show that spatial attention is involved in numerical cognition, with which it has a bidirectional relationship. Specifically, I show on the one hand that optokinetic stimulation can modulate the occurrence of procedural errors in mental arithmetic, and on the other that arithmetic itself affects spatial attention and, in particular, oculomotor behaviour. In the fifth chapter I examine the effects of galvanic vestibular stimulation, a particularly promising technique for the rehabilitation of lateralized attention disorders, on mental representations of space. I critically discuss a recent model of unilateral spatial neglect, suggesting that vestibular stimulation and vestibular disorders may indeed affect metric representations of space, but without necessarily entailing inattention to that space. Finally, in the sixth chapter I describe the capture of visuospatial attention that intrinsically motivating distractor stimuli can exert in healthy adults.
In particular, I attempt to predict the magnitude of this attentional capture from resting-state functional magnetic resonance images: I report preliminary data focused on the importance of the cingulo-opercular network, drawing a parallel with clinical populations characterized by addictive behaviours.
Chou, Chin-Hou, and 周金侯. "The Relationship between Clustering Coefficient and Network Congestion." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/42881159196246437382.
淡江大學
資訊工程學系資訊網路與通訊碩士班
98
The Internet has grown vigorously in recent years. The number of users and the volume of data transmitted over the network increase every year, but network transmission technology and resources are not sufficient to handle so much data, which triggers network congestion. Reducing network congestion has therefore become an important topic. In this paper, we use a local link switching technique, which raises the network clustering coefficient while maintaining the size of the network, to adjust the network structure. We then study how the clustering coefficient influences network congestion. We observed that a network with a higher clustering coefficient can execute more transmission tasks at the same time. In our results, scale-free and small-world networks can increase their packet density by increasing the clustering coefficient, but random networks do not have this property. The network model also influences network congestion. With our method of increasing the network clustering coefficient, the random network and the scale-free network achieve the best transmission efficiency, whereas the small-world network's transmission efficiency decreases as its clustering coefficient increases.
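For readers unfamiliar with the quantity this abstract varies, the following sketch computes the usual local clustering coefficient of a node (the fraction of pairs of its neighbours that are themselves linked) and its average over the network. The adjacency format (a dict of neighbour sets), the function names, and the small example graph are assumptions for illustration and are not taken from the thesis.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are connected to each other."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # common convention: nodes with fewer than two neighbours score 0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical example: a triangle 0-1-2 with one extra node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
c_node2 = local_clustering(adj, 2)
c_mean = average_clustering(adj)
```

Here nodes 0 and 1 sit in a closed triangle, node 2 has only one of its three neighbour pairs linked, and node 3 has a single neighbour, so the average lies between 0 and 1.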
Zhi-Hao, Zhong, and 鍾志豪. "An Analysis of Library Readers Using Clustering Coefficient." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/21454621440635594698.
東南科技大學
電機工程研究所
101
In recent years, the quantity of data has exploded. Automated library systems now record readers' information in databases, and a great deal of information is hidden in that data. Data mining explores the data and turns it into usable information. It has been used in many areas, such as marketing and customer relationship management (CRM). Based on the results of data mining, business owners can change merchandise displays and understand customers' lifestyles and habits in order to increase sales. Applying data mining in libraries makes it possible to relocate books and to recommend book lists to readers; in the long term, book circulation can be improved. Many Internet services provide different types of personalization and customization, and these technologies allow service providers to offer more specific services by understanding customer behavior. However, this technology has been neglected in libraries. This thesis therefore focuses on readers' borrowing histories at a university and tries to discover patterns among clusters. The research methodology is data analysis and data mining, especially cluster analysis. In this thesis, the clustering coefficient is used to analyze reader behavior, and link weights are used to cluster readers into groups. We believe that the non-exclusive hierarchical clustering method can also be applied to Internet analysis. For example, it can be used to analyze the effect of different nodes in the Internet and to determine whether a node qualifies as a core node. It is also useful for analyzing relationships in social networks.
Chen, Kuan-Chi, and 陳冠奇. "Combine Fuzzy Clustering and Correlation Coefficient for Medical Image Analysis." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/20086993754837383229.
龍華科技大學
資訊管理系碩士班
102
Automatic visual detection techniques have been widely applied in the medical field in recent years. Advances in image analysis technology have allowed medical images to provide more accurate references for physicians making diagnoses. However, despite the rapid development of image analysis technology, body structure, organ size, and organ position differ among patients, so image information may lead to misjudgments caused by human negligence and noise. This study applied image analysis and detection to CT images of patients with heart disease. In the proposed analytical framework, the correlation coefficient was used for detection. The results show that the correlation coefficient can be applied both to the enhanced gray-scale image and to the CT color image, and that it gives good analysis results in both cases. Finally, the neighborhood intuitionistic fuzzy clustering algorithm was integrated for comparison, in order to identify which image types suit which kinds of image analysis. This study is expected to provide a more accurate reference for physicians to use in making diagnoses.
Yao, Chen-Han, and 姚成翰. "Reliable Local Recovery Routing Protocol with Clustering Coefficient for Ad Hoc Networks." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/21964992404323705393.
淡江大學
資訊工程學系博士班
100
Nodes in a mobile ad hoc network communicate with each other through wireless multi-hop links. When a node wants to send data to another node, it uses a routing protocol to find a path. In on-demand routing protocols, the source starts a route discovery to find a route leading to the destination. Route discovery is typically performed via flooding, which consumes many control packets. Because of node mobility, the network topology changes frequently and causes routes to break. Traditional routing protocols restart route discovery when a link fails. In this thesis, we propose two on-demand local recovery routing protocols based on the clustering coefficient: (I) the "Local Path Recovery Routing Protocol based on Clustering Coefficient" (LPRCC) and (II) the "Reliable Local Recovery Routing Protocol based on Clustering Coefficient" (RLRCC). The first protocol, LPRCC, uses the route clustering coefficient to choose a routing path. When a link failure occurs, nodes can quickly salvage the data without starting another route discovery. The second protocol, RLRCC, chooses the route with the higher route score, which is calculated from a link stability value and a node triangle value. RLRCC can decrease the number of route failures and reduce the number of route discoveries. Simulation results show that both protocols decrease the number of control packets and increase the route delivery ratio.
Hwang, Yuan Shiou, and 黃圓修. "Applications of fuzzy clustering method in robust correlation coefficient and robust regression analysis." Thesis, 1994. http://ndltd.ncl.edu.tw/handle/89598475804190416513.
Tsai, Kai-Siang, and 蔡凱翔. "Using local link switching algorithm to control directed and weight network clustering coefficient." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/94797731947635120347.
淡江大學
資訊工程學系碩士班
98
Over the past decade, complex networks have been widely studied and analyzed. The clustering coefficient is an important concept in this analysis: it characterizes the relative tightness of a network and is a defining network statistic that appears in many "real-world" network data sets. This paper proposes a local link switching algorithm that effectively increases the clustering coefficient of a directed weighted network while preserving the network's node degree distribution. The link switching algorithm is based on local neighborhood information. Link switching is widely used to produce similar networks with the same degree distribution, that is, to "sample" networks from the same network pool. How to apply this algorithm to directed and weighted networks is the main subject of this paper.
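The idea of switching links while keeping every node's degree fixed can be sketched as follows. This is a deliberately simplified stand-in, not the algorithm of the thesis: it works on an undirected, unweighted graph, picks the two edges at random rather than from local neighbourhood information, and greedily keeps a swap only when the triangle count does not drop. Because the degrees are unchanged, the number of connected triples is unchanged too, so a higher triangle count means a higher global clustering coefficient (transitivity). All names are hypothetical, and nodes are assumed to be integers.

```python
import random
from itertools import combinations


def count_triangles(adj):
    """Count triangles in an undirected graph stored as node -> set of neighbours."""
    closed = sum(1 for v in adj for a, b in combinations(adj[v], 2) if b in adj[a])
    return closed // 3  # each triangle is seen once from each of its three corners


def rewire(adj, remove, add):
    """Remove and then add the given undirected edges in place."""
    for x, y in remove:
        adj[x].discard(y)
        adj[y].discard(x)
    for x, y in add:
        adj[x].add(y)
        adj[y].add(x)


def switch_links(adj, attempts, seed=0):
    """Random degree-preserving swaps, kept only if triangles do not decrease."""
    rng = random.Random(seed)
    current = count_triangles(adj)
    for _ in range(attempts):
        edges = [(u, v) for u in adj for v in adj[u] if u < v]
        (a, b), (c, d) = rng.sample(edges, 2)
        # Proposed swap: replace a-b and c-d with a-d and c-b.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue  # would create a self-loop or a duplicate edge
        rewire(adj, remove=((a, b), (c, d)), add=((a, d), (c, b)))
        new = count_triangles(adj)
        if new >= current:
            current = new
        else:
            rewire(adj, remove=((a, d), (c, b)), add=((a, b), (c, d)))  # undo
    return current


# Hypothetical example: an 8-node ring, which starts with no triangles.
ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
degrees_before = {v: len(nbrs) for v, nbrs in ring.items()}
final_triangles = switch_links(ring, attempts=200, seed=1)
```

Each accepted swap removes two edges and adds two edges touching the same four nodes, so every node keeps its degree, which is the property the abstract relies on when "sampling" networks with a fixed degree distribution.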
Arraz, Carlos Fernando da Silva. "Algoritmos de clustering para identificação de subtipos de cancro do estômago, tiroide e pele." Master's thesis, 2021. http://hdl.handle.net/1822/73401.
The analysis of gene expression is fundamental for identifying the most relevant genes during cellular interactions in an organism, especially when these genes are related to diseases. A large-scale study of changes in gene expression requires a method that minimizes the error and deviation rates in a continuous learning process. One of the greatest scientific achievements of recent decades in bioinformatics was the introduction of high-throughput genetic sequencing methods, which make it possible to visualize cell dynamics at the molecular level, as if they were sensors capable of providing precious information about the functioning of a living system. In 2020, sequencing is already relatively low in cost, which strengthens research into the presence and amount of RNA (or rather, DNA marks) in a biological sample in a given time frame. In addition, the introduction of new analytical techniques has brought insights into biological and medical research. In this way, many treatments may soon be customized according to the genetic signature of each individual, with much greater efficiency and fewer side effects. The data mining process consists of the automatic extraction of patterns that represent some knowledge inherent to a phenomenon. In particular, clustering analysis, applied in this dissertation to identify cancer subtypes at an early stage (primary tumour), seeks to recognize previously unknown patterns through the application of machine learning. The work used data from The Cancer Genome Atlas (TCGA) project. The datasets were reduced (from thousands of genes to just a few dozen in some cases), and genes were combined to assess the quality of cluster formation or the accuracy of supervised classification in various settings, revealing promising results that are consistent with the literature in this area of research.
The main objective of this work was to obtain results that corroborate the current molecular classifications and/or to discover new cancer subtypes, especially where there is still some difficulty or indecision in identifying these subtypes, as with stomach, thyroid, and skin cancers. Through feature selection techniques and supervised and unsupervised classification, it was possible to assess the existence of significantly different groups and, in some cases, to characterize them.
Shafi, Shanjeeda. "Machine learning and mixture clustering methods for molecular drug discovery: prediction and characterisation of drugs and druggable targets." Thesis, 2021. http://hdl.handle.net/1959.13/1431097.
In the drug discovery process, approximately five to ten thousand compounds are initially screened, but only 1% of these enter the preclinical testing stage, which determines whether a compound is safe, efficacious, and feasible to use for a disease state. Owing to regulatory, toxicity, resistance, and human health concerns, demand is increasing for refined and intensive use of molecular physicochemical properties, via effective and robust mathematical methods, in drug discovery. Chemoinformatics is now a well-recognised discipline focused on searching, identifying, and extracting meaningful information from the chemical sequences and structures of compounds. A candidate drug is usually a small molecule (~50 atoms) that acts on proteins through many different mechanisms. Every year, several drugs are withdrawn from the market owing to poor pharmacodynamic and pharmacokinetic properties, which motivates this study's attempt to clarify the factors that make compounds drug-like. The drug-likeness of a molecule is characterised in part by whether it satisfies Lipinski's rule of five (Ro5) regarding its molecular properties, such as mass and hydrophobicity, which play an important role in oral absorption, distribution, metabolism, and excretion. A debate has existed for some time, and has now accelerated in the industry, as to what constitutes a good 'hit'. Increasing evidence suggests that relying completely on Lipinski's Ro5 for potential drug synthesis may increase the likelihood of future drug failures. Retrospective analysis of failed drug discovery projects and the incorporation of beyond-Ro5 rules may provide useful information for developing drugs for difficult targets. There is an urgent need to develop reliable computational methods for predicting the drug-likeness of candidate molecules, in order to identify those unlikely to survive the later stages of discovery and development.
Visualisation and machine learning methods are two common approaches to uncovering underlying patterns in the pharmacological property space, the so-called chemo-space, for drug design. Drug-likeness has so far been studied from several viewpoints. In this thesis, we use proposed druggability rules (Hudson et al. 2012, 2014, 2017) to determine cut points for each molecular predictor, based on non-Bayesian mixture model-based clustering with discriminant analysis, MC/DA (MclustDA R package). We also used a decision tree to choose cut-off ranges for the molecular descriptors. To date, Hudson et al.'s (2014, 2017) results have established an improved scoring function, beyond the cut points of the Ro5. In this thesis, mixture-based modelling tools (one Bayesian and two non-Bayesian) are applied via different R packages (Rmixmod, depmixS4, and mixAK) to identify good and poor drug candidates using combinations of 9 and 10 molecular physicochemical and structural properties and scoring functions of violations (Hudson et al. 2014, 2017). The non-Bayesian Gaussian mixture method (GMM) is shown to be optimal at correctly classifying true good and poor molecules in terms of Ro5, oral_Ro5 drug-like (divided into two parts: oral_Ro5 drug-like status1 and oral_Ro5 drug-like status2), eRo5 (extended rule of five), and bRo5 (beyond rule of five) drug classification, as suggested recently by Lipinski (2014, 2016) and Doak et al. (2014, 2016). In the thesis, the GMM approach with the optimal 10-descriptor model (continuous and categorical descriptors based on the following molecular parameters: MW, logP, logD, hydrogen bond donors and acceptors, polar surface area, number of atoms and rings, and halogen) shows good predictive performance, with Matthews correlation coefficient (C) values in the range of 0.41 to 0.58. Compared with other descriptor-set models using the Bayesian (mixAK) and non-Bayesian (HMM) methods, it performs better in terms of computational time and gives higher sensitivity, specificity, and C values.
The GMM classification identified 1013 drug-like molecules, of which 4% were in bRo5 space, and 266 non-drug-like molecules, of which 38% were in bRo5 space, supporting recent trends to move outside the Ro5 region. These mixture models form the basis for identifying molecules and disease targets in the chemo-space using visualisation methods such as principal component analysis (PCA), factor analysis for mixed data (FAMD), and correspondence analysis (CA). These three visualisation and data reduction methods successfully identify groups of molecules and specific disease targets with a prescribed range of ADME properties in different quadrants of the chemo-space. This work also demonstrates that the PCA, MCA, and FAMD methods could be powerful techniques for exploring complex datasets in drug discovery studies to identify outliers. It is shown that both lipophilicity descriptors, logP and logD, have a significant influence on the segregation of compounds and DCs. The two non-Bayesian mixture clustering approaches applied in this thesis, the Gaussian mixture method (GMM via Rmixmod) and the hidden Markov model (HMM via depmixS4), capture the global properties of molecules with related targets. Based on these mixture approaches, this study identified disease targets using the score function and the molecular physicochemical properties of the drugs directed at each target. All mixture clustering models identified 9 poor (non-druggable) and 26 good (druggable) targets, with the anti-bacterial and adrenergic targets identified as the topmost poor and good druggable targets, respectively. Furthermore, three popular machine learning (ML) methods, (1) recursive partitioning, (2) naïve Bayes, and (3) the support vector machine (SVM), were also used to discriminate drug-like from non-drug-like molecules based on molecular descriptors.
Among these ML techniques, the SVM model is superior across the different rule-based drug classifications, achieving a sensitivity range of 94% to 99% and a specificity range of 84% to 100%, along with higher C values of 0.68 to 0.99. The three mixture-based clustering and classification analyses, which use both logD and logP, offer an excellent opportunity to consider these lipophilicity descriptors in conjunction with other descriptors to help predict the permeability and solubility of active compounds in drug discovery. This study has the potential to significantly reduce the false classification of drugs and suggests an appropriate predictor set to help identify new drug innovations.
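The sensitivity, specificity and Matthews correlation coefficient (C) values quoted in this abstract all derive from a binary confusion matrix. A minimal sketch of those three standard formulas follows. The function name `classification_scores` and the example counts are hypothetical and are not taken from the thesis, which reports using R packages rather than Python.

```python
import math


def classification_scores(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    Assumes both classes are present, so tp + fn > 0 and tn + fp > 0.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only (not results from the thesis).
sens, spec, mcc = classification_scores(tp=90, fp=5, tn=45, fn=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} C={mcc:.2f}")
```

Unlike sensitivity or specificity alone, C uses all four cells of the confusion matrix, which is one reason it is often preferred when the two classes differ in size, as with the 1013 drug-like and 266 non-drug-like molecules mentioned above.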
Γεωργιάδης, Γιώργος. "Η παράμετρος της κεντρικότητας σε ανεξάρτητα κλίμακας μεγάλα δίκτυα." 2006. http://nemertes.lis.upatras.gr/jspui/handle/10889/137.
A trend in recent years is the study of large networks that possess a hierarchical structure independent of the current scale (large scale-free networks). A traditional method of network modelling is to represent networks as graphs and apply results from graph theory. Until recently, the classical models studied described the probability of two random vertices being connected as equal for all pairs of vertices. Such models fail to describe many everyday networks, such as acquaintance networks, where the vertices are individuals and two vertices are joined by an edge if the individuals know each other.
Liang, Chieh-Hsiang, and 梁捷翔. "Extreme Clustering Coefficients In High Edge Density Networks." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/31972955354085060082.
Tamkang University
Master's Program in Networking and Communication, Department of Computer Science and Information Engineering
98
This paper proposes two models with extreme average clustering coefficients and short path lengths for high-edge-density networks. High-density networks are common in the analysis of social and biological networks. The paper studies networks with extreme statistical properties, namely maximal or minimal clustering coefficients together with short average distances. The proposed models also indicate that, beyond the existing small-world and random network models, other network models can produce clustering coefficients that fill the gap between those two models and the maximal achievable clustering coefficient.
Lee, Che-Chun, and 李哲均. "Finding Overlapping Communities by Local Clustering Coefficients of Seed Nodes." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/m37pq3.
Huang, Shi-Yu, and 黃士育. "Overlapping Community Discovery by Combining Local Clustering Coefficients and Neighbor Relationship Measurements." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/r4tqjq.
Shu-Te University
Master's Program, Department of Computer Science and Information Engineering
105
Most users of online social networks play different roles at different times due to the diversity of their interests. Overlapping community discovery studies the complexity of interpersonal social networks using various techniques of Social Network Analysis (SNA). SNA identifies seed nodes in a social network, from which hidden overlapping communities can be found by gradually merging neighboring seeds into larger groups. Methods that select only high-degree nodes as seeds often neglect close-knit groups made up of low-degree nodes. To overcome this problem, this study proposes selecting nodes with high Local Clustering Coefficients (LCC) as seeds and then examining the relationship degrees between neighboring seeds to discover overlapping communities. The proposed method was compared with methods that adopt high-degree nodes as seeds, as well as with the well-known Clique Percolation Method (CPM). The results showed an effective improvement in grouping quality and graph efficiency.
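The contrast between degree-based and LCC-based seed selection can be illustrated with a small sketch. It uses the standard definition of the local clustering coefficient for an undirected graph: the fraction of pairs of a node's neighbors that are themselves connected. The helper names `local_clustering` and `top_seeds` and the toy graph are hypothetical, and the merging step from the abstract is not reproduced here.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Local clustering coefficient of node v in an undirected graph.

    adj maps each node to the set of its neighbors.
    """
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to close
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def top_seeds(adj, n):
    """Pick the n nodes with the highest LCC (ties broken by node name)."""
    return sorted(adj, key=lambda v: (-local_clustering(adj, v), v))[:n]


# Toy graph (illustrative only): a triangle a-b-c plus a hub h with leaves.
adj = {
    "a": {"b", "c", "h"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "h": {"a", "d", "e", "f"},
    "d": {"h"}, "e": {"h"}, "f": {"h"},
}
print(top_seeds(adj, 2))
print(max(adj, key=lambda v: len(adj[v])))
```

In this toy graph the highest-degree node is the hub `h`, whose neighbors share no edges, while the LCC ranking picks members of the close-knit triangle. That is the situation the abstract describes for low-degree but tightly connected groups.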
Li, Zheng-Kuan, and 李政寬. "Applying Regression Coefficients Clustering in Multivariate Time Series Transforming for 3D Convolutional Neural Networks." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/cpv42m.
National Taiwan University of Science and Technology
Department of Industrial Management
107
Multivariate time series (MTS) data are very common in real life. Because most problems involve not a single variable but several variables that jointly affect the label, how to solve the multivariate time series classification problem effectively remains a major research question. In recent years, with the rapid development of Artificial Intelligence (AI), deep learning frameworks have been applied to multivariate time series classification. This study proposes a method for MTS classification. Regression analysis is used to fit a regression equation to each time series, and the regression coefficient and intercept are then used for clustering, so that time series with similar trends fall into the same cluster. The literature proposes four frameworks for encoding time series data as different types of images. Based on the clustering results, time series with similar trends are encoded into images with the same method, and a variety of experiments are run to determine the encoding method for each cluster. After the multivariate time series data are encoded as images in this way, each sample is fed into a 3D convolutional neural network for feature extraction and image recognition, which can effectively solve the multivariate time series classification problem and find the best classification accuracy.
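The regression-coefficient clustering step can be sketched as follows: fit a straight line to each series by ordinary least squares against the time index, then cluster the resulting (slope, intercept) pairs. The abstract does not name the clustering algorithm, so the plain k-means (Lloyd) iteration used here is an assumption, as are the function names, the starting centers and the toy series. The image encoding and the 3D CNN are not covered.

```python
def fit_line(series):
    """Least-squares line through (t, y) for t = 0..n-1; returns (slope, intercept)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def kmeans(points, start_centers, iters=20):
    """Plain Lloyd iterations from the given starting centers; returns one label per point."""
    centers = list(start_centers)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Toy series (illustrative only): two rising and two falling trends.
series = [[1, 2, 3, 4], [2, 3, 4, 5.1], [9, 7, 5, 3], [8, 6.2, 4, 2]]
features = [fit_line(s) for s in series]
labels = kmeans(features, start_centers=[features[0], features[2]])
print(labels)
```

Series that share a cluster label would then be encoded as images with the same method, as the abstract describes.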
Shiping, Liu. "Synthetic notions of curvature and applications in graph theory." Doctoral thesis, 2012. https://ul.qucosa.de/id/qucosa%3A11816.
Δαγκλής, Οδυσσέας. "Ζητήματα μοντελοποίησης και προσέγγισης του χρωματικού αριθμού σε scale-free δίκτυα." Thesis, 2009. http://nemertes.lis.upatras.gr/jspui/handle/10889/2104.
Networks that exhibit a certain quality irrespective of their size and density are called scale-free. In many real-life networks this quality coincides with a power-law distribution of the nodes' degree with exponent ranging in [2..4]. This work presents three static models for constructing scale-free networks, based on the dynamic Barabási-Albert model, and attempts to experimentally approximate their chromatic number.
Amstutz, James Michael. "Cluster-Based Analysis Of Retinitis Pigmentosa Candidate Modifiers Using Drosophila Eye Size And Gene Expression Data." Thesis, 2021.
The goal of this thesis is to algorithmically identify candidate modifiers for retinitis pigmentosa (RP) to help improve therapy and predictions for this genetic disorder, which may lead to a complete loss of vision. Recent research by Chow et al. (2016) focused on the genetic contributors to RP by looking for a correlation between genetic modifiers and phenotypic variation in female Drosophila melanogaster, or fruit flies. In contrast to the genome-wide association analysis carried out in Chow et al.’s research, this study proposes using a K-Means clustering algorithm on RNA expression data to better understand which genes best exhibit characteristics of the RP degenerative model. Validating this algorithm’s effectiveness in identifying suspected genes takes priority over their classification.
This study investigates the linear relationship between Drosophila eye size and gene expression to gather statistically significant, strongly correlated genes from the clusters with abnormally high or low eye sizes. The clustering algorithm is implemented in the R scripting language, and supplemental information details the steps of this computational process. Running the mean eye size and gene expression data of 18,140 female Drosophila genes and 171 strains through the proposed algorithm in its four variations helped identify 140 suspected candidate modifiers for retinal degeneration. Although none of the top candidate genes found in this study matched Chow et al.’s candidates, they were all statistically significant and strongly correlated, with several showing links to RP. These results may continue to improve as more of the 140 suspected genes are annotated using identical or comparative approaches.