Dissertations / Theses on the topic 'Method of k-means'

Consult the top 50 dissertations / theses for your research on the topic 'Method of k-means.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Кіріченко, Л. О., В. Г. Кобзєв, and Є. Д. Федоренко. "Data Mining methods for detection of collective anomalies in time series." Thesis, Національна академія Національної гвардії України, 2021. https://openarchive.nure.ua/handle/document/16449.

Full text
Abstract:
The paper considers an approach to the detection of collective anomalies in time series based on the use of clustering methods, in particular the k-means method, and evaluates the effectiveness of their application.
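The abstract gives no implementation details; purely as an illustration of the general idea (clustering fixed-length windows of a series with k-means and flagging windows that lie unusually far from their nearest centroid), a minimal Python sketch could look like the following. The window length, number of clusters, and threshold quantile are arbitrary choices, not values taken from the thesis.

    import numpy as np
    from sklearn.cluster import KMeans

    def detect_collective_anomalies(series, window=32, k=4, quantile=0.98):
        """Flag windows whose distance to the nearest k-means centroid is unusually large."""
        # Slice the series into overlapping fixed-length windows.
        windows = np.array([series[i:i + window] for i in range(len(series) - window + 1)])
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(windows)
        # Distance of every window to its assigned centroid.
        dists = np.linalg.norm(windows - km.cluster_centers_[km.labels_], axis=1)
        threshold = np.quantile(dists, quantile)   # simple empirical cut-off
        return np.where(dists > threshold)[0]      # start indices of anomalous windows

    rng = np.random.default_rng(0)
    signal = np.sin(np.linspace(0, 60, 2000)) + 0.1 * rng.standard_normal(2000)
    signal[900:950] += 2.0                          # inject a collective anomaly
    print(detect_collective_anomalies(signal))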
APA, Harvard, Vancouver, ISO, and other styles
2

Hudson, Cody Landon. "Protein structure analysis and prediction utilizing the Fuzzy Greedy K-means Decision Forest model and Hierarchically-Clustered Hidden Markov Models method." Thesis, University of Central Arkansas, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=1549796.

Full text
Abstract:

Structural genomics is a field of study that strives to derive and analyze the structural characteristics of proteins through experimentation and through prediction using software and other automatic processes. Alongside implications for more effective drug design, the main motivation for structural genomics concerns the elucidation of each protein’s function, given that the structure of a protein almost completely governs its function. Historically, the approach to deriving the structure of a protein has been through exceedingly expensive, complex, and time-consuming methods such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.

In response to the inadequacies of these methods, three families of approaches developed in a relatively new branch of computer science known as bioinformatics. These families include threading, homology-modeling, and the de novo approach. However, even these methods fail, whether due to impracticalities, the inability to produce novel folds, rampant complexity, or other inherent limitations. In their stead, this work proposes the Fuzzy Greedy K-means Decision Forest model, which utilizes sequence motifs that transcend protein family boundaries to predict local tertiary structure, such that the method is cheap, effective, and can produce semi-novel folds due to its local (rather than global) prediction mechanism. This work further extends the FGK-DF model with a new algorithm, the Hierarchically-Clustered Hidden Markov Models (HC-HMM) method, to extract protein primary sequence motifs more accurately and adequately than the FGK-DF model currently does, allowing for more accurate and powerful local tertiary structure predictions. Both algorithms are critically examined, their methodology is thoroughly explained and tested against a consistent data set, and the results are discussed at length.

APA, Harvard, Vancouver, ISO, and other styles
3

Ruzgys, Martynas. "IT žinių portalo statistikos modulis pagrįstas grupavimu." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2007. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2007~D_20070816_143545-16583.

Full text
Abstract:
Pristatomas duomenų gavybos ir grupavimo naudojimas paplitusiose sistemose bei sukurtas IT žinių portalo statistikos prototipas duomenų saugojimui, analizei ir peržiūrai atlikti. Siūlomas statistikos modulis duomenų saugykloje periodiškais laiko momentais vykdantis duomenų transformacijas. Portale prieinami statistiniai duomenys gali būti grupuoti. Sugrupuotą informaciją pateikus grafiškai, duomenys gali būti interpretuojami ir stebimi veiklos mastai. Panašių objektų grupėms išskirti pritaikytas vienas iš žinomiausių duomenų grupavimo metodų – lygiagretusis k-vidurkių metodas.
This thesis presents the use of data mining and clustering in widely used systems, together with a statistics-module prototype developed for an IT knowledge portal for data storage, analysis, and visualization. The proposed statistics module performs periodic data transformations in the data warehouse. The statistical data accessible in the portal can be clustered; when the clustered information is presented graphically, the data can be interpreted and activity trends observed. One of the best-known data clustering methods, the parallel k-means method, is adapted to separate groups of similar objects.
APA, Harvard, Vancouver, ISO, and other styles
4

Kodama, Hiroyuki (児玉紘幸). "工具カタログからのデータマイニングに支援されたものづくりシステムに関する研究 [A study on a manufacturing system supported by data mining from tool catalogs]." Thesis, 2014. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB12863871/?lang=0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Žambochová, Marta. "Shluková analýza rozsáhlých souborů dat: nové postupy založené na metodě k-průměrů." Doctoral thesis, Vysoká škola ekonomická v Praze, 2005. http://www.nusl.cz/ntk/nusl-77061.

Full text
Abstract:
Cluster analysis has become one of the main tools used in extracting knowledge from data, a process known as data mining. In this area of data analysis, data of large dimensions are often processed, both in the number of objects and in the number of variables that characterize the objects. Many methods for data clustering have been developed. One of the most widely used is the k-means method, which is suitable for clustering data sets containing a large number of objects. It is based on finding the best clustering with respect to the initial distribution of objects into clusters and the subsequent step-by-step redistribution of objects among the clusters according to the optimization function. The aim of this Ph.D. thesis was a comparison of selected variants of existing k-means methods, a detailed characterization of their positive and negative characteristics, new alternatives to this method, and experimental comparisons with existing approaches. These objectives were met. In this work I focused on modifications of the k-means method for clustering large numbers of objects, specifically on the BIRCH k-means, filtering, k-means++, and two-phase algorithms. I examined the time complexity of the algorithms, the effect of the initial distribution and of outliers, and the validity of the resulting clusters. Two real data files and several generated data sets were used. The common and distinct features of the methods under investigation are summarized at the end of the work. The main aim and benefit of the work is to devise my own modifications that address the bottlenecks of the basic procedure and of the existing variants, to program them, and to verify them. Some modifications accelerated the processing. Applying the main ideas of the k-means++ algorithm to other variants of the k-means method yielded better clustering results. The most significant of the proposed changes is a modification of the filtering algorithm, which brings an entirely new feature to the algorithm, namely the detection of outliers. An accompanying CD is enclosed. It includes the source code of programs written in the MATLAB development environment. The programs were created specifically for the purposes of this work and are intended for experimental use. The CD also contains the data files used for the various experiments.
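Of the variants named in this abstract, k-means++ seeding (Arthur and Vassilvitskii, 2007) is publicly documented; a compact sketch of that seeding step, independent of the thesis's own modifications, is shown below for reference.

    import numpy as np

    def kmeans_pp_init(X, k, rng=np.random.default_rng(0)):
        """k-means++ seeding: pick centers with probability proportional to squared distance."""
        centers = [X[rng.integers(len(X))]]          # first center uniformly at random
        for _ in range(1, k):
            d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
            probs = d2 / d2.sum()                    # D^2 weighting
            centers.append(X[rng.choice(len(X), p=probs)])
        return np.array(centers)

    X = np.vstack([np.random.randn(100, 2) + [0, 0],
                   np.random.randn(100, 2) + [6, 6]])
    print(kmeans_pp_init(X, k=2))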
APA, Harvard, Vancouver, ISO, and other styles
6

Kondapalli, Swetha. "An Approach To Cluster And Benchmark Regional Emergency Medical Service Agencies." Wright State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1596491788206805.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Gunay, Melih. "Representation Of Covariance Matrices In Track Fusion Problems." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12609026/index.pdf.

Full text
Abstract:
The covariance matrix in target tracking algorithms plays a critical role in multi-sensor track fusion systems. This matrix reveals the uncertainty of the state estimates that are obtained from different sensors, so many subproblems of track fusion utilize this matrix to get more accurate results. That is why this matrix should be interchanged between the nodes of the multi-sensor tracking system. This thesis mainly deals with the analysis of approximations of the covariance matrix that can best represent it, in order to transmit it effectively to the demanding site. The Kullback-Leibler (KL) distance is exploited to derive some of the representations for the Gaussian case. Comparison of these representations is another objective of this work; it is based on the fusion performance of the representations, and the performance is measured for a 2-radar track fusion system.
APA, Harvard, Vancouver, ISO, and other styles
8

Abbasian, Houman. "Inner Ensembles: Using Ensemble Methods in Learning Step." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31127.

Full text
Abstract:
A pivotal moment in machine learning research was the creation of an important new research area known as Ensemble Learning. In this work, we argue that ensembles are a very general concept, and though they have been widely used, they can be applied in more situations than they have been to date. Rather than using them only to combine the output of an algorithm, we can apply them to decisions made inside the algorithm itself, during the learning step. We call this approach Inner Ensembles. The motivation to develop Inner Ensembles was the opportunity to produce models with similar advantages to regular ensembles, such as accuracy and stability, plus additional advantages such as comprehensibility, simplicity, rapid classification, and a small memory footprint. The main contribution of this work is to demonstrate how broadly this idea can be applied, and to highlight its potential impact on all types of algorithms. To support our claim, we first provide a general guideline for applying Inner Ensembles to different algorithms. Then, using this framework, we apply them to two categories of learning methods: supervised and unsupervised. For the former we chose Bayesian networks, and for the latter K-means clustering. Our results show that 1) the overall performance of Inner Ensembles is significantly better than that of the original methods, and 2) Inner Ensembles provide similar performance improvements as regular ensembles.
APA, Harvard, Vancouver, ISO, and other styles
9

Sarazin, Marianne. "Elaboration d'un score de vieillissement : propositions théoriques." Phd thesis, Université Jean Monnet - Saint-Etienne, 2013. http://tel.archives-ouvertes.fr/tel-00994941.

Full text
Abstract:
Aging is currently the focus of much attention, as it constitutes a major public health problem. Its description nevertheless remains complex, because its conceptualization involves both individual and collective entanglements and a strong subjective dimension. Health professionals are increasingly obliged to integrate this factor into their reasoning and to propose suitable care protocols. Aging is an inescapable evolution of the body whose quantification is established by age, which depends on so-called "chronological" time. This age criterion is, however, imperfect for measuring the real wear of the body, which is subject to numerous modifying factors that depend on the individual. Therefore, building on existing work that replaces chronological age with a composite criterion called "biological age", leading to the creation of an aging indicator or score intended to better reflect individual aging, a new methodology adapted to general practice is proposed. The first phase of this work consisted of surveying general practitioners about their perception and use of clinical scores in routine practice, through a qualitative and quantitative survey carried out in metropolitan France. This study showed that declared use and the intellectualized conception of scores remained dissociated. Scores are a useful decision-support tool for targeting an often complex systemic approach, insofar as they are simple to use (few items, and items suited to practice) and their validity is scientifically understood by the physician. Moreover, the patient's age was cited as a predominant element influencing the general practitioner's choice of the appropriate score. This groundwork then served to propose a model of biological age, with consideration given both to the choice of the mathematical model and to the variables making up that model. A selection of variables that are markers of aging was made from a review of the literature, taking into account their possible integration into the care process in general practice. This selection was consolidated by a mathematical approach using a forward selection procedure based on a regression model. A "reference" population, whose aging was considered normal, was then constituted to serve as the comparative basis for computing biological age. Its choice was guided initially by data from the literature and subsequently by a classification using the k-means (dynamic clusters) method. A simple linear regression model was then built, but on data normalized using the Gaussian copula method, followed by a study of the marginal distribution tails. The results obtained open up interesting prospects for further work on the calculation of a biological age and of the resulting score in general practice, with its validation through a morbidity study constituting the final stage of this work.
APA, Harvard, Vancouver, ISO, and other styles
10

Ramler, Ivan Peter. "Improved statistical methods for k-means clustering of noisy and directional data." [Ames, Iowa : Iowa State University], 2008.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
11

Mayer-Jochimsen, Morgan. "Clustering Methods and Their Applications to Adolescent Healthcare Data." Scholarship @ Claremont, 2013. http://scholarship.claremont.edu/scripps_theses/297.

Full text
Abstract:
Clustering is a mathematical method of data analysis that identifies trends in data by efficiently separating the data into a specified number of clusters, so it is incredibly useful and widely applicable for questions about the interrelatedness of data. Two methods of clustering are considered here. K-means clustering defines clusters in relation to the centroid, or center, of a cluster. Spectral clustering establishes connections between all of the data points to be clustered, then eliminates those connections that link dissimilar points. This is represented as an eigenvector problem where the solution is given by the eigenvectors of the Normalized Graph Laplacian. Spectral clustering establishes groups so that the similarity between points of the same cluster is stronger than the similarity between different clusters. K-means and spectral clustering are used to analyze adolescent data from the 2009 California Health Interview Survey. Differences were observed between the results of the clustering methods on 3294 individuals and 22 health-related attributes. K-means clustered the adolescents by exercise, poverty, and variables related to psychological health, while the spectral clustering groups were informed by smoking, alcohol use, low exercise, psychological distress, low parental involvement, and poverty. We posit some guesses as to this difference, observe characteristics of the clustering methods, and comment on the viability of spectral clustering on healthcare data.
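As a rough sketch of the spectral clustering pipeline described above (Gaussian similarity graph, normalized graph Laplacian, eigenvector embedding, then k-means on the embedding), in the usual Ng–Jordan–Weiss formulation rather than the exact implementation used in the thesis:

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_clustering(X, k, sigma=1.0):
        """Cluster X into k groups using the symmetric normalized graph Laplacian."""
        d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.exp(-d2 / (2 * sigma ** 2))           # Gaussian similarity graph
        np.fill_diagonal(W, 0.0)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
        L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
        # Eigenvectors of the k smallest eigenvalues form the spectral embedding.
        _, vecs = np.linalg.eigh(L_sym)
        U = vecs[:, :k]
        U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row-normalize (NJW step)
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
    print(spectral_clustering(X, k=2))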
APA, Harvard, Vancouver, ISO, and other styles
12

Hinz, Joel. "Clustering the Web : Comparing Clustering Methods in Swedish." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-95228.

Full text
Abstract:
Clustering -- automatically sorting -- web search results has been the focus of much attention but is by no means a solved problem, and there is little previous work in Swedish. This thesis studies the performance of three clustering algorithms -- k-means, agglomerative hierarchical clustering, and bisecting k-means -- on a total of 32 corpora, as well as whether clustering web search previews, called snippets, instead of full texts can achieve reasonably decent results. Four internal evaluation metrics are used to assess the data. Results indicate that k-means performs worse than the other two algorithms, and that snippets may be good enough to use in an actual product, although there is ample opportunity for further research on both issues; however, results are inconclusive regarding bisecting k-means vis-à-vis agglomerative hierarchical clustering. Stop word and stemmer usage results are not significant, and appear to not affect the clustering by any considerable magnitude.
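For reference, a minimal sketch of bisecting k-means in its common textbook form (repeatedly splitting the largest remaining cluster with 2-means) is given below; the split-selection criterion and the scikit-learn calls are assumptions of this sketch, not details taken from the thesis.

    import numpy as np
    from sklearn.cluster import KMeans

    def bisecting_kmeans(X, k):
        """Repeatedly split the largest cluster with 2-means until k clusters remain."""
        clusters = [np.arange(len(X))]               # start with one cluster of all points
        while len(clusters) < k:
            largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
            idx = clusters.pop(largest)
            labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
            clusters += [idx[labels == 0], idx[labels == 1]]
        assignment = np.empty(len(X), dtype=int)
        for c, idx in enumerate(clusters):
            assignment[idx] = c
        return assignment

    X = np.vstack([np.random.randn(60, 2) + c for c in ([0, 0], [5, 5], [0, 8])])
    print(np.bincount(bisecting_kmeans(X, 3)))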
APA, Harvard, Vancouver, ISO, and other styles
13

Ganey, Raeesa. "Principal points, principal curves and principal surfaces." Master's thesis, University of Cape Town, 2015. http://hdl.handle.net/11427/15515.

Full text
Abstract:
The idea of approximating a distribution is a prominent problem in statistics. This dissertation explores the theory of principal points and principal curves as approximation methods for a distribution. Principal points of a distribution were first introduced by Flury (1990), who tackled the problem of optimal grouping in multivariate data. In essence, principal points are the theoretical counterparts of the cluster means obtained by the k-means algorithm. Principal curves, defined by Hastie (1984), are smooth one-dimensional curves that pass through the middle of a p-dimensional data set, providing a nonlinear summary of the data. In this dissertation, details on the usefulness of principal points and principal curves are reviewed. The application of principal points and principal curves is then extended beyond its original purpose to well-known computational methods like Support Vector Machines in machine learning.
APA, Harvard, Vancouver, ISO, and other styles
14

Vaněčková, Tereza. "Numerické metody pro klasifikaci metagenomických dat." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-242014.

Full text
Abstract:
This thesis deals with metagenomics and numerical methods for the classification of metagenomic data. A review of alignment-free methods based on nucleotide word frequency is provided, as these methods appear to be effective for processing metagenomic sequence reads produced by next-generation sequencing technologies. To evaluate these methods, selected features based on k-mer analysis were tested on a simulated dataset of metagenomic sequence reads. The data in the original data space were then subjected to hierarchical clustering, and PCA-processed data were clustered with the k-means algorithm. The analysis was performed for different lengths of nucleotide words and evaluated in terms of classification accuracy.
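A minimal sketch of this kind of alignment-free pipeline (k-mer frequency vectors, PCA, then k-means) is shown below; the k-mer length, number of components, and toy reads are placeholders rather than the thesis's settings.

    import itertools
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    def kmer_profile(seq, k=3):
        """Relative frequency vector of all 4^k nucleotide words in a read."""
        kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
        index = {w: i for i, w in enumerate(kmers)}
        counts = np.zeros(len(kmers))
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in index:
                counts[index[word]] += 1
        return counts / max(counts.sum(), 1)

    reads = ["ACGTACGTGGCCTTAA", "GGGGCCCCGGGGCCCC", "ACGTACGTACGTACGT"]
    X = np.array([kmer_profile(r) for r in reads])
    Z = PCA(n_components=2).fit_transform(X)        # project profiles to a low-dimensional space
    print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z))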
APA, Harvard, Vancouver, ISO, and other styles
15

Baccherini, Simona. "Pattern recognition methods for EMG prosthetic control." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12033/.

Full text
Abstract:
In this work we focus on pattern recognition methods related to EMG upper-limb prosthetic control. After giving a detailed review of the most widely used classification methods, we propose a new classification approach. It comes as a result of a comparison of the Fourier analysis between able-bodied and trans-radial amputee subjects. We thus suggest a different classification method which considers each surface electrode's contribution separately, together with five time-domain features, obtaining an average classification accuracy equal to 75% on a sample of trans-radial amputees. We propose an automatic feature selection procedure, formulated as a minimization problem, in order to improve the method and its robustness.
APA, Harvard, Vancouver, ISO, and other styles
16

Guder, Mennan. "Data Mining Methods For Clustering Power Quality Data Collected Via Monitoring Systems Installed On The Electricity Network." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/3/12611120/index.pdf.

Full text
Abstract:
Increasing power demand and the wide use of high-technology power electronic devices result in a need for power quality monitoring. The quality of electric power in both transmission and distribution systems should be analyzed in order to sustain power system reliability and continuity. This analysis is possible by examination of the data collected by power quality monitoring systems. In order to define the characteristics of the power system and reveal the relations between power quality events, a huge amount of data must be processed. In this thesis, clustering methods for power quality events are developed using exclusive and overlapping clustering models. The methods are designed to cluster the huge amount of power quality data obtained from the online monitoring of the Turkish Electricity Transmission System. The main issues considered in the design of the clustering methods are the amount of data, the efficiency of the designed algorithm, and the queries that should be supplied to the domain experts. This research work is fully supported by the Public Research Grant Committee (KAMAG) of TUBITAK within the scope of the National Power Quality Project (105G129).
APA, Harvard, Vancouver, ISO, and other styles
17

Liu, Yating. "Optimal Quantization : Limit Theorem, Clustering and Simulation of the McKean-Vlasov Equation." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS215.

Full text
Abstract:
Cette thèse contient deux parties. Dans la première partie, on démontre deux théorèmes limites de la quantification optimale. Le premier théorème limite est la caractérisation de la convergence sous la distance de Wasserstein d’une suite de mesures de probabilité par la convergence simple des fonctions d’erreur de la quantification. Ces résultats sont établis en Rd et également dans un espace de Hilbert séparable. Le second théorème limite montre la vitesse de convergence des grilles optimales et la performance de quantification pour une suite de mesures de probabilité qui convergent sous la distance de Wasserstein, notamment la mesure empirique. La deuxième partie de cette thèse se concentre sur l’approximation et la simulation de l’équation de McKean-Vlasov. On commence cette partie par prouver, par la méthode de Feyel (voir Bouleau (1988)[Section 7]), l’existence et l’unicité d’une solution forte de l’équation de McKean-Vlasov dXt = b(t, Xt, μt)dt + σ(t, Xt, μt)dBt sous la condition que les fonctions de coefficient b et σ sont lipschitziennes. Ensuite, on établit la vitesse de convergence du schéma d’Euler théorique de l’équation de McKean-Vlasov et également les résultats de l’ordre convexe fonctionnel pour les équations de McKean-Vlasov avec b(t,x,μ) = αx+β, α,β ∈ R. Dans le dernier chapitre, on analyse l’erreur de la méthode de particule, de plusieurs schémas basés sur la quantification et d’un schéma hybride particule- quantification. À la fin, on illustre deux exemples de simulations: l’équation de Burgers (Bossy and Talay (1997)) en dimension 1 et le réseau de neurones de FitzHugh-Nagumo (Baladron et al. (2012)) en dimension 3
This thesis contains two parts. The first part addresses two limit theorems related to optimal quantization. The first limit theorem is the characterization of the convergence in the Wasserstein distance of probability measures by the pointwise convergence of Lp-quantization error functions on Rd and on a separable Hilbert space. The second limit theorem is the convergence rate of the optimal quantizer and the clustering performance for a probability measure sequence (μn)n∈N∗ on Rd converging in the Wasserstein distance, especially when (μn)n∈N∗ are the empirical measures with finite second moment but possibly unbounded support. The second part of this manuscript is devoted to the approximation and the simulation of the McKean-Vlasov equation, including several quantization-based schemes and a hybrid particle-quantization scheme. We first give a proof of the existence and uniqueness of a strong solution of the McKean-Vlasov equation dXt = b(t, Xt, μt)dt + σ(t, Xt, μt)dBt under the Lipschitz coefficient condition by using Feyel’s method (see Bouleau (1988)[Section 7]). Then, we establish the convergence rate of the “theoretical” Euler scheme and, as an application, we establish functional convex order results for scaled McKean-Vlasov equations with an affine drift. In the last chapter, we prove the convergence rate of the particle method, of several quantization-based schemes, and of the hybrid scheme. Finally, we simulate two examples: Burgers' equation (Bossy and Talay (1997)) in a one-dimensional setting and the network of FitzHugh-Nagumo neurons (Baladron et al. (2012)) in dimension 3.
APA, Harvard, Vancouver, ISO, and other styles
18

Thorstensson, Linnea. "Clustering Methods as a Recruitment Tool for Smaller Companies." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273571.

Full text
Abstract:
With the help of new technology it has become much easier to apply for a job. Reaching out to a larger audience also results in many more applications to consider when hiring for a new position. This has resulted in many big companies using statistical learning methods as a tool in the first step of the recruiting process. Smaller companies that do not have access to the same amount of historical data and big data sets do not have the same opportunities to digitalise their recruitment process. Using topological data analysis, this thesis explores how clustering methods can be used on smaller data sets in the early stages of the recruitment process. It also studies how the level of abstraction in the data representation affects the results. The methods seem to perform well on higher-level job announcements but struggle on basic-level positions. It also shows that the representation of candidates and jobs has a huge impact on the results.
Ny teknologi har förenklat processen för att söka arbete. Detta har resulterat i att företag får tusentals ansökningar som de måste ta hänsyn till. För att förenkla och påskynda rekryteringsprocessen har många stora företag börjat använda sig av maskininlärningsmetoder. Mindre företag, till exempel start-ups, har inte samma möjligheter för att digitalisera deras rekrytering. De har oftast inte tillgång till stora mängder historisk ansökningsdata. Den här uppsatsen undersöker därför med hjälp av topologisk dataanalys hur klustermetoder kan användas i rekrytering på mindre datauppsättningar. Den analyserar också hur abstraktionsnivån på datan påverkar resultaten. Metoderna visar sig fungera bra för jobbpositioner av högre nivå men har problem med jobb på en lägre nivå. Det visar sig också att valet av representation av kandidater och jobb har en stor inverkan på resultaten.
APA, Harvard, Vancouver, ISO, and other styles
19

Yan, Mingjin. "Methods of Determining the Number of Clusters in a Data Set and a New Clustering Criterion." Diss., Virginia Tech, 2005. http://hdl.handle.net/10919/29957.

Full text
Abstract:
In cluster analysis, a fundamental problem is to determine the best estimate of the number of clusters, which has a deterministic effect on the clustering results. However, a limitation in current applications is that no convincingly acceptable solution to the best-number-of-clusters problem is available, due to the high complexity of real data sets. In this dissertation, we tackle this problem of estimating the number of clusters, particularly oriented toward processing very complicated data which may contain multiple types of cluster structure. Two new methods of choosing the number of clusters are proposed, which have been shown empirically to be highly effective given clear and distinct cluster structure in a data set. In addition, we propose a sequential type of clustering approach, called multi-layer clustering, by combining these two methods. Multi-layer clustering not only functions as an efficient method of estimating the number of clusters, but also, by superimposing a sequential idea, improves the flexibility and effectiveness of any arbitrary existing one-layer clustering method. Empirical studies have shown that multi-layer clustering has higher efficiency than one-layer clustering approaches, especially in detecting clusters in complicated data sets. The multi-layer clustering approach has been successfully implemented in clustering the WTCHP microarray data, and the results can be interpreted very well based on known biological knowledge. Choosing an appropriate clustering method is another critical step in clustering. K-means clustering is one of the most popular clustering techniques used in practice. However, the k-means method tends to generate clusters containing a nearly equal number of objects, which is referred to as the "equal-size" problem. We propose a clustering method which competes with the k-means method. Our newly defined method is aimed at overcoming the so-called "equal-size" problem associated with the k-means method, while maintaining its advantage of computational simplicity. Advantages of the proposed method over k-means clustering have been demonstrated empirically using simulated data with low dimensionality.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
20

Yoldas, Mine. "Predicting The Effect Of Hydrophobicity Surface On Binding Affinity Of Pcp-like Compounds Using Machine Learning Methods." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613215/index.pdf.

Full text
Abstract:
This study aims to predict the binding affinity of PCP-like compounds by means of molecular hydrophobicity. Molecular hydrophobicity is an important property which affects the binding affinity of molecules. The values of molecular hydrophobicity are obtained on a three-dimensional coordinate system. Our aim is to reduce the number of points on the hydrophobicity surface of the molecules. This is modeled using self-organizing maps (SOM) and k-means clustering. The feature sets obtained from SOM and k-means clustering are used individually to predict the binding affinity of the molecules. Support vector regression and partial least squares regression are used for prediction.
APA, Harvard, Vancouver, ISO, and other styles
21

Czudek, Marek. "Detekce síťových anomálií na základě NetFlow dat." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-235461.

Full text
Abstract:
This thesis describes the use of NetFlow data in systems for the detection of disruptions or anomalies in computer network traffic. Various methods for network data collection are described, focusing especially on the NetFlow protocol. Further, various methods for anomaly detection in network traffic are discussed and evaluated, and their advantages as well as disadvantages are listed. Based on this analysis, one method is chosen and a test data set is analyzed using it. An algorithm for real-time network traffic anomaly detection is designed based on the outcomes of the analysis. This method was chosen mainly because it enables the detection of anomalies even in unlabelled network traffic. The last part of the thesis describes the implementation of the algorithm, as well as experiments performed using the resulting application on real NetFlow data.
APA, Harvard, Vancouver, ISO, and other styles
22

Mohiddin, Syed B. "Development of novel unsupervised and supervised informatics methods for drug discovery applications." The Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1138385657.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Evans, Jr Richard Austin. "Fostering success in reading: a survey of teaching methods and collaboration practices of high performing elementary schools in Texas." Texas A&M University, 2002. http://hdl.handle.net/1969.1/3968.

Full text
Abstract:
This study examined reading programs in 68 Texas elementary schools that were identified as successful by their scores on TAAS assessment results in the 1999-2000 school year. These schools’ student populations had a high proportion of culturally diverse and low-SES students. The purposes of this study were: (1) to determine if and how teaching methods and collaboration (intervention/support teams) were used by effective schools to foster reading success in all students; (2) to identify cohesive patterns (clusters) or models in schools’ use of collaboration and teaching methods; (3) to examine these clusters of similar schools and see if the patterns differed based on the school/community demography (urban, suburban, or rural). The study was conducted in 68 schools in 33 school districts that represented various demographic settings from 12 different Education Service Centers across Texas. From the original 332 variables, 26 variables were selected that were of medium frequency and strongly correlated with high TAAS scores over a 4-year period. These 26 variables were used to examine the 68 high-performing Texas elementary schools for clusters. K-means analysis and HCA were both applied to the 26 response variables, using them as complementary techniques to arrive at a five-cluster solution. Results from correlations of individual characteristics and from identifying school clusters suggested that school community type could possibly be moderately predictive of student performance on the TAAS/TAKS over time.
APA, Harvard, Vancouver, ISO, and other styles
24

Hunter, Brandon. "Channel Probing for an Indoor Wireless Communications Channel." BYU ScholarsArchive, 2003. https://scholarsarchive.byu.edu/etd/64.

Full text
Abstract:
The statistics of the amplitude, time and angle of arrival of multipaths in an indoor environment are all necessary components of multipath models used to simulate the performance of spatial diversity in receive antenna configurations. The model presented by Saleh and Valenzuela was added to by Spencer et al., and included all three of these parameters for a 7 GHz channel. A system was built to measure these multipath parameters at 2.4 GHz for multiple locations in an indoor environment. Another system was built to measure the angle of transmission for a 6 GHz channel. The addition of this parameter allows spatial diversity at the transmitter, along with the receiver, to be simulated. The process of going from raw measurement data to discrete arrivals and then to clustered arrivals is analyzed. Many possible errors associated with discrete arrival processing are discussed along with possible solutions. Four clustering methods are compared and their relative strengths and weaknesses are pointed out. The effects that errors in the clustering process have on parameter estimation and model performance are also simulated.
APA, Harvard, Vancouver, ISO, and other styles
25

Pettersson, Christoffer. "Investigating the Correlation Between Marketing Emails and Receivers Using Unsupervised Machine Learning on Limited Data : A comprehensive study using state of the art methods for text clustering and natural language processing." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189147.

Full text
Abstract:
The goal of this project is to investigate any correlation between marketing emails and their receivers using machine learning and only a limited amount of initial data. The data consist of roughly 1,200 emails and 98,000 receivers of these. Initially, the emails are grouped together based on their content using text clustering. They contain no information regarding prior labeling or categorization, which creates a need for an unsupervised learning approach using solely the raw text-based content as data. The project investigates state-of-the-art concepts like bag-of-words for calculating term importance and the gap statistic for determining an optimal number of clusters. The data are vectorized using term frequency - inverse document frequency to determine the importance of terms relative to the document and to all documents combined. An inherent problem of this approach is high dimensionality, which is reduced using latent semantic analysis in conjunction with singular value decomposition. Once the resulting clusters have been obtained, the most frequently occurring terms for each cluster are analyzed and compared. Due to the absence of initial labeling, an alternative approach is required to evaluate the clusters' validity. To do this, the receivers in each cluster who actively opened an email are collected and investigated. Each receiver has different attributes regarding their purpose of using the service and some personal information. Once gathered and analyzed, conclusions could be drawn that it is possible to find distinguishable connections between the resulting email clusters and their receivers, but to a limited extent. The receivers from the same cluster did show attributes similar to each other which were distinguishable from the receivers of other clusters. Hence, the resulting email clusters and their receivers are specific enough to distinguish themselves from each other but too general to handle more detailed information. With more data, this could become a useful tool for determining which users of a service should receive a particular email to increase the conversion rate and thereby reach out to more relevant people based on previous trends.
Målet med detta projekt att undersöka eventuella samband mellan marknadsföringsemail och dess mottagare med hjälp av oövervakad maskininlärning på en brgränsad mängd data. Datan består av ca 1200 email meddelanden med 98.000 mottagare. Initialt så gruperas alla meddelanden baserat på innehåll via text klustering. Meddelandena innehåller ingen information angående tidigare gruppering eller kategorisering vilket skapar ett behov för ett oövervakat tillvägagångssätt för inlärning där enbart det råa textbaserade meddelandet används som indata. Projektet undersöker moderna tekniker så som bag-of-words för att avgöra termers relevans och the gap statistic för att finna ett optimalt antal kluster. Datan vektoriseras med hjälp av term frequency - inverse document frequency för att avgöra relevansen av termer relativt dokumentet samt alla dokument kombinerat. Ett fundamentalt problem som uppstår via detta tillvägagångssätt är hög dimensionalitet, vilket reduceras med latent semantic analysis tillsammans med singular value decomposition. Då alla kluster har erhållits så analyseras de mest förekommande termerna i vardera kluster och jämförs. Eftersom en initial kategorisering av meddelandena saknas så krävs ett alternativt tillvägagångssätt för evaluering av klustrens validitet. För att göra detta så hämtas och analyseras alla mottagare för vardera kluster som öppnat något av dess meddelanden. Mottagarna har olika attribut angående deras syfte med att använda produkten samt personlig information. När de har hämtats och undersökts kan slutsatser dras kring hurvida samband kan hittas. Det finns ett klart samband mellan vardera kluster och dess mottagare, men till viss utsträckning. Mottagarna från samma kluster visade likartade attribut som var urskiljbara gentemot mottagare från andra kluster. Därav kan det sägas att de resulterande klustren samt dess mottagare är specifika nog att urskilja sig från varandra men för generella för att kunna handera mer detaljerad information. Med mer data kan detta bli ett användbart verktyg för att bestämma mottagare av specifika emailutskick för att på sikt kunna öka öppningsfrekvensen och därmed nå ut till mer relevanta mottagare baserat på tidigare resultat.
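A compact sketch of the document-clustering pipeline the abstract outlines (TF-IDF vectorization, dimensionality reduction via truncated SVD as latent semantic analysis, then k-means), using scikit-learn; the toy emails and the numbers of components and clusters are placeholders, and the gap-statistic step for choosing the number of clusters is omitted here.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.cluster import KMeans

    emails = [
        "spring sale on running shoes",
        "discount on trail running gear",
        "your invoice for march is attached",
        "payment receipt and invoice details",
    ]

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(emails)
    lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)  # latent semantic analysis
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lsa)
    print(labels)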
APA, Harvard, Vancouver, ISO, and other styles
26

Zheng, Sheng-Wen, and 鄭勝文. "Initialization of K-means using the mountain method." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/39186636901271105938.

Full text
Abstract:
Master's degree
Chung Yuan Christian University
Graduate Institute of Applied Mathematics
Academic year 98
When we analyze data sets, the data may be large and complicated, so we need techniques to process them, for example by reducing the dimensionality of the data, cutting down the memory required for computation, or compressing the data. After such processing, we obtain a reduced data set that retains the important information, on which new applications can be built. In this thesis, we use the mountain method to strengthen the choice of the initial cluster centers in the K-means algorithm [3]. Based on these good choices of initial cluster centers, the effectiveness of the K-means algorithm is enhanced. In statistics, cluster analysis [1-2] can be roughly divided into two kinds of methods: hierarchical clustering methods and partitional clustering methods. In this thesis we consider the K-means algorithm [4-6], a partitional clustering method. The K-means algorithm requires initial values to be set before it is run, such as the number of clusters and the initial cluster centers, and different initial settings may produce rather different results. To address these problems, we use the mountain method [7-9], which gives two results: it suggests the number of clusters, and it approximates the initial cluster centers for the K-means algorithm. Using the mountain method brings two benefits: (1) the K-means algorithm spends less time to converge; (2) the final clustering results are more accurate.
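The mountain method (Yager and Filev) builds a "mountain" (density) function over candidate points and repeatedly picks its peaks as initial centers, lowering the mountain around each chosen peak. A rough sketch of that idea is given below, using the data points themselves as candidates, which is a common simplification and not necessarily the grid-based variant used in the thesis; the constants alpha and beta are arbitrary.

    import numpy as np
    from sklearn.cluster import KMeans

    def mountain_init(X, k, alpha=5.0, beta=5.0):
        """Pick k initial centers as successive peaks of a mountain (density) function."""
        d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        mountain = np.exp(-alpha * d2).sum(axis=1)        # height at every candidate point
        centers = []
        for _ in range(k):
            peak = np.argmax(mountain)
            centers.append(X[peak])
            # Lower the mountain around the chosen peak before picking the next one.
            mountain -= mountain[peak] * np.exp(-beta * d2[peak])
        return np.array(centers)

    X = np.vstack([np.random.randn(80, 2), np.random.randn(80, 2) + 6])
    init = mountain_init(X, k=2)
    print(KMeans(n_clusters=2, init=init, n_init=1).fit(X).cluster_centers_)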
APA, Harvard, Vancouver, ISO, and other styles
27

Tsai, Wen-Bin, and 蔡文彬. "An Improved Initialization Method for the K-means Algorithm." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/65590580917571974843.

Full text
Abstract:
Master's degree
National Dong Hwa University
Department of Business Administration
Academic year 94
Clustering is one of the most basic and popular techniques of data mining. The fundamental purpose of clustering is to partition a given disordered dataset into several clusters, so that data in the same cluster are similar while data in different clusters are dissimilar. Among numerous clustering techniques, the K-means algorithm is one of the most widely used due to its outstanding efficiency and simple concept. However, some problems exist in the K-means algorithm. First, the random initialization method affects the stability and correctness of the K-means algorithm. Second, the parameters that users need to decide may not be set properly. Third, the K-means algorithm cannot detect noisy data. Owing to the above problems, this study proposes an improved method, named Improved K-Means (IKM), to modify the K-means algorithm. The IKM algorithm makes use of concepts including density, grids, and statistics. After comparing the simulation results of IKM with those of K-means, we demonstrate that IKM does better than the K-means algorithm in terms of stability and correctness. For complicated data distributions, the performance of IKM is better than that of K-means. Moreover, IKM can automatically decide the number of clusters properly and is able to detect noisy data. Besides, on large databases, the efficiency of IKM is not worse than that of the K-means algorithm.
APA, Harvard, Vancouver, ISO, and other styles
28

Yu, Qiao, and 于喬. "Accelerated K-means Algorithm Based on Efficient Filtering Method." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/8a3r65.

Full text
Abstract:
Master's degree
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering
Academic year 107
K-means is a well-known clustering algorithm in data mining and machine learning. It is widely applicable in various domains such as computer vision, market segmentation, social network analysis, etc. However, k-means wastes a large amount of time on unnecessary distance calculations, so accelerating k-means has become a worthy and important topic. Accelerated k-means algorithms achieve the same result as k-means, only faster. In this paper, we present a novel accelerated exact k-means algorithm named Fission-Fusion k-means that is significantly faster than the state-of-the-art accelerated k-means algorithms. The additional memory consumption of our algorithm is also much less than that of other accelerated k-means algorithms. Fission-Fusion k-means accelerates k-means by an efficient filtering method during the iterations, balancing the expense of distance calculations against the cost of filtering. We conduct extensive experiments on real-world datasets, which verify that Fission-Fusion k-means can considerably outperform the state-of-the-art accelerated k-means algorithms in most cases. In addition, for more separated and naturally clustered datasets, our algorithm is relatively faster than other accelerated k-means algorithms.
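Fission-Fusion k-means itself is not reproduced here. As background on what "filtering" means in accelerated exact k-means, the sketch below shows a classical triangle-inequality filter in the spirit of Elkan's bound (if the distance between the current best center and a candidate center is at least twice the distance from the point to the current best, the candidate cannot be closer); it skips distance calculations while still returning an exact Lloyd result and is an illustration of the general idea, not the authors' algorithm.

    import numpy as np

    def filtered_kmeans(X, k, iters=100, rng=np.random.default_rng(0)):
        """Exact Lloyd iterations that skip candidate centers with a triangle-inequality
        filter: if d(c_best, c_j) >= 2 * d(x, c_best), then c_j cannot be closer to x."""
        centers = X[rng.choice(len(X), size=k, replace=False)]
        labels = np.zeros(len(X), dtype=int)
        for _ in range(iters):
            cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
            for i, x in enumerate(X):
                best = labels[i]
                best_d = np.linalg.norm(x - centers[best])
                for j in range(k):
                    if j == best or cc[best, j] >= 2 * best_d:
                        continue                      # filtered: j cannot win
                    d = np.linalg.norm(x - centers[j])
                    if d < best_d:
                        best, best_d = j, d
                labels[i] = best
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers

    X = np.vstack([np.random.randn(200, 2), np.random.randn(200, 2) + 8])
    labels, centers = filtered_kmeans(X, k=2)
    print(centers)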
APA, Harvard, Vancouver, ISO, and other styles
29

Lee, Yian-yi, and 李建逸. "A Missing Value Estimation Model Based on the Gap Statistical Method and K-means Method." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/89333796640959589742.

Full text
Abstract:
Master's degree
Nanhua University
Graduate Institute of Information Management
Academic year 94
Data mining is a vitally important technique for unveiling hidden information in raw data. However, the integration of raw data from different sources usually comes with missing values that may well affect the interpretation of data analysis. Such a bias effect is known as the missing-value problem of data integration. Data clustering techniques are widely deployed to minimize the impact of missing values. Members of a cluster share similar characteristics and differ notably from members of other clusters; this feature of a data cluster is useful for deriving a better similarity-based estimation model. The K-means method is a well-known data clustering technique. However, when raw data come from various sources, it is difficult for the K-means method to decide how many clusters should be formed. Among many approaches, the gap statistical method is a fairly good way to automatically estimate the number of data clusters and can compensate for this shortage of the K-means method; it also needs fewer iterations to derive good results. This study investigates an integration of the K-means method and the gap statistical method in order to build a generic missing value estimation model. The model derives a suitable estimate, which is beneficial for mining better results when integrating vast amounts of raw data. The integrated model of this study uses a power generation database of the Taipower Company to test its feasibility and effectiveness. The experimental results of the study show more statistical confidence than the SOM-based estimation model.
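The gap statistic (Tibshirani, Walther and Hastie, 2001) compares the within-cluster dispersion on the data with that on uniform reference data; a simplified sketch of how it can be combined with k-means to choose the number of clusters is shown below (the standard-error correction of the original paper is omitted, and the reference-set size is arbitrary).

    import numpy as np
    from sklearn.cluster import KMeans

    def gap_statistic(X, k_max=8, n_refs=10, rng=np.random.default_rng(0)):
        """Return the k in 1..k_max with the largest gap between log W_k on reference data and on X."""
        def within_dispersion(data, k):
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
            return km.inertia_                          # total within-cluster sum of squares
        lo, hi = X.min(axis=0), X.max(axis=0)
        gaps = []
        for k in range(1, k_max + 1):
            ref_disp = [within_dispersion(rng.uniform(lo, hi, size=X.shape), k)
                        for _ in range(n_refs)]          # uniform reference data sets
            gaps.append(np.mean(np.log(ref_disp)) - np.log(within_dispersion(X, k)))
        return int(np.argmax(gaps)) + 1

    X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [6, 0], [3, 6])])
    print(gap_statistic(X))                              # expected to suggest about 3 clusters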
APA, Harvard, Vancouver, ISO, and other styles
30

Muralla, Sumakwel. "A method of accelerating K-Means by directed perturbation of the codevectors." 2006. http://digital.library.okstate.edu/etd/umi-okstate-1882.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Kao, Hsiou-Hen, and 高修恒. "A Study of Improving K-means Clustering Method - Based on Sample Points." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/eznf8n.

Full text
Abstract:
Master's degree
National Cheng Kung University
Department of Statistics (Master's and doctoral program)
Academic year 101
Compared with the K-means algorithm, we constrain the cluster centers to be data points rather than means, and thus propose the K-exemplars algorithm. Based on this concept, the K-exemplars algorithm can deal not only with raw data but also with relational data. Although the clustering accuracy of the K-exemplars method may not be better than that of the K-means method, the difference is small, while the number of iterations is significantly smaller, so K-exemplars converges faster than K-means. On the Iris data, the average numbers of iterations of K-means and K-exemplars are 7.22 and 4.02, respectively; K-exemplars saves 3.2 iterations. Moreover, K-exemplars can be applied with any specified dissimilarity measure. K-means is influenced by outliers, and K-exemplars alleviates this problem.
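A minimal sketch of the underlying idea (a Lloyd-style loop in which each center is constrained to be the data point minimizing the total dissimilarity to its cluster, so that any precomputed dissimilarity matrix can be used) is shown below; it illustrates the concept only and is not the authors' exact K-exemplars algorithm.

    import numpy as np

    def k_exemplars(D, k, iters=100, rng=np.random.default_rng(0)):
        """Lloyd-style clustering on a precomputed dissimilarity matrix D with
        centers restricted to actual data points (exemplars)."""
        exemplars = rng.choice(len(D), size=k, replace=False)
        for _ in range(iters):
            labels = np.argmin(D[:, exemplars], axis=1)        # assign to nearest exemplar
            new = np.array([
                np.where(labels == j)[0][np.argmin(D[np.ix_(labels == j, labels == j)].sum(axis=1))]
                if np.any(labels == j) else exemplars[j]
                for j in range(k)
            ])                                                  # point minimizing total dissimilarity
            if np.array_equal(np.sort(new), np.sort(exemplars)):
                break
            exemplars = new
        return labels, exemplars

    X = np.vstack([np.random.randn(60, 2), np.random.randn(60, 2) + 5])
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # any dissimilarity matrix works here
    labels, exemplars = k_exemplars(D, k=2)
    print(exemplars)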
APA, Harvard, Vancouver, ISO, and other styles
32

Chuang, Fei-Chieh, and 莊斐杰. "Research and Implementation of Cluster Validity Index for K-Means Clustering Method." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/55348819688464276278.

Full text
Abstract:
Master's degree
Aletheia University
Department of Computer Science and Information Engineering (Master's program)
Academic year 102
In this paper, a cluster validity index called the CDV index is presented. The CDV index is capable of providing a quality measurement of the goodness of a clustering result for a data set. This measurement of cluster quality creates a curve of quadratic style, and the minimum value of the curve is the CDV index value, which indicates the best number of clusters found for the case. The CDV index is composed of three major factors: a statistically calculated external diameter factor, a restorer factor that reduces the effect of data dimensionality, and a penalty factor related to the number of clusters. By calculating the product of the three factors under various numbers of clusters, the best clustering result can be found by searching for the minimum value of the CDV curve. The best clustering result is then guaranteed to have the following characteristics: optimal compactness of the intra-cluster relationship and optimal dispersedness of the inter-cluster relationship. In the empirical experiments presented in this research, the K-means clustering method is chosen for its simplicity and execution speed. To present the effectiveness and superiority of the CDV index in the experiments, several traditional cluster validity indexes were implemented as a control group, including DI, DBI, ADI, and the PBM index, the most effective in recent years. The data sets of the experiments are also carefully selected to justify the generalization of the CDV index, including three real-world data sets and three artificial data sets that simulate real-world data distributions. These data sets are all tested to present the superior features of the CDV index.
APA, Harvard, Vancouver, ISO, and other styles
33

Lin, Y. F., and 林郁峰. "Applying K-means Clustering Method on PDM- Using woodworking machine as an example." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/60768020832697224531.

Full text
Abstract:
Master's degree
National Chin-Yi University of Technology
Department of Industrial Engineering and Management
Academic year 97
Product Data Management (PDM) is a software-based, product-oriented technology that provides centralized management of product-related information, production processes, and resource integration. It is a useful tool for companies, engineers, and related personnel to manage information and support product R&D. During the R&D stage, if a product involves a large number of parts with a complicated structure, the parts may be contracted to several manufacturers for design and later integrated into a larger assembly. The parts therefore need to be checked for compatibility with other parts so that they can be assembled precisely at the factory. The parts manufacturers involved in the design process need a common platform on which to review the drawings and documents, so that the R&D unit can quickly find the parts produced by each manufacturer and avoid the mistake of producing duplicate parts. In the traditional production line or at the material preparation stage, document errors or human mistakes can also lead to using the wrong materials, causing customer complaints or business loss. Therefore, with a PDM system, personnel who manage or use the materials can quickly find the right parts. Most companies do not know how to manage their product information effectively, which wastes resources and increases management costs. This work can help companies replace poor product-management routines and turn product data management into an asset. To do so, companies can use data mining technology together with a product administration system to develop a suitable solution. In this research, analysis is used to distinguish the importance of the useful information generated during the design procedure, and PDM is used to derive helpful solutions to problems. Based on the development structure, the reuse of recurring data, and the transfer of knowledge into the product management system, companies can integrate their internal data management, knowledge management, and data storage. The purpose of this research is the integration of product data management and K-means clustering. First, K-means clustering is used to group the data. Second, a decision tree is applied to analyze and search for constructive information for strategic decisions, which becomes useful knowledge based on analysis and inference. Finally, companies can integrate PDM deeply into their original product data management system.
APA, Harvard, Vancouver, ISO, and other styles
34

Wang and 王雲輝. "Improve the NAND IC Sorting method via K-means and principal component analysis." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/843e3n.

Full text
Abstract:
Master's degree
National Central University
Institute of Industrial Management
Academic year 107
Memory products have become closely related to people's lives. Whether for mobile devices, computers, or internet service providers, the storage device behind them has been upgraded from the hard disk drive (HDD) to the solid state drive (SSD), and the key component is the NAND considered in this research. In the memory product market, launching products in the shortest time and with good quality has become an important core competitive advantage for each company. This study uses K-means to group NAND devices and classify ICs with the same characteristics, so that product firmware developers can focus on the grouped NAND, understand its characteristics, and apply appropriate firmware algorithms to handle process, lot, or grade issues. The experimental results show that the method of this study identifies the same characteristics in NAND of the same type from different batches; the results can also be provided to the R&D and quality departments as a reference for product and material quality.
APA, Harvard, Vancouver, ISO, and other styles
35

Lin, Yong-Hui, and 林泳輝. "A Face Recognition Method Based on the Ant Colony Optimization and K-means Algorithm." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/92744678543355595411.

Full text
Abstract:
Master's degree
Chung Yuan Christian University
Graduate Institute of Electrical Engineering
Academic year 101
In this thesis, we propose a face recognition method based on ant colony optimization (ACO) and K-means. The main purpose is for this system to complete face detection well in complex environments and to effectively improve the correct rate of face recognition. We introduce the face recognition process as follows. First, the AdaBoost face detection method is given: the integral image is calculated through the AdaBoost algorithm, and the results show that this improves the training speed and detection rate. Through a cascade structure, trained AdaBoost classifiers are combined into a cascade classifier, which can rapidly perform face detection and exclude non-face samples. Second, Principal Component Analysis (PCA) feature extraction is presented: the PCA transformation effectively reduces the dimensionality of the image while retaining the image features with large variation, and grayscale conversion and histogram equalization reduce the computation of image processing and average out the lighting, so the features of the image are raised effectively and obviously. Third, the ACO-K-means face recognition method is proposed: we combine ant colony optimization with K-means to overcome the local-minimum disadvantage of K-means, and the results of ACO-K-means improve the correct rate of K-means classification and of face recognition. Finally, we prove the feasibility of this system by experiments simulated in Matlab: a face image is input, and the result is obtained from the face recognition system. The experimental results show that face recognition succeeds even in a complex environment and that the correct rate of face recognition actually increases. The contributions of this research are as follows: (1) interchangeability: in a complex environment, face detection can be carried out with the AdaBoost system; (2) amelioration: ant colony optimization overcomes the local-minimum disadvantage of traditional K-means and improves the correct rate of face recognition; (3) expandability: the results of this thesis can be used for real-time face recognition, for example in burglar alarm systems.
APA, Harvard, Vancouver, ISO, and other styles
36

Pei-Rong Chiang and 江佩蓉. "Identification of Partial Discharge Signal in XLPE Cables Using K-means Method and Neural Network." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/84187930133546263609.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Electrical Engineering
103
In this thesis, we aim to develop a system to recognize partial discharge (PD) signal patterns in XLPE power cables. The PD signals are detected by high-frequency current transformer (HFCT) sensors, and the PD patterns are extracted from the raw data with a wavelet de-noising method. To identify the PD patterns, the K-means algorithm is applied to distinguish different kinds of faults in power cables. Moreover, the features of the 3D patterns extracted from the PD patterns are identified by a back-propagation neural network (BPN). On the basis of these results, the system can provide inspection personnel with a powerful tool to determine possible PD fault types and maintain the related equipment.
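A simplified sketch of the signal path described above follows: soft-threshold wavelet de-noising of a raw waveform (using the PyWavelets package) and K-means on simple per-segment features. The waveform and features are synthetic placeholders rather than HFCT measurements, and the back-propagation network stage is omitted.

```python
# Sketch: wavelet de-noising followed by K-means on toy per-segment features.
import numpy as np
import pywt
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
raw = np.sin(np.linspace(0, 20 * np.pi, 2048)) + 0.3 * rng.standard_normal(2048)

# Soft-threshold wavelet de-noising (universal threshold from the finest detail level)
coeffs = pywt.wavedec(raw, "db4", level=4)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
thr = sigma * np.sqrt(2 * np.log(raw.size))
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
clean = pywt.waverec(coeffs, "db4")[:raw.size]

# Toy per-segment features (peak amplitude, energy) clustered into candidate fault types
segments = clean.reshape(16, -1)
feats = np.column_stack([segments.max(axis=1), (segments ** 2).sum(axis=1)])
print(KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(feats))
```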
APA, Harvard, Vancouver, ISO, and other styles
37

WANG, BO-SHINE, and 王柏勝. "Using k-means clustering algorithm-based robust adaptive clustering analysis method for software fault prediction." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/64607850323904520947.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Department of Information Management
102
A software fault is an error state of a software system caused by a wrong specification or an inappropriate development configuration. Nowadays almost every kind of work relies on software systems, so the reliability of software has become one of the key concerns in both the software development process and software engineering practice. Most existing studies focus on supervised learning, but semi-supervised and unsupervised learning are also considered necessary. In this paper, we propose a K-means-based robust adaptive clustering analysis method for software fault prediction in an unsupervised environment. First, we generate clustering results for cluster numbers ranging from 2 to k using the K-means algorithm. Second, we integrate the clustering results into a matrix. Finally, we use an iterative cluster-partitioning technique to find the best number of clusters and the final result; the comparison shows that the resulting clustering outperforms similar methods.
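The ensemble idea can be sketched roughly as follows: run K-means for several cluster numbers, accumulate a co-association matrix recording how often each pair of points is grouped together, and derive a final partition from that matrix. This stands in for the thesis's matrix-integration and iterative cluster-partitioning steps; the data are synthetic.

```python
# Sketch: K-means ensemble via a co-association matrix on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)
n = len(X)

co_assoc = np.zeros((n, n))
for k in range(2, 7):                                   # k = 2 .. 6
    labels = KMeans(n_clusters=k, n_init=10, random_state=k).fit_predict(X)
    co_assoc += (labels[:, None] == labels[None, :])    # 1 where a pair shares a cluster
co_assoc /= 5

# Final partition: cluster the rows of the pairwise-evidence matrix
final = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(co_assoc)
print(final)
```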
APA, Harvard, Vancouver, ISO, and other styles
38

HUANG, CHENG-YO, and 黃丞佑. "The Scheduling System Design of CAN FD Vehicle Communication Network based on K-means Clustering Method." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/k5p5yh.

Full text
Abstract:
Master's thesis
National Formosa University
Master's Program, Department of Computer Science and Information Engineering
106
With the trend toward unmanned vehicles, more and more sensors and electronic control units are being used, and machine learning and AI control are key topics for future development. Machine learning allows a computer or embedded system to "learn" automatically: rules are extracted from training data and then used to predict unknown data, which is well suited to complex unmanned vehicle systems. Because of the complexity of in-vehicle networks, the bandwidth usage of existing CAN networks is close to its technical limit, so Bosch introduced the CAN FD protocol in 2012, which inherits the main characteristics of CAN. Studying hybrid CAN and CAN FD networks is therefore very important, but how to use the CAN FD bandwidth effectively is still an open problem. This paper presents a method that applies K-means clustering to CAN FD data in a hybrid CAN/CAN FD network: the clustering result is used to group CAN FD messages and adjust their priorities so that the bandwidth of the CAN FD network is used effectively. The research has two parts. In the first part, the machine learning K-means method is used for data clustering simulation, and the CAN FD data in the arbitration phase are changed according to the clustering result. In the second part, a hybrid CAN and CAN FD network is implemented, and an ECU sim2000 OBD-II simulator together with a hand-held vehicle diagnostic system is used to test and verify the system. The integration tests show that the design is compatible with existing CAN vehicle networks and that, after K-means grouping of CAN FD messages with six different data sizes, changing the message priority according to the grouping result effectively reduces the data loss rate of the CAN FD network. With an arbitration-phase rate of 1 Mbps and data-phase rates of 2 Mbps and 4 Mbps, the data loss rate is reduced by 2.86% and 2.58% respectively, providing better reliability for the CAN FD network.
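A toy version of the grouping step might look like the sketch below: CAN FD messages, described here by invented payload sizes and periods, are clustered with K-means and the clusters are ranked to suggest arbitration priorities. The actual CAN FD identifier handling and the hardware test setup are outside the scope of this sketch.

```python
# Sketch: cluster hypothetical CAN FD messages and rank clusters into priority levels.
import numpy as np
from sklearn.cluster import KMeans

# [payload bytes, period ms] for some hypothetical messages
msgs = np.array([[8, 10], [12, 10], [16, 20], [32, 50], [48, 100], [64, 100], [8, 5], [20, 20]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(msgs)
# Rank clusters so that smaller/faster traffic gets the higher priority (lower rank number)
order = np.argsort(km.cluster_centers_[:, 0])
priority = {cluster: rank for rank, cluster in enumerate(order)}
print([priority[c] for c in km.labels_])   # suggested priority level per message
```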
APA, Harvard, Vancouver, ISO, and other styles
39

Chen, Bang-Yin, and 陳邦尹. "Source Separation in the Frequency Domain: Solving the Permutation Problem by a Sliding K-means Method." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/tbhk8n.

Full text
Abstract:
Master's thesis
National Tsing Hua University
Department of Electrical Engineering
107
This thesis aims at solving the source separation problem in the frequency domain. In a real environment, mixed source signals are convolutive mixtures. Previous work indicates that it is easier to separate convolutive mixtures in the two-dimensional time-frequency domain after applying the short-time Fourier transform (STFT) to the signals; independent component analysis (ICA) is then used to separate the sources in each frequency bin. However, this leaves two uncertainties to handle, namely the scaling problem and the permutation problem, and the latter is the focus of this thesis. For the permutation problem, a correlation method and a sliding k-means method are proposed and compared, based on the assumption that higher correlations should be found between the temporal envelopes of neighboring frequency bins belonging to the same source. After running ICA and solving these two problems, the un-mixing matrix can be calculated. To evaluate the performance, we measured the frequency response of the environment to obtain the mixing matrix, which serves as the ground truth; a scoring system combining both matrices and two objective indices is then defined to quantify and evaluate the separation performance objectively. In our experiments, the singers are divided into three groups (male+male, female+female, male+female). Across the three groups, the permutation accuracy of the k-means method reaches at least 90.5% under different parameter settings, and after introducing the "sliding" process the permutation accuracy generally rises by 1-3%. The correlation method can reach higher permutation accuracy than the k-means method but is vulnerable to parameter variations and shows considerable instability. The results show that our new approach is stable and yields comparable performance.
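The envelope-clustering idea behind the k-means approach can be sketched as follows on synthetic data: the per-bin temporal envelopes of the two separated components are clustered into two groups, and each bin's permutation is read off from the cluster labels. The "sliding" refinement described in the thesis is not reproduced here.

```python
# Sketch: resolve per-bin permutations by k-means over synthetic temporal envelopes.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_bins, n_frames = 40, 200
src_a = np.abs(np.sin(np.linspace(0, 6 * np.pi, n_frames)))   # activity pattern of source A
src_b = np.abs(np.cos(np.linspace(0, 4 * np.pi, n_frames)))   # activity pattern of source B

# Per-bin envelopes of the two ICA outputs, randomly flipped in some bins
envelopes, true_flip = [], []
for _ in range(n_bins):
    pair = np.vstack([src_a, src_b]) + 0.05 * rng.standard_normal((2, n_frames))
    flip = rng.random() < 0.5
    true_flip.append(flip)
    envelopes.append(pair[::-1] if flip else pair)

flat = np.vstack(envelopes)                                    # (2 * n_bins, n_frames)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(flat).reshape(n_bins, 2)

# A bin is flipped (relative to bin 0) if its first output falls in bin 0's second cluster
est_flip = labels[:, 0] == labels[0, 1]
agree = np.mean(est_flip == np.array(true_flip))
print(f"toy permutation accuracy: {max(agree, 1 - agree):.2f}")  # a global flip is irrelevant
```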
APA, Harvard, Vancouver, ISO, and other styles
40

Wu, Sheng-Kong, and 吳盛宏. "Combining Adaptive Resonance Theory and K-Means Method for Data Clustering - On-line Game as an Example." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/20381069759338972309.

Full text
Abstract:
Master's thesis
Hsuan Chuang University
Master's Program, Department of Information Science
96
Data mining and neural network techniques are being applied more and more widely in companies, which can use them to find new customers and retain existing ones. Within data mining and Adaptive Resonance Theory (ART), clustering is the most commonly used task. This thesis mainly examines the differences, advantages, and disadvantages of data clustering between K-means from data mining and ART from neural networks. We also combined and compared the two methods under the assumption that the clustering value was fixed at 5% for both. The study uses data from an on-line game questionnaire as the input for both methods and compares the original data with the clusterings produced by ART and K-means. We find that the best partition is obtained by using ART as the main method with K-means as the auxiliary one. Through this case study, we show that the results support the view put forward in this research.
APA, Harvard, Vancouver, ISO, and other styles
41

Juan, Yu-Ting, and 阮毓庭. "Three-dimensional Geometry Reconstruction of Mouse Liver from MR Images Using K-means Method with Confusion Component Removing." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/5f53jr.

Full text
Abstract:
Master's thesis
National Central University
Department of Mathematics
107
Liver diseases are consistently among the top ten causes of death in Taiwan. Early primary liver cancer is difficult to detect because the initial symptoms are usually not obvious, yet unless it is discovered while the tumor is still very small, liver cancer is difficult to control. We therefore want to build a numerical simulation of the liver structure, including the blood vessel topography and the liver surface. Before the simulation, the liver must be segmented from MR images. Medical images mostly contain complicated structures, and image segmentation is a key task in many medical applications; precise segmentation is necessary for the simulation. Since finding subjects for MRI scanning is not simple, we use mouse liver images for the simulation. However, mouse liver boundaries in MR images are usually unclear, so the traditional edge-based segmentation methods are unsuitable. In this thesis, we propose creating a new image by combining the T1-weighted (T1), T2-weighted (T2), and contrast-enhanced T1-weighted (T1 C+, Primovist) MR images. After segmentation with the k-means method, we compare the image processed with confusion-component removal against the original image; the results show that the accuracy is improved. In the future, we look forward to applying this to the numerical simulation.
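A bare-bones sketch of multi-channel K-means segmentation in the spirit of the approach above: co-registered T1, T2, and contrast-enhanced intensities are stacked into one feature vector per pixel and clustered. Real MR data, registration, and the confusion-component-removal step are omitted; the arrays are random placeholders.

```python
# Sketch: per-pixel k-means segmentation of a stacked multi-channel MR slice.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
h, w = 64, 64
t1, t2, t1c = rng.random((h, w)), rng.random((h, w)), rng.random((h, w))  # hypothetical slices

features = np.stack([t1, t2, t1c], axis=-1).reshape(-1, 3)   # one 3-channel vector per pixel
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
segmentation = labels.reshape(h, w)                          # label image (e.g. liver vs. background)
print(np.bincount(labels))
```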
APA, Harvard, Vancouver, ISO, and other styles
42

"Transportation Techniques for Geometric Clustering." Doctoral diss., 2020. http://hdl.handle.net/2286/R.I.57239.

Full text
Abstract:
This thesis introduces new techniques for clustering distributional data according to their geometric similarities. This work builds upon the optimal transportation (OT) problem that seeks global minimum cost for matching distributional data and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on the variational principle to differentiate hard cluster assignments, which was missing in the literature. This thesis shows multiple techniques to regularize and generalize OT to cope with various tasks including clustering, aligning, and interpolating distributional data. It also discusses the connections of the new formulation to other OT and clustering formulations to better understand their gaps and the means to close them. Finally, this thesis demonstrates the advantages of the proposed OT techniques in solving machine learning problems and their downstream applications in computer graphics, computer vision, and image processing.
Dissertation/Thesis
Doctoral Dissertation Computer Engineering 2020
APA, Harvard, Vancouver, ISO, and other styles
43

Tseng, Yu-Tang, and 曾鈺棠. "Application of K-means method to improve decision making results of a consensus model in a context awareness framework." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/38346451437110323454.

Full text
Abstract:
Master's thesis
Chung Yuan Christian University
Graduate Institute of Industrial and Systems Engineering
103
With the popularity of smart devices and the variety of built-in sensors, context-awareness applications have been broadly developed to fulfill different requirements. These applications deal with various data and tasks, so optimizing context-awareness systems would provide better services for users. This research considers a consensus decision-making process and recognizes that the extreme preferences of a few users can push the decision process off track. We therefore propose a consensus decision support system that helps users quickly find results acceptable to the majority. When users discuss a consensus decision with the context-aware system, a clustering method groups the different opinions and identifies which opinions are more widely acceptable, which mitigates the decision-making problem caused by the extreme preferences of a few users. First, a context-aware framework was developed in which a user's preference is recorded as a real number between 0 and 1, where 0 represents dislike, 1 represents like, and 0.5 means no comment. Next, a consensus model and a K-means consensus model were developed. Data mining software was used to group the preference data, and the centroid of a group represents the opinion of that group; if more than one group shared the same centroid, the preference of that centroid was determined by the sizes of the groups. The F-measure was used as an index to compare the performance of the proposed model with human judgment, and the Xie-Beni index was applied to find the grouping number with higher accuracy. Finding a common dining restaurant is used as an example to illustrate the proposed model. The experimental results show that the proposed K-means consensus model can reduce the decision offset caused by the extreme preferences of a few users. When the number of groups is larger than 5, the K-means consensus model is more accurate than the plain consensus model in this case, and the larger the number of groups, the larger the Xie-Beni index becomes. In this experiment, the results of the K-means consensus model with a Xie-Beni index between 100 and 1000 were more accurate than those of the consensus model; however, when the Xie-Beni index exceeded 1000, decisions could be wrong, possibly because of scattered data.
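The grouping step can be sketched minimally as follows: preferences in [0, 1] are clustered with K-means, each centroid is read as a group opinion, and the largest group is taken as the majority view. The restaurant items and preference values are invented for illustration; the F-measure and Xie-Beni evaluation are not shown.

```python
# Sketch: cluster user preferences and read the majority-group centroid as the group opinion.
import numpy as np
from sklearn.cluster import KMeans

# Rows: users, columns: candidate restaurants, values in [0, 1] (0 = dislike, 1 = like)
prefs = np.array([
    [0.90, 0.20, 0.60],
    [0.80, 0.30, 0.50],
    [0.85, 0.25, 0.55],
    [0.10, 0.90, 0.40],   # one user with an extreme, minority preference
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(prefs)
sizes = np.bincount(km.labels_)
majority = np.argmax(sizes)
print("group sizes:", sizes)
print("majority-group opinion (centroid):", km.cluster_centers_[majority].round(2))
```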
APA, Harvard, Vancouver, ISO, and other styles
44

Wen-Feng, Wu, and 吳文鳳. "A Study of Data Hiding Method in Color Image using Grouping Palette Index by Particle Swarm Optimization with K-means Clustering." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/09396474013896816616.

Full text
Abstract:
Master's thesis
Hsuan Chuang University
Master's Program, Department of Information Management
99
We propose a data hiding method for color images that works with the image palette. Many authors embed data directly into the palette or into the index table of the palette; because those methods alter the palette itself, the palette changes into a different one and it becomes harder to conceal that information has been embedded. We instead apply particle swarm optimization with K-means clustering to divide the color image palette into several groups. The larger the number of pixels a palette group has, the more data can be embedded in a pixel that falls into that group. For each candidate embedding pixel we check which group it belongs to and thus know how many bits it can carry, since the number of group members we use is a power of two. The current embedding pixel is then replaced by the pixel of the same group whose rank corresponds to the value of the data being embedded. For extraction, the pixels of the stego-image are first grouped; for each pixel we determine which group it belongs to and its rank within that group, and that rank is the embedded value. The information can be extracted until all the pixels have been processed. The experimental results show that the method achieves good embedding capacity and image quality. In addition, the proposed method is not affected by reordering of the color palette after embedding, since we keep the highest-frequency entry for each cluster.
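A toy sketch of the capacity idea follows: palette colours are grouped by K-means, and a pixel carries floor(log2(group size)) secret bits by being replaced with the member of its colour group whose rank equals the bit value. The PSO-tuned clustering of the thesis is replaced by plain K-means, and the palette is random.

```python
# Sketch: group palette colours and embed bits by picking the ranked member of a pixel's group.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
palette = rng.integers(0, 256, size=(256, 3))                 # hypothetical 256-colour palette

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(palette)
groups = {g: np.where(km.labels_ == g)[0] for g in range(8)}  # palette indices per colour group

def embed(pixel_index: int, bits: int) -> int:
    """Replace a pixel's palette index with the member of its group ranked by `bits`."""
    group = groups[km.labels_[pixel_index]]
    capacity = int(np.floor(np.log2(len(group))))             # bits this pixel can carry
    return int(group[bits % (1 << capacity)])

print(embed(pixel_index=10, bits=0b101))
```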
APA, Harvard, Vancouver, ISO, and other styles
45

Yu-Cheng Chen and 陳友政. "A Semi-Automatic Biomechanical Analysis System based on K-Means Clustering and Finite Element Method - a Case Study of Dental Implants." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/gksz9q.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

(8797292), Varisht Raheja. "ASSESSING THE PERFORMANCE OF PROCEDURALLY GENERATED TERRAINS USING HOUDINI’S CLUSTERING METHOD." Thesis, 2020.

Find full text
Abstract:

Terrain generation is a convoluted and popular topic in the VFX industry. Whether you are part of the film/TV or gaming industry, a terrain is a highly nuanced feature that is usually present. Whether walking across a desert-like terrain in the film Blade Runner 2049 or fighting on different planets as in Avatar, 3D terrains are a major part of any digital media. The purpose of this thesis is to develop a workflow for large-scale terrains using complex data sets and to use this workflow to maintain a balance between procedural content and artistic input, aimed especially at smaller companies that cannot afford an enhanced pipeline to deal with major technical complications. The workflow consists of two major elements: development of the tool used to optimize the workflow, and recording and measuring its efficiency in comparison to the older workflow.

My research findings indicate that, despite the increase in overall computational power, one of the remaining issues is generating a highly advanced terrain while still allowing the artist's or user's creative variations. Reducing the overall time to simulate and compute a highly realistic and detailed terrain is the main goal, so this thesis presents a method to overcome the speed deficiency while preserving the detail of the terrain.

APA, Harvard, Vancouver, ISO, and other styles
47

Chen, Pin-Wen, and 陳品文. "New Methods for the Initialization of K-means Clustering Algorithm." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/uq4w6w.

Full text
Abstract:
Master's thesis
National Dong Hwa University
Department of Business Administration
95
Cluster analysis is an important pattern recognition tool in data mining and is widely used in fields such as computer science, statistical analysis, and biology. The K-means algorithm is one of the most popular methods in cluster analysis and is more efficient than many alternatives. However, K-means also has several shortcomings: the selection of initial cluster centroids has a great impact on the efficiency of clustering; the user has to decide the number of clusters in advance; and it is very sensitive to noise and isolated data points. This study focuses on improving the initialization of the K-means algorithm and tries to reduce the effect of noisy data on the clustering results, combining ideas from hierarchical and grid-based methods. In the first part of this study, we propose a new algorithm, Bi-Section. Bi-Section first bisects each dimension of the data space, so the space is divided into 2^d parts, and then computes statistical information for each part to decide how to allocate the initial cluster centroids. In the second part, we propose the HBi-Section algorithm, which builds on Bi-Section by constructing a tree structure to quickly compute the statistical information for each of the 2^d parts. In this way we obtain an efficient improved K-means algorithm.
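A compact sketch of the bisection idea on synthetic 2-D data follows: each dimension is split at its midpoint to give 2^d cells, the points per cell are counted, and the means of the most populated cells seed K-means. This is a simplified stand-in for the Bi-Section and HBi-Section algorithms, not the thesis implementation.

```python
# Sketch: seed k-means from the most populated cells of a one-level bisection grid.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
k = 4

mid = (X.min(axis=0) + X.max(axis=0)) / 2
cell_id = (X > mid).astype(int) @ (1 << np.arange(X.shape[1]))   # which of the 2^d cells each point falls in
counts = np.bincount(cell_id, minlength=1 << X.shape[1])

# Means of the most populated non-empty cells become the initial centroids
non_empty = [c for c in np.argsort(counts)[::-1] if counts[c] > 0]
seeds = np.array([X[cell_id == c].mean(axis=0) for c in non_empty[:k]])

km = KMeans(n_clusters=len(seeds), init=seeds, n_init=1).fit(X)
print(km.inertia_)
```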
APA, Harvard, Vancouver, ISO, and other styles
48

Lin, Zhi-Xuan, and 林志軒. "Face Discriminative Methods Across Age Progression Using Local K-means Ensemble." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/03123611173539809587.

Full text
Abstract:
Master's thesis
National Dong Hwa University
Department of Computer Science and Information Engineering
103
Face recognition is widely used in many computer vision applications such as surveillance, traffic monitoring, robot vision, and access control. However, several problems remain in face recognition, with lighting changes, expression changes, head movements, accessory occlusion, and the aging effect being the main issues. For the aging effect, changes in shape and texture degrade the performance of face recognition. To address cross-age face recognition, we propose face discrimination methods across age progression using a local K-means ensemble. First, we find that the gradient angle, extracted from the rigid face region, provides a simple but effective representation for this problem. This representation is further improved by a hierarchical structure, which leads to the K-means pyramid (KMP). When combined with supervised learning, KMP demonstrates excellent performance in our experiments, and the experimental results show that the proposed cross-age methods outperform existing techniques.
APA, Harvard, Vancouver, ISO, and other styles
49

Worawut, Dabpimsri, and 陳曉君. "Comparison of Two-Stage Clustering Methods: SOM and K-Means Algorithm and Hierarchical Clustering and K-Means Algorithm in Tourist Information Management in Phuket, Thailand." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/javpt5.

Full text
Abstract:
Master's thesis
National Penghu University of Science and Technology
Graduate Institute of Tourism and Leisure Management
103
The objectives of this research are (1) to investigate the characteristics and behaviors of tourists who visited Phuket, Thailand, and (2) to suggest an efficient approach for analyzing business data that differs in both characteristics and behaviors. Two different two-stage clustering methods are compared: SOM followed by the K-means algorithm, and hierarchical clustering followed by the K-means algorithm. The clustering uses factors including zone, country, travel, province, type of accommodation, number of nights, gender, age, purpose of travel, career, annual income, and cost of travel and fees. The standard error of the mean and the root mean square standard deviation (RMSSTD) of each cluster are used as criteria for selecting the number of clusters. The results show that the appropriate number of clusters is ten when using SOM with K-means, while it is six when using the second method. Clustering from both methods shows that the majority of tourists are from Europe. The other categories reveal information such as travel by BTS, MRT, or taxi and travel by domestic airline. Most of the tourists choose to stay at a hotel for a long time; their average annual income is moderate, but their daily expenses are quite high. Their purpose of visiting is vacation during the holidays, and most of the tourists are professionals. Based on the analysis, it can be concluded that the second approach performs better than the first, since it requires less execution time in clustering and produces more homogeneity among the data within each cluster. Keywords: Clustering, Data Mining, Classification, Tourism
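The second two-stage scheme can be sketched minimally as follows: hierarchical clustering proposes initial groups and their means seed a final K-means pass. The tourist survey data are replaced by synthetic points, and the SOM-based first stage of the other scheme is not shown.

```python
# Sketch: two-stage clustering, hierarchical groups seeding a final k-means pass.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=6, random_state=0)
k = 6

stage1 = AgglomerativeClustering(n_clusters=k).fit_predict(X)        # stage 1: hierarchical grouping
seeds = np.array([X[stage1 == c].mean(axis=0) for c in range(k)])    # group means as initial centroids

stage2 = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)           # stage 2: k-means refinement
print(np.bincount(stage2.labels_))
```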
APA, Harvard, Vancouver, ISO, and other styles
50

Chen, Chien Chung, and 陳建忠. "Pattern Discovery of Web Usage Mining by K-means of Sequence Alignment Methods." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/83712834596186922724.

Full text
Abstract:
Master's thesis
Tamkang University
Department of Computer Science and Information Engineering
92
Nowadays, with the popularity of the Internet, people routinely use it to access information, and business activity on the web is increasingly active. Logs on a web site keep track of users' browsing records and implicitly contain the users' information needs. By applying Web Usage Mining techniques to web logs, we can find the patterns in which users access web pages and, going a step further, discover patterns of user behavior that can improve the design of the web site structure and deliver better Internet performance. In this paper, for the preprocessing stage of Web Usage Mining we integrate and apply the techniques published by Cooley and Chen; for the pattern discovery stage, we apply the K-means clustering method together with Sequence Alignment Methods (SAM) to convert each sequence into a score and thereby discover patterns of user behavior.
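In the spirit of the approach above, the sketch below reduces each browsing session (a sequence of page IDs) to alignment scores against a few reference sessions and clusters the score vectors with K-means. A plain edit distance stands in for the Sequence Alignment Method, and the sessions are invented.

```python
# Sketch: score browsing sessions against reference sequences, then cluster the scores.
import numpy as np
from sklearn.cluster import KMeans

def edit_distance(a, b):
    """Classic dynamic-programming edit distance between two page-ID sequences."""
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return d[-1, -1]

sessions = [["home", "news", "sports"], ["home", "news", "finance"],
            ["home", "shop", "cart", "pay"], ["home", "shop", "cart"]]
references = [["home", "news"], ["home", "shop", "cart"]]

scores = np.array([[edit_distance(s, r) for r in references] for s in sessions])
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores))
```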
APA, Harvard, Vancouver, ISO, and other styles