Dissertations / Theses on the topic 'Method of k-means'
Consult the top 50 dissertations / theses for your research on the topic 'Method of k-means.'
Кіріченко, Л. О., В. Г. Кобзєв, and Є. Д. Федоренко. "Data Mining methods for detection of collective anomalies in time series." Thesis, Національна академія Національної гвардії України, 2021. https://openarchive.nure.ua/handle/document/16449.
Hudson, Cody Landon. "Protein structure analysis and prediction utilizing the Fuzzy Greedy K-means Decision Forest model and Hierarchically-Clustered Hidden Markov Models method." Thesis, University of Central Arkansas, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=1549796.
Structural genomics is a field of study that strives to derive and analyze the structural characteristics of proteins by means of experimentation and prediction using software and other automatic processes. Alongside implications for more effective drug design, the main motivation for structural genomics concerns the elucidation of each protein’s function, given that the structure of a protein almost completely governs its function. Historically, the approach to derive the structure of a protein has been through exceedingly expensive, complex, and time-consuming methods such as x-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.
In response to the inadequacies of these methods, three families of approaches developed in a relatively new branch of computer science known as bioinformatics: threading, homology modeling, and the de novo approach. However, even these methods fall short because of impracticality, the inability to produce novel folds, rampant complexity, or other inherent limitations. In their stead, this work proposes the Fuzzy Greedy K-means Decision Forest (FGK-DF) model, which utilizes sequence motifs that transcend protein family boundaries to predict local tertiary structure, such that the method is cheap, effective, and can produce semi-novel folds due to its local (rather than global) prediction mechanism. This work further extends the FGK-DF model with a new algorithm, the Hierarchically Clustered-Hidden Markov Models (HC-HMM) method, which extracts protein primary sequence motifs more accurately and adequately than the FGK-DF model currently does, allowing for more accurate and powerful local tertiary structure predictions. Both algorithms are critically examined, their methodology is thoroughly explained and tested against a consistent data set, and the results are discussed at length.
Ruzgys, Martynas. "IT žinių portalo statistikos modulis pagrįstas grupavimu." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2007. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2007~D_20070816_143545-16583.
This thesis presents data mining methods and the use of clustering in current statistical systems, and creates a prototype statistics module for data storage, analysis, and visualization for an IT knowledge portal. The proposed prototype performs periodic data transformations in its database. Statistical data accessed in the portal can be clustered. Clustered information presented graphically can help with interpretation, since trends may become noticeable. One of the best-known data clustering methods, the parallel k-means method, is adapted for separating similar data into clusters.
紘幸, 児玉, and Hiroyuki Kodama. "工具カタログからのデータマイニングに支援されたものづくりシステムに関する研究." Thesis, https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB12863871/?lang=0, 2014. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB12863871/?lang=0.
Žambochová, Marta. "Shluková analýza rozsáhlých souborů dat: nové postupy založené na metodě k-průměrů." Doctoral thesis, Vysoká škola ekonomická v Praze, 2005. http://www.nusl.cz/ntk/nusl-77061.
Kondapalli, Swetha. "An Approach To Cluster And Benchmark Regional Emergency Medical Service Agencies." Wright State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1596491788206805.
Gunay, Melih. "Representation Of Covariance Matrices In Track Fusion Problems." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12609026/index.pdf.
Abbasian, Houman. "Inner Ensembles: Using Ensemble Methods in Learning Step." Thèse, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31127.
Sarazin, Marianne. "Elaboration d'un score de vieillissement : propositions théoriques." Phd thesis, Université Jean Monnet - Saint-Etienne, 2013. http://tel.archives-ouvertes.fr/tel-00994941.
Full textRamler, Ivan Peter. "Improved statistical methods for k-means clustering of noisy and directional data." [Ames, Iowa : Iowa State University], 2008.
Mayer-Jochimsen, Morgan. "Clustering Methods and Their Applications to Adolescent Healthcare Data." Scholarship @ Claremont, 2013. http://scholarship.claremont.edu/scripps_theses/297.
Hinz, Joel. "Clustering the Web : Comparing Clustering Methods in Swedish." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-95228.
Ganey, Raeesa. "Principal points, principal curves and principal surfaces." Master's thesis, University of Cape Town, 2015. http://hdl.handle.net/11427/15515.
Vaněčková, Tereza. "Numerické metody pro klasifikaci metagenomických dat." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-242014.
Baccherini, Simona. "Pattern recognition methods for EMG prosthetic control." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12033/.
Guder, Mennan. "Data Mining Methods For Clustering Power Quality Data Collected Via Monitoring Systems Installed On The Electricity Network." Master's thesis, METU, 2009. http://etd.lib.metu.edu.tr/upload/3/12611120/index.pdf.
Liu, Yating. "Optimal Quantization : Limit Theorem, Clustering and Simulation of the McKean-Vlasov Equation." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS215.
This thesis contains two parts. The first part addresses two limit theorems related to optimal quantization. The first limit theorem is the characterization of the convergence in the Wasserstein distance of probability measures by the pointwise convergence of $L^p$-quantization error functions on $\mathbb{R}^d$ and on a separable Hilbert space. The second limit theorem is the convergence rate of the optimal quantizer and the clustering performance for a probability measure sequence $(\mu_n)_{n\in\mathbb{N}^*}$ on $\mathbb{R}^d$ converging in the Wasserstein distance, especially when $(\mu_n)_{n\in\mathbb{N}^*}$ are the empirical measures with finite second moment but possibly unbounded support. The second part of this manuscript is devoted to the approximation and the simulation of the McKean-Vlasov equation, including several quantization-based schemes and a hybrid particle-quantization scheme. We first give a proof of the existence and uniqueness of a strong solution of the McKean-Vlasov equation $dX_t = b(t, X_t, \mu_t)\,dt + \sigma(t, X_t, \mu_t)\,dB_t$ under the Lipschitz coefficient condition by using Feyel’s method (see Bouleau (1988) [Section 7]). Then, we establish the convergence rate of the “theoretical” Euler scheme and, as an application, we establish functional convex order results for scaled McKean-Vlasov equations with an affine drift. In the last chapter, we prove the convergence rate of the particle method, several quantization-based schemes and the hybrid scheme. Finally, we simulate two examples: the Burgers equation (Bossy and Talay (1997)) in a one-dimensional setting and the network of FitzHugh-Nagumo neurons (Baladron et al. (2012)) in dimension 3.
Thorstensson, Linnea. "Clustering Methods as a Recruitment Tool for Smaller Companies." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273571.
New technology has simplified the process of applying for jobs. As a result, companies receive thousands of applications that they must consider. To simplify and speed up the recruitment process, many large companies have started using machine learning methods. Smaller companies, such as start-ups, do not have the same opportunities to digitalize their recruitment. They usually do not have access to large amounts of historical application data. This thesis therefore uses topological data analysis to investigate how clustering methods can be used in recruitment on smaller datasets. It also analyzes how the abstraction level of the data affects the results. The methods turn out to work well for higher-level job positions but have problems with lower-level jobs. It also turns out that the choice of representation of candidates and jobs has a large impact on the results.
Yan, Mingjin. "Methods of Determining the Number of Clusters in a Data Set and a New Clustering Criterion." Diss., Virginia Tech, 2005. http://hdl.handle.net/10919/29957.
Ph. D.
Yoldas, Mine. "Predicting The Effect Of Hydrophobicity Surface On Binding Affinity Of Pcp-like Compounds Using Machine Learning Methods." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613215/index.pdf.
Czudek, Marek. "Detekce síťových anomálií na základě NetFlow dat." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-235461.
Mohiddin, Syed B. "Development of novel unsupervised and supervised informatics methods for drug discovery applications." The Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=osu1138385657.
Evans, Jr Richard Austin. "Fostering success in reading: a survey of teaching methods and collaboration practices of high performing elementary schools in Texas." Texas A&M University, 2002. http://hdl.handle.net/1969.1/3968.
Hunter, Brandon. "Channel Probing for an Indoor Wireless Communications Channel." BYU ScholarsArchive, 2003. https://scholarsarchive.byu.edu/etd/64.
Pettersson, Christoffer. "Investigating the Correlation Between Marketing Emails and Receivers Using Unsupervised Machine Learning on Limited Data : A comprehensive study using state of the art methods for text clustering and natural language processing." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189147.
The goal of this project is to investigate possible correlations between marketing emails and their recipients using unsupervised machine learning on a limited amount of data. The data consists of about 1,200 email messages with 98,000 recipients. Initially, all messages are grouped by content via text clustering. The messages contain no information about prior grouping or categorization, which creates the need for an unsupervised learning approach in which only the raw text of the message is used as input. The project examines modern techniques such as bag-of-words to determine the relevance of terms and the gap statistic to find an optimal number of clusters. The data is vectorized using term frequency-inverse document frequency to determine the relevance of terms relative to the document and to all documents combined. A fundamental problem that arises with this approach is high dimensionality, which is reduced with latent semantic analysis together with singular value decomposition. Once all clusters have been obtained, the most frequent terms in each cluster are analyzed and compared. Because an initial categorization of the messages is missing, an alternative approach is required to evaluate the validity of the clusters. To do this, all recipients in each cluster who opened any of its messages are retrieved and analyzed. The recipients have various attributes regarding their purpose in using the product as well as personal information. Once these have been retrieved and examined, conclusions can be drawn about whether correlations can be found. There is a clear correlation between each cluster and its recipients, but only to a certain extent. Recipients from the same cluster showed similar attributes that were distinguishable from those of recipients from other clusters.
Hence it can be said that the resulting clusters and their recipients are specific enough to be distinguished from one another but too general to handle more detailed information. With more data, this could become a useful tool for determining the recipients of specific email campaigns, in order to eventually increase the open rate and thereby reach more relevant recipients based on previous results.
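The term frequency-inverse document frequency weighting mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the `tfidf` helper, the whitespace tokenizer, the unsmoothed `log(N / df)` formula, and the sample emails are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Weight = relative frequency in this document * log(N / document frequency).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Made-up sample messages, not data from the thesis.
emails = ["sale sale today", "sale ends today", "welcome new user"]
weights = tfidf(emails)
```

A term that occurs in every document gets weight zero under this formula, while a term confined to one message is weighted up, which is the sense in which the weighting measures relevance relative to both the document and the whole collection.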
Zheng, Sheng-Wen, and 鄭勝文. "Initialization of K-means using the mountain method." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/39186636901271105938.
中原大學
應用數學研究所
98
When we analyze data sets, they may be large and complicated, so we need techniques to process them, for example reducing the dimensionality of the data, cutting down the memory required for computation, or compressing the data. After this processing, we obtain reduced data that retain the important information and can be used in new applications. In this paper, we use the mountain method to strengthen the choice of initial cluster centers in the K-means algorithm [3]. Good initial cluster centers enhance the effectiveness of the K-means algorithm. In statistics, cluster analysis [1-2] can be roughly divided into two kinds of methods: hierarchical clustering and partitional clustering. We consider the K-means algorithm [4-6], a partitional clustering method, in this paper. The K-means algorithm requires initial values to be set before it is run, for example the number of clusters and the initial cluster centers, and different initial settings may produce rather different results. To address these problems, we use the mountain method [7-9], which gives two results: it suggests the number of clusters, and it approximates the initial cluster centers for the K-means algorithm. Using the mountain method has two benefits: (1) the K-means algorithm takes less time to converge; (2) the final clustering results are more accurate.
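The idea of picking initial K-means centers from the peaks of a mountain function can be sketched as follows. This is a minimal illustration, not the thesis's procedure: the regular grid, the squared-distance exponential kernel, the parameters `alpha` and `beta`, and the function name `mountain_centers` are assumptions, and the number of centers `k` is passed in rather than suggested by the method.

```python
import math
from itertools import product

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mountain_centers(points, k, grid_steps=11, alpha=5.0, beta=5.0):
    """Pick k initial centers from a regular grid using mountain-function peaks."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    axes = [
        [lo + (hi - lo) * i / (grid_steps - 1) for i in range(grid_steps)]
        for lo, hi in zip(lows, highs)
    ]
    grid = list(product(*axes))
    # Mountain function: every data point adds height to nearby grid nodes.
    heights = [sum(math.exp(-alpha * sq_dist(v, p)) for p in points) for v in grid]
    centers = []
    for _ in range(k):
        peak = max(range(len(grid)), key=heights.__getitem__)
        peak_height = heights[peak]
        centers.append(grid[peak])
        # Flatten the mountain around the chosen peak so the next peak lies elsewhere.
        heights = [
            h - peak_height * math.exp(-beta * sq_dist(v, grid[peak]))
            for h, v in zip(heights, grid)
        ]
    return centers

# Two small, well-separated groups of made-up points.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
initial_centers = mountain_centers(data, k=2)
```

The returned grid nodes would then be handed to K-means as its starting centers in place of a random draw.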
Tsai, Wen-Bin, and 蔡文彬. "An Improved Initialization Method for the K-means Algorithm." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/65590580917571974843.
國立東華大學
企業管理學系
94
Clustering is one of the most basic and popular techniques in data mining. Its fundamental purpose is to partition a given unordered dataset into several clusters so that data in the same cluster are similar and data in different clusters are dissimilar. Among the numerous clustering techniques, the K-means algorithm is one of the most widely used because of its outstanding efficiency and simple concept. However, the K-means algorithm has some problems. First, random initialization affects its stability and correctness. Second, the parameters that users must choose may not be set properly. Third, the K-means algorithm cannot detect noisy data. To address these problems, this study proposes an improved method, named Improved K-Means (IKM), that modifies the K-means algorithm. The IKM algorithm makes use of the concepts of density, grids, and statistics. Comparing IKM with K-means on simulated data, we demonstrate that IKM is more stable and more correct than the K-means algorithm. For data with complicated distributions, IKM performs better than K-means. Moreover, IKM can automatically decide a proper number of clusters and is able to detect noisy data. In addition, on large databases, the efficiency of IKM is no worse than that of the K-means algorithm.
Yu, Qiao, and 于喬. "Accelerated K-means Algorithm Based on Efficient Filtering Method." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/8a3r65.
國立臺灣科技大學
資訊工程系
107
K-means is a well-known clustering algorithm in data mining and machine learning. It is widely applicable in various domains such as computer vision, market segmentation, social network analysis, etc. However, k-means wastes a large amount of time on unnecessary distance calculations. Thus accelerating k-means has become a worthy and important topic. Accelerated k-means algorithms achieve the same result as k-means, only faster. In this paper, we present a novel accelerated exact k-means algorithm named Fission-Fusion k-means that is significantly faster than the state-of-the-art accelerated k-means algorithms. The additional memory consumption of our algorithm is also much less than that of other accelerated k-means algorithms. Fission-Fusion k-means accelerates k-means with an efficient filtering method during the iterations. It balances the cost of distance calculations against the cost of filtering. We conduct extensive experiments on real-world datasets, which verify that Fission-Fusion k-means considerably outperforms the state-of-the-art accelerated k-means algorithms in most cases. In addition, for more separated and naturally clustered datasets, our algorithm is relatively faster than other accelerated k-means algorithms.
Lee, Yian-yi, and 李建逸. "A Missing Value Estimation Model Based on the Gap Statistical Method and K-means Method." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/89333796640959589742.
南華大學
資訊管理學研究所
94
Data mining is a vitally important technique for unveiling hidden information in a set of raw data. However, integrating different sources of raw data usually introduces missing values that may affect the interpretation of the data analysis. This bias is known as the missing value problem of data integration. Data clustering techniques are widely deployed to reduce the impact of missing values. Members of a cluster have similar characteristics that differ notably from those of other clusters. This property is useful for deriving a better similarity-based estimation model. To date, the K-means method is a well-known data clustering technique. However, when raw data come from various sources, it is difficult to decide how many clusters the K-means method should use. Among many approaches, the Gap statistical method is a fairly good way to automatically estimate the number of clusters, which compensates for this shortcoming of the K-means method. It also needs fewer iterations to derive better results. This study investigates an integration of the K-means method and the Gap statistical method in order to find a generic missing value estimation model. The model derives a suitable estimated value, which helps mining produce better results when a vast amount of raw data is integrated. The integrated model is tested on a power generation database of the Taipower Company to verify its feasibility and effectiveness. The experimental results show more statistical confidence than the SOM-based estimation model.
Muralla, Sumakwel. "A method of accelerating K-Means by directed perturbation of the codevectors." 2006. http://digital.library.okstate.edu/etd/umi-okstate-1882.pdf.
Kao, Hsiou-Hen, and 高修恒. "A Study of Improving K-means Clustering Method- Based on Sample Points." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/eznf8n.
國立成功大學
統計學系碩博士班
101
In contrast to the K-means algorithm, we constrain the cluster centers to be data points rather than means, and we call the resulting method the K-exemplars algorithm. Because of this constraint, the K-exemplars algorithm can handle not only raw data but also relational data. Although the clustering accuracy of the K-exemplars method may not be better than that of the K-means method, the difference is small, while the number of iterations is significantly lower, so K-exemplars converges faster than K-means. On the Iris data, K-means and K-exemplars take 7.22 and 4.02 iterations, respectively, so K-exemplars saves 3.2 iterations. Moreover, K-exemplars can be applied with any specified dissimilarity measure. K-means is influenced by outliers, and K-exemplars mitigates this problem.
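Constraining cluster centers to be data points means the procedure only needs pairwise dissimilarities, which is why it also works on relational data. The sketch below illustrates that general idea in the style of a k-medoids update; it is an assumption about how such a method can be written, not the thesis's K-exemplars algorithm, and `k_exemplars`, `assign`, and the toy values are hypothetical.

```python
def assign(dissim, exemplars):
    """Label each item with the position of its least-dissimilar exemplar."""
    return [
        min(range(len(exemplars)), key=lambda c: dissim[i][exemplars[c]])
        for i in range(len(dissim))
    ]

def k_exemplars(dissim, init, max_iter=100):
    """Cluster items from a square dissimilarity matrix; centers are item indices."""
    exemplars = list(init)
    for _ in range(max_iter):
        labels = assign(dissim, exemplars)
        new_exemplars = []
        for c, current in enumerate(exemplars):
            members = [i for i, label in enumerate(labels) if label == c]
            if not members:
                # Keep the old exemplar if ties left this cluster empty.
                new_exemplars.append(current)
                continue
            # The new exemplar is the member with the smallest total dissimilarity
            # to the other members of its cluster.
            new_exemplars.append(
                min(members, key=lambda m: sum(dissim[m][j] for j in members))
            )
        if new_exemplars == exemplars:
            break
        exemplars = new_exemplars
    return exemplars, assign(dissim, exemplars)

# Made-up one-dimensional values; only their pairwise distances are used.
values = [0, 1, 2, 10, 11, 12]
matrix = [[abs(a - b) for b in values] for a in values]
exemplars, labels = k_exemplars(matrix, init=[0, 1])
```

Because only `matrix` is passed in, any dissimilarity measure can be substituted for the absolute difference used here.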
Chuang, Fei-Chieh, and 莊斐杰. "Research and Implementation of Cluster Validity Index for K-Means Clustering Method." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/55348819688464276278.
真理大學
資訊工程學系碩士班
102
In this paper, a cluster validity index called the CDV index is presented. The CDV index provides a quality measurement for the goodness of a clustering result for a data set. Plotted against the number of clusters, this measurement forms a curve resembling a quadratic function, and the minimum of the curve is the CDV index value, which indicates the best number of clusters for the case. The CDV index is composed of three major factors: a statistically calculated external diameter factor, a restorer factor that reduces the effect of data dimension, and a punishment factor related to the number of clusters. By calculating the product of the three factors under various settings of the number of clusters, the best clustering result can be found by searching for the minimum of the CDV curve. The best clustering result is then guaranteed to have optimal intra-cluster compactness and optimal inter-cluster dispersion. In the empirical experiments presented in this research, the K-Means clustering method is chosen for its simplicity and execution speed. To demonstrate the effectiveness and superiority of the CDV index, several traditional cluster validity indexes were implemented as the control group, including DI, DBI, ADI, and the PBM index, the most effective index of recent years. The data sets are also carefully selected to justify the generality of the CDV index, and include three real-world data sets and three artificial data sets that simulate real-world data distributions. These data sets are all tested to present the superior features of the CDV index.
Lin, Y. F., and 林郁峰. "Applying K-means Clustering Method on PDM- Using woodworking machine as an example." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/60768020832697224531.
國立勤益科技大學
工業工程與管理系
97
Product Data Management (PDM) is a software-based, product-oriented technology that provides centralized management of product-related information, production processes, and resource integration. It is a useful tool for companies, engineers, and related personnel to manage information and support product R&D. During the R&D stage, if a product involves a large number of parts with a complicated structure, the design of the parts is contracted out to several manufacturers and the parts are later integrated into a larger assembly. The parts therefore need to be checked for compatibility with other parts so that they can be assembled precisely at the factory. The parts manufacturers need a common platform in the design process in order to review the diagrams and documents, so that the R&D unit can quickly find the parts produced by each manufacturer and avoid producing duplicate parts. Also, on a traditional production line or in the material preparation stage, paper documents and human error can lead to the wrong materials being used, causing customer complaints or business losses. With a PDM system, personnel who manage or use the materials can quickly find the right parts. Most companies do not know how to manage their product information effectively, which wastes resources and increases management costs. This research can help companies break their poor product management routines and turn product management into an asset. To do so, companies can use data mining and a product management system to develop a solution. During the research, analysis is used to identify the important and useful information in the design procedure, and PDM is used to derive helpful solutions to problems.
Based on the development structure, data reuse, and the transfer of knowledge into the product management system, companies can integrate their internal data management, knowledge management, and data storage. The purpose of this research is to integrate product data management with K-means clustering. First, K-means clustering is used to group the data. Second, a decision tree is applied to analyze the data and search for information that supports strategic decisions. This information then becomes useful knowledge through analysis and inference. Finally, companies can integrate PDM deeply into their existing product data management system.
Wang and 王雲輝. "Improve the NAND IC Sorting method via K-means and principal component analysis." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/843e3n.
國立中央大學
工業管理研究所
107
Memory products have become closely tied to people's lives. Whether in mobile phones, computers, or internet service providers, the storage device behind them has moved from the hard disk drive (HDD) to the solid-state drive (SSD), whose key component, NAND, is the subject of this research. In the memory product market, launching products quickly and with good quality is therefore an important core competitiveness for every company. This study uses K-means to group NAND ICs with the same characteristics, so that product firmware developers can focus on each group, understand its characteristics, and apply appropriate firmware algorithms to handle process problems and differences across lots or grades. The experimental results show that the method finds the same characteristics in NAND of the same type from different batches, which can also be provided to the R&D and quality departments as a reference for product and material quality.
Lin, Yong-Hui, and 林泳輝. "A Face Recognition Method Based on the Ant Colony Optimization and K-means Algorithm." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/92744678543355595411.
中原大學
電機工程研究所
101
In this thesis, we propose a face recognition method based on ant colony optimization and K-means. The main goal is for the system to perform face detection well in complex environments and to effectively improve the accuracy of face recognition. We describe the face recognition process in three steps. First, the Adaboost face detection method: computing the integral image improves the training speed and detection rate. Trained Adaboost classifiers are then connected in series to form a cascade classifier, which detects faces rapidly and excludes non-face samples. Second, the Principal Component Analysis (PCA) feature extraction method: the PCA transform effectively reduces the dimensionality of the image while retaining the image features with large variation, and grayscale conversion and histogram equalization reduce the computation of image processing and even out the lighting, so that the image features are extracted effectively. Third, the ACO-K-means face recognition method: we combine ant colony optimization with K-means to overcome the tendency of K-means to get stuck in local minima. ACO-K-means improves the accuracy of K-means classification and of face recognition. Finally, we demonstrate the feasibility of the system through experiments simulated in Matlab. A face image is input, and the face recognition system returns the result. The experimental results show that the system performs face recognition successfully even in a complex environment and increases the accuracy of face recognition.
The contributions of this thesis are as follows: 1. Interchangeable: in a complex environment, face detection can be carried out by the Adaboost system. 2. Ameliorative: ant colony optimization overcomes the local-minimum disadvantage of traditional K-means and improves the accuracy of face recognition. 3. Expandable: the results of this thesis can be used for real-time face recognition, for example in a burglar alarm system.
Chiang, Pei-Rong, and 江佩蓉. "Identification of Partial Discharge Signal in XLPE Cables Using K-means Method and Neural Network." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/84187930133546263609.
國立成功大學
電機工程學系
103
In this thesis, we aim to develop a system to recognize partial discharge (PD) signal patterns in XLPE power cables. The PD signals are detected by high-frequency current transformer (HFCT) sensors, and the PD patterns are extracted from the raw data with a wavelet de-noising method. To identify the PD patterns, the K-means algorithm is used to distinguish different kinds of faults in power cables. Moreover, the features of 3D patterns extracted from the PD patterns can be identified by back propagation neural networks (BPN). On the basis of these results, the system can provide inspection personnel with a powerful tool to determine possible PD fault types and maintain related equipment.
WANG, BO-SHINE, and 王柏勝. "Using k-means clustering algorithm-based robust adaptive clustering analysis method for software fault prediction." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/64607850323904520947.
國立雲林科技大學
資訊管理系
102
A software fault is an error in a software system caused by a wrong specification or inappropriate configuration during development. Almost all kinds of work now depend on software systems, so software reliability has become a key concern in the software development process and in software engineering. At present, most studies focus on supervised learning, but some researchers argue that semi-supervised and unsupervised learning are also necessary. In this paper, we propose a robust adaptive clustering analysis method based on the k-means clustering algorithm for software fault prediction in an unsupervised setting. First, we use the K-means clustering algorithm to produce clustering results for cluster numbers ranging from 2 to k. Second, we integrate the clustering results using a matrix. Finally, we use an iterative cluster partitioning technique to find the best number of clusters and the final result, and the comparison shows that our clustering result is the best among similar methods.
HUANG, CHENG-YO, and 黃丞佑. "The Scheduling System Design of CAN FD Vehicle Communication Network based on K-means Clustering Method." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/k5p5yh.
國立虎尾科技大學
資訊工程系碩士班
106
With the trend toward unmanned vehicles, more and more sensors and electronic units are used, and machine learning and AI control are key areas of future development. Machine learning lets a computer or embedded system automatically learn rules from training data and use those rules to predict unknown data, which makes it well suited to complex unmanned vehicle systems. Because of the complexity of the vehicle network, the bandwidth usage of the existing CAN network is close to its technical limit. As a result, Bosch introduced the CAN FD protocol in 2012. CAN FD inherits the main characteristics of CAN, so it is important to study hybrid CAN and CAN FD networks. However, how to use bandwidth effectively in CAN FD is still an open problem. This paper presents a method that groups CAN FD data with K-means machine learning in a hybrid CAN and CAN FD network. The grouping result is used to change the priority of CAN FD data so that the bandwidth of the CAN FD network is used effectively. The research is divided into two parts. The first part uses the K-means method in a data clustering simulation and, according to the clustering result, changes the CAN FD data in the arbitration phase. The second part implements the hybrid CAN and CAN FD network and uses the ECU sim2000 OBD-II simulator and a hand-held vehicle online diagnosis system to test and verify the system. The integration test shows that the design is compatible with the current CAN vehicle network and can also change the priority order of CAN FD messages according to the K-means grouping results, in experiments with six different amounts of CAN FD message data. The priority order can effectively reduce the data loss rate of the CAN FD network.
With a CAN FD arbitration-phase rate of 1 Mbps and data-phase rates of 2 Mbps and 4 Mbps, the data loss rate is reduced by 2.86% and 2.58%, respectively, which provides better reliability for the CAN FD network.
Chen, Bang-Yin, and 陳邦尹. "Source Separation in the Frequency Domain: Solving the Permutation Problem by a Sliding K-means Method." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/tbhk8n.
National Tsing Hua University (國立清華大學)
Department of Electrical Engineering
107
This thesis aims to solve the source separation problem in the frequency domain. In a real environment, mixed source signals are convolutive mixtures. Previous work indicates that convolutive mixtures are easier to separate in the two-dimensional time-frequency domain after applying the short-time Fourier transform (STFT) to the signals. Independent component analysis (ICA) is then used to separate the sources in each frequency bin. However, this leaves two ambiguities to resolve, namely the scaling problem and the permutation problem; the latter is the focus of this thesis. For the permutation problem, the correlation method and the sliding k-means method are proposed and compared, based on the assumption that the temporal envelopes of neighboring frequency bins from the same source should be more highly correlated. After ICA and after solving these two problems, the un-mixing matrix can be calculated. To evaluate performance, we measured the frequency response of the environment and obtained the mixing matrix, which serves as the ground truth. A scoring system combining both matrices and two objective indices is then defined to quantify the separation performance. In our experiments, we divide the singers into three groups (male+male, female+female, male+female). Across the three groups, the permutation accuracy of the k-means method reaches at least 90.5% over the different parameter settings. Introducing the "sliding process" generally raises the permutation accuracy by 1 to 3%. The correlation method, on the other hand, can reach higher permutation accuracy than the k-means method but is sensitive to parameter variations and is highly unstable. The results show that our new approach is stable and yields comparable performance.
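The neighboring-bin assumption in this abstract can be illustrated with a small sketch. It covers only a correlation-based alignment between adjacent frequency bins, not the thesis's sliding k-means method, whose details the abstract does not give. The function names `pearson` and `align_bins` and the nested-list input layout are assumptions made for illustration.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def align_bins(envelopes):
    """Choose, for each frequency bin, the output ordering that best matches the previous bin.

    envelopes[f][s] is the temporal envelope (list of floats) of ICA output s
    in frequency bin f (hypothetical layout). The result holds one tuple per
    bin; position i of the tuple is the output index assigned to source i.
    """
    n_sources = len(envelopes[0])
    order = [tuple(range(n_sources))]  # bin 0 defines the reference ordering
    for f in range(1, len(envelopes)):
        previous = [envelopes[f - 1][s] for s in order[-1]]
        best = max(
            permutations(range(n_sources)),
            key=lambda p: sum(
                pearson(previous[i], envelopes[f][p[i]]) for i in range(n_sources)
            ),
        )
        order.append(best)
    return order


# Toy data: two envelope shapes, swapped (and rescaled) in the middle bin.
a = [1, 0, 1, 0, 1, 0]
b = [0, 0, 1, 1, 0, 1]
bins = [[a, b], [[2 * v for v in b], [3 * v for v in a]], [a, b]]
alignment = align_bins(bins)
```

Trying every permutation is only practical for a handful of sources; with two singers, as in the experiments described, there are just two orderings per bin.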
Wu, Sheng-Kong, and 吳盛宏. "Combining Adaptive Resonance Theory and K-Means Method for Data Clustering - On-line Game as an Example." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/20381069759338972309.
Hsuan Chuang University (玄奘大學)
Master's Program, Department of Information Science
96
Data mining and neural networks are being applied more and more widely in business. Companies can use these methods to find new customers and retain existing ones. In both data mining and Adaptive Resonance Theory (ART), data clustering is the most commonly used technique. This study examines the differences in data clustering between K-means from data mining and ART from neural networks, along with the advantages and disadvantages of each. We also combined the two methods and compared their similarity, with the clustering value fixed at 5% for both. The two methods were applied to data from an online game questionnaire, and we compared the original data with the ART and K-means clusterings. We find that the best clustering is obtained when ART is the main method and K-means plays a supporting role. The case study supports the view put forward in this research.
Juan, Yu-Ting, and 阮毓庭. "Three-dimensional Geometry Reconstruction of Mouse Liver from MR Images Using K-means Method with Confusion Component Removing." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/5f53jr.
National Central University (國立中央大學)
Department of Mathematics
107
Liver diseases are always among the top 10 causes of death in Taiwan. Early primary liver cancer is difficult to detect because the initial symptoms are usually not obvious, yet liver cancer is difficult to control unless it is discovered while the tumor is still very small. We therefore want to build a numerical simulation of the liver structure, including the blood vessel topography and the liver surface. Before the simulation, the liver must be segmented from MR images. Medical images mostly contain complicated structures, and image segmentation is a key task in many medical applications; precise segmentation is necessary for simulation. Because finding subjects for MRI scanning is not a simple matter, we use mouse liver images for the simulation. However, mouse liver boundaries in MR images are usually unclear, so traditional edge-based segmentation methods are unsuitable. In this paper, we propose creating a new image by combining T1-weighted (T1), T2-weighted (T2), and contrast-enhanced T1-weighted (T1 C+ (Primovist)) MR images. After segmentation with the k-means method, we compare the image with confusion component removal against the original image. The results show that accuracy is improved. In the future, we look forward to applying this approach to numerical simulation.
"Transportation Techniques for Geometric Clustering." Doctoral diss., 2020. http://hdl.handle.net/2286/R.I.57239.
Dissertation/Thesis
Doctoral Dissertation Computer Engineering 2020
Tseng, Yu-Tang, and 曾鈺棠. "Application of K-means method to improve decision making results of a consensus model in a context awareness framework." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/38346451437110323454.
Chung Yuan Christian University (中原大學)
Graduate Institute of Industrial and Systems Engineering
103
With the popularity of smart devices and their variety of built-in sensors, context-awareness applications have been broadly developed to fulfill different requirements. These applications deal with various data and tasks, so optimizing context-awareness systems would provide better services for users. This research considers a consensus decision-making process and recognizes that the extreme preferences of a few users can pull the decision process off track. It proposes a consensus decision support system that helps users quickly find results acceptable to the majority. When users discuss a consensus decision through the context-awareness system, a clustering method groups the different opinions and identifies which opinions are more acceptable. This mitigates the decision-making problem caused by the extreme preferences of a few users. First, a context-aware framework was developed in which each user's preference is recorded as a real number between 0 and 1, where 0 represents dislike, 1 represents like, and 0.5 represents no comment. Next, a consensus model and a K-means consensus model were developed. This research used data mining software to group the preference data, with the centroid of a group representing the opinion of that group. If more than one group had the same centroid, the preference of that centroid was determined by the number of groups. The F-measure was used as an index to compare the performance of the proposed model against human judgment, and the Xie-Beni index was applied to find the number of groups with higher accuracy. Choosing a restaurant for a group meal was used as an example to illustrate the proposed model. The experimental results showed that the proposed K-means consensus model could reduce the decision offset caused by the extreme preferences of a few users. In this case, when the number of groups is larger than 5, the K-means consensus model is more accurate than the consensus model. The larger the number of groups, the larger the Xie-Beni index. In this experiment, the results of the K-means consensus model with a Xie-Beni index between 100 and 1000 could be more accurate than those of the consensus model. However, when the Xie-Beni index was larger than 1000, decisions could be wrong, possibly because the data were scattered.
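The effect of a few extreme preferences can be shown with a minimal sketch. It assumes scalar preferences in [0, 1], a plain Lloyd-style k-means with evenly spaced starting centroids, and a consensus taken as the centroid of the largest group. The last two choices are assumptions made for illustration; the abstract does not state how the thesis initializes the clustering or picks the final group, and the names `kmeans_1d` and `consensus` are hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd-style k-means on scalars with a deterministic, evenly spaced start."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def assign(centers):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        return groups

    for _ in range(iterations):
        groups = assign(centroids)
        # An empty group keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(centroids)


def consensus(values, k):
    """Assumed rule: the centroid of the most populated group is the consensus."""
    centroids, groups = kmeans_1d(values, k)
    largest = max(range(k), key=lambda j: len(groups[j]))
    return centroids[largest]


# Four users like the option; one user strongly dislikes it.
preferences = [0.8, 0.85, 0.9, 0.75, 0.0]
plain_mean = sum(preferences) / len(preferences)
clustered = consensus(preferences, k=2)
```

With these toy values the plain average is pulled down to about 0.66 by the single dissenting user, while the largest group's centroid stays near 0.825.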
Wen-Feng, Wu, and 吳文鳳. "A Study of Data Hiding Method in Color Image using Grouping Palette Index by Particle Swarm Optimization with K-means Clustering." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/09396474013896816616.
Hsuan Chuang University (玄奘大學)
Master's Program, Department of Information Management
99
We propose a method for hiding data in a color image using its palette. Many authors embed data directly into the palette or into the index table of the palette. When such methods embed the secret data in the palette itself, the palette changes, and it becomes more difficult to reveal the embedded information. We apply particle swarm optimization with K-means clustering to divide the color image palette into several groups. The more pixels a palette group has, the more data can be embedded in the pixels that fall in that group. For each candidate embedding pixel, we check which group it belongs to; this tells us how many bits can be embedded, because the number of group members we use is a power of two. The current pixel is replaced by the member of the same group whose position in the group's order equals the value of the data being embedded. Extraction first groups the pixels of the stego-image and checks each pixel to find which group it belongs to, then finds its position in that group's order. That position is the embedded value. The information is extracted group by group until all pixels have been processed. The experimental results show that the method has good embedding capacity and image quality. In addition, the proposed method is not affected by changes in the order of the color palette after embedding, since we keep the highest frequency for each cluster.
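The embedding and extraction steps described in this abstract can be sketched as follows, assuming the palette groups are already given. The particle swarm and K-means grouping itself is not reproduced, and the group contents, the bit-string message format, and the names `embed` and `extract` are hypothetical. The sketch also assumes the extractor knows how many bits were embedded.

```python
def _bits_per_group(group):
    """log2 of a power-of-two group size; a single-member group carries 0 bits."""
    return len(group).bit_length() - 1


def embed(pixels, groups, bits):
    """Replace each pixel by the member of its group selected by the next message bits.

    pixels: list of palette indices; groups: lists of palette indices whose
    sizes are powers of two; bits: the message as a string of '0'/'1'.
    Returns the stego pixel list and the number of bits actually embedded.
    """
    lookup = {index: group for group in groups for index in group}
    stego, position = [], 0
    for pixel in pixels:
        group = lookup[pixel]
        n_bits = _bits_per_group(group)
        if n_bits and position + n_bits <= len(bits):
            value = int(bits[position:position + n_bits], 2)
            stego.append(group[value])  # position in the group encodes the value
            position += n_bits
        else:
            stego.append(pixel)  # nothing to embed in this pixel
    return stego, position


def extract(stego, groups, n_embedded):
    """Recover the message by reading each pixel's position within its group."""
    lookup = {index: group for group in groups for index in group}
    bits = ""
    for pixel in stego:
        group = lookup[pixel]
        n_bits = _bits_per_group(group)
        if n_bits and len(bits) + n_bits <= n_embedded:
            bits += format(group.index(pixel), f"0{n_bits}b")
    return bits


groups = [[0, 1, 2, 3], [4, 5], [6]]  # hypothetical palette groups
cover = [0, 4, 6, 3, 5]
message = "101101"
stego, used = embed(cover, groups, message)
recovered = extract(stego, groups, used)
```

A group of four colors carries two bits per pixel and a group of two carries one, so larger groups give more capacity at the cost of larger color changes.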
Chen, Yu-Cheng, and 陳友政. "A Semi-Automatic Biomechanical Analysis System based on K-Means Clustering and Finite Element Method - a Case Study of Dental Implants." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/gksz9q.
(8797292), Varisht Raheja. "ASSESSING THE PERFORMANCE OF PROCEDURALLY GENERATED TERRAINS USING HOUDINI’S CLUSTERING METHOD." Thesis, 2020.
Terrain generation is a complex and popular topic in the VFX industry. Whether in film, TV, or games, terrain is a highly nuanced feature that is usually present. From walking across a desert-like terrain in the film Blade Runner 2049 to fighting on different planets as in Avatar, 3D terrains are a major part of digital media. The purpose of this thesis is to develop a workflow for large-scale terrains using complex data sets, and to use this workflow to maintain a balance between procedural content and artistic input, especially for smaller companies that cannot afford an enhanced pipeline to deal with major technical complications. The workflow consists of two major elements: developing the tool used to optimize the workflow, and recording and maintaining its efficiency in comparison with the older workflow.
My research findings indicate that despite the increase in overall computational abilities, one of the many issues that are still present is generating a highly advanced terrain with the added benefits of the artists and users’ creative variations. Reducing the overall time to simulate and compute a highly realistic and detailed terrain is the main goal, thus this thesis will present a method to overcome the speed deficiency while keeping the details of the terrain present.
Chen, Pin-Wen, and 陳品文. "New Methods for the Initialization of K-means Clustering Algorithm." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/uq4w6w.
National Dong Hwa University (國立東華大學)
Department of Business Administration
95
Cluster analysis is an important pattern recognition tool in data mining and is widely used in fields such as computer science, statistical analysis, and biology. The K-means algorithm is one of the most popular methods in cluster analysis and is also more efficient than other methods. However, K-means has several shortcomings: the choice of initial cluster centroids has a great impact on the efficiency of clustering; the user has to decide the number of clusters in advance; and it is very sensitive to noise and isolated data points. This study focuses on improving the initialization of the K-means algorithm and tries to reduce the effect of noisy data on the clustering results. We combine the concepts of the hierarchical method and the grid method to improve K-means. In the first part of this study, we propose a new algorithm, Bi-Section. Bi-Section first bisects each dimension of the data space, dividing the space into 2^d parts, where d is the number of dimensions. It then computes statistical information for each part to decide how many initial cluster centroids to allocate to it. In the second part, we propose the HBi-Section algorithm, which is based on Bi-Section. HBi-Section builds a tree structure to compute the statistical information for each of the 2^d parts quickly. The result is an efficient, improved K-means algorithm.
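The grid step described in this abstract can be sketched roughly as follows. The sketch assumes each dimension is cut at the midpoint of its range and that the "statistical information" is simply the point count per cell, with centroids allocated in proportion to those counts by a largest-remainder rule. These are illustrative assumptions, not the thesis's stated procedure, and the names `bisect_cells` and `allocate_centroids` are hypothetical.

```python
def bisect_cells(points):
    """Split the data space in half along every dimension (up to 2^d cells).

    Each point is keyed by a tuple of 0/1 flags, one per dimension, saying
    whether it lies above the midpoint of that dimension's range.
    Only non-empty cells appear in the result.
    """
    d = len(points[0])
    mids = [
        (min(p[i] for p in points) + max(p[i] for p in points)) / 2
        for i in range(d)
    ]
    cells = {}
    for p in points:
        key = tuple(int(p[i] > mids[i]) for i in range(d))
        cells.setdefault(key, []).append(p)
    return cells


def allocate_centroids(cells, k):
    """Share k initial centroids among cells in proportion to their point counts."""
    total = sum(len(members) for members in cells.values())
    quotas = {key: k * len(members) / total for key, members in cells.items()}
    allocation = {key: int(q) for key, q in quotas.items()}
    leftover = k - sum(allocation.values())
    # Largest-remainder rule: hand spare centroids to the biggest fractional parts.
    by_remainder = sorted(cells, key=lambda key: quotas[key] - allocation[key], reverse=True)
    for key in by_remainder[:leftover]:
        allocation[key] += 1
    return allocation


points = [(0, 0), (1, 1), (0, 1), (1, 0), (0.5, 0.5), (1, 0.5), (9, 9), (10, 10)]
cells = bisect_cells(points)
allocation = allocate_centroids(cells, k=4)
```

Where the centroids are placed inside each cell is left open here, since the abstract does not describe that step.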
Lin, Zhi-Xuan, and 林志軒. "Face Discriminative Methods Across Age Progression Using Local K-means Ensemble." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/03123611173539809587.
National Dong Hwa University (國立東華大學)
Department of Computer Science and Information Engineering
103
Face recognition is widely used in computer vision applications such as surveillance, traffic monitoring, robot vision, and access control. However, face recognition still faces several problems, chiefly lighting changes, expression changes, head movements, occlusion by accessories, and aging. With aging, changes in shape and texture degrade recognition performance. To address face recognition across ages, we propose face discriminative methods across age progression using a local K-means ensemble. First, we find that the gradient angle, extracted from the rigid face region, provides a simple but effective representation for this problem. The representation improves further when a hierarchical structure is used, which leads to the K-means pyramid (KMP). Combined with supervised learning, KMP demonstrates excellent performance in our experiments. Experimental results show that the proposed cross-age methods outperform existing techniques.
Worawut, Dabpimsri, and 陳曉君. "Comparison of Two-Stage Clustering Methods: SOM and K-Means Algorithm and Hierarchical Clustering and K-Means Algorithm in Tourist Information Management in Phuket, Thailand." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/javpt5.
National Penghu University of Science and Technology (國立澎湖科技大學)
Graduate Institute of Tourism and Leisure Management
103
The objectives of this research are (1) to investigate the characteristics and behaviors of tourists who visited Phuket, Thailand, and (2) to suggest an efficient approach for analyzing business data that differ in both characteristics and behaviors. The study compares the performance of two two-stage clustering methods: SOM followed by the K-means algorithm, and hierarchical clustering followed by the K-means algorithm. Ten factors are used in clustering, including zone, country, travel, province, type of accommodation, number of nights, gender, age, purpose of travel, career, annual income, and cost of travel and fees. The standard error of the mean (S.E. Mean) and the root mean square standard deviation (RMSSTD) of each cluster are used as criteria for selecting the number of clusters for segmentation. Results show that the appropriate number of clusters is ten using SOM and K-means, and six using the second method. Clustering with both methods shows that the majority of tourists are from Europe. The other categories reveal information such as travel by BTS, MRT, or taxi and travel by domestic airline. Most tourists choose to stay in hotels for long periods. Their average annual income is moderate, but their daily expenses are quite high. They visit for vacation during the holidays, and most are professionals. Based on the analysis, the second approach performs better than the first, since it requires less execution time and gives more homogeneity among the data within each cluster.
Keywords: Clustering, Data Mining, Classification, Tourism
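The root mean square standard deviation mentioned in this abstract measures how homogeneous a cluster is. A minimal sketch follows, assuming the common per-cluster definition: the square root of the variance pooled over all variables, with n - 1 degrees of freedom per variable. The thesis may compute it differently, and the function name `rmsstd` is an assumption.

```python
from math import sqrt


def rmsstd(cluster):
    """Root mean square standard deviation of one cluster.

    cluster: list of equal-length numeric tuples (one tuple per observation).
    Pools the squared deviations from the per-variable means over all
    variables and divides by d * (n - 1). Smaller values mean a more
    homogeneous cluster.
    """
    n, d = len(cluster), len(cluster[0])
    if n < 2:
        return 0.0  # a single observation has no spread
    means = [sum(p[j] for p in cluster) / n for j in range(d)]
    squared = sum((p[j] - means[j]) ** 2 for p in cluster for j in range(d))
    return sqrt(squared / (d * (n - 1)))


tight = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
loose = [(0.0, 0.0), (2.0, 2.0)]
tight_value, loose_value = rmsstd(tight), rmsstd(loose)
```

When comparing candidate numbers of clusters, a lower value for each cluster indicates tighter, more homogeneous groups.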
Chen, Chien Chung, and 陳建忠. "Pattern Discovery of Web Usage Mining by K-means of Sequence Alignment Methods." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/83712834596186922724.
Tamkang University (淡江大學)
Department of Computer Science and Information Engineering
92
With the Internet now widespread, people routinely use it to access information, and business activity online is increasingly active. The logs on a web site keep track of users' browsing records and hold implicit information about what users want. By applying Web Usage Mining techniques to web logs, we can find the patterns in which users access web pages. Going a step further, discovering patterns in user behavior can improve the design of a web site's structure and its effective performance. In this paper, for the preprocessing stage of Web Usage Mining, we integrate and apply the techniques published by Cooley and Chen; for the pattern discovery stage, we apply K-means clustering together with Sequence Alignment Methods (SAM), which convert a sequence into a score, to discover patterns of user behavior.
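Turning a pair of browsing sequences into a single score can be sketched as below. Plain Levenshtein edit distance is swapped in here as a stand-in; the SAM scoring used in the thesis may weight insertions, deletions, and reorderings differently. The page names and the helpers `edit_distance` and `nearest_representative` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(
                min(
                    previous[j] + 1,             # delete x
                    current[j - 1] + 1,          # insert y
                    previous[j - 1] + (x != y),  # substitute, or match at no cost
                )
            )
        previous = current
    return previous[-1]


def nearest_representative(session, representatives):
    """Index of the representative sequence closest to the session.

    This mirrors the assignment step of a K-means-style loop, with edit
    distance taking the place of Euclidean distance.
    """
    return min(
        range(len(representatives)),
        key=lambda i: edit_distance(session, representatives[i]),
    )


representatives = [
    ["home", "news", "sports"],
    ["home", "shop", "cart", "checkout"],
]
session = ["home", "shop", "checkout"]
cluster = nearest_representative(session, representatives)
```

Sequences have no arithmetic mean, so the update step of such a loop would need a substitute such as choosing the member with the smallest total distance to the others; that choice is not specified in the abstract.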