Dissertations / Theses on the topic 'K-means clustering'
Consult the top 50 dissertations / theses for your research on the topic 'K-means clustering.'
Buchta, Christian, Martin Kober, Ingo Feinerer, and Kurt Hornik. "Spherical k-Means Clustering." American Statistical Association, 2012. http://epub.wu.ac.at/4000/1/paper.pdf.
Musco, Cameron N. (Cameron Nicholas). "Dimensionality reduction for k-means clustering." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101473.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 123-131).
In this thesis we study dimensionality reduction techniques for approximate k-means clustering. Given a large dataset, we consider how to quickly compress to a smaller dataset (a sketch), such that solving the k-means clustering problem on the sketch will give an approximately optimal solution on the original dataset. First, we provide an exposition of technical results of [CEM+15], which show that provably accurate dimensionality reduction is possible using common techniques such as principal component analysis, random projection, and random sampling. We next present empirical evaluations of dimensionality reduction techniques to supplement our theoretical results. We show that our dimensionality reduction algorithms, along with heuristics based on these algorithms, indeed perform well in practice. Finally, we discuss possible extensions of our work to neurally plausible algorithms for clustering and dimensionality reduction. This thesis is based on joint work with Michael Cohen, Samuel Elder, Nancy Lynch, Christopher Musco, and Madalina Persu.
by Cameron N. Musco.
S.M.
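The sketch-then-cluster idea in the abstract above can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the function names `random_projection` and `kmeans_cost`, the Gaussian projection matrix, the synthetic data, and the target dimension of 50 are all chosen here for illustration.

```python
import math
import random


def random_projection(points, target_dim, seed=0):
    """Project each point to target_dim coordinates with a Gaussian matrix.

    Entries are drawn from N(0, 1/target_dim), so squared norms are
    preserved in expectation.
    """
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
              for _ in range(source_dim)]
    return [[sum(p[i] * matrix[i][j] for i in range(source_dim))
             for j in range(target_dim)]
            for p in points]


def kmeans_cost(points, labels):
    """Sum of squared distances from each point to its cluster's mean."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    cost = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        cost += sum(sum((x - c) ** 2 for x, c in zip(m, centroid))
                    for m in members)
    return cost


# Two synthetic clusters of 20 points each in 200 dimensions (made-up data).
rng = random.Random(1)
points = [[rng.gauss(center, 1.0) for _ in range(200)]
          for center in (0.0, 5.0) for _ in range(20)]
labels = [0] * 20 + [1] * 20

# Cost of the same labelling, measured on the 50-dimensional sketch
# relative to the original data.
sketch = random_projection(points, target_dim=50)
ratio = kmeans_cost(sketch, labels) / kmeans_cost(points, labels)
```

A `ratio` close to 1 means the clustering cost measured on the sketch tracks the cost on the original data, which is the property that lets a k-means solver run on the smaller dataset.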
Persu, Elena-Mădălina. "Approximate k-means clustering through random projections." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/99847.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 39-41).
Using random row projections, we show how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1 + ε) error. Importantly, this class of problems includes k-means clustering. By reducing data points to just O(k) dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For k-means dimensionality reduction, we provide (1 + ε) relative error results for random row projections which improve on the (2 + ε) prior known constant factor approximation associated with this sketching technique, while preserving the number of dimensions. For k-means clustering, we show how to achieve a (9 + ε) approximation by Johnson-Lindenstrauss projecting data points to just O(log k/ε²) dimensions. This gives the first result that leverages the specific structure of k-means to achieve dimension independent of input size and sublinear in k.
by Elena-Mădălina Persu.
S.M. in Computer Science and Engineering
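For readers unfamiliar with the approximation factors quoted above, the standard meaning can be written out as follows. The notation (data points a_i, center set C, returned centers C̃, factor γ) is assumed here for illustration and is not taken from the thesis.

```latex
\mathrm{cost}(C) = \sum_{i=1}^{n} \min_{c \in C} \lVert a_i - c \rVert_2^2,
\qquad
\mathrm{cost}(\tilde{C}) \le \gamma \cdot \min_{|C| = k} \mathrm{cost}(C)
```

Under this reading, a (1 + ε) result corresponds to γ = 1 + ε and the Johnson-Lindenstrauss result to γ = 9 + ε, where C̃ is the set of centers found from the reduced data and the cost is evaluated on the original points.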
Xiang, Chongyuan. "Private k-means clustering : algorithms and applications." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/106394.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 77-80).
Today is a new era of big data. We contribute our personal data for the common good simply by using our smart phones, searching the web and doing online transactions. Researchers, companies and governments use the collected data to learn various user behavior patterns and make impactful decisions based on that. Is it possible to publish and run queries on those databases without disclosing information about any specific individual? Differential privacy is a strong notion of privacy which guarantees that very little will be learned about individual records in the database, no matter what the attackers already know or wish to learn. Still, there is no practical system applying differential privacy algorithms for clustering points on real databases. This thesis describes the construction of small coresets for computing k-means clustering of a set of points while preserving differential privacy. As a result, it gives the first k-means clustering algorithm that is both differentially private, and has an approximation error that depends sub-linearly on the data’s dimension d. Previous results introduced errors that are exponential in d. This thesis implements this algorithm and uses it to create differentially private location data from GPS tracks. Specifically the algorithm allows clustering GPS databases generated from mobile nodes, while letting the user control the introduced noise due to privacy. This thesis also provides experimental results for the system and algorithms, and compares them to existing techniques. To the best of my knowledge, this is the first practical system that enables differentially private clustering on real data.
by Chongyuan Xiang.
M. Eng.
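To make the trade-off between privacy and noise concrete, here is a minimal sketch of a noisy centroid update using the Laplace mechanism. It is a simpler, well-known technique swapped in for illustration and is not the coreset construction the thesis describes. The assumptions are that every coordinate lies in [0, 1], that neighbouring datasets differ by adding or removing one point, and that the privacy budget `epsilon` is split evenly; the names `laplace` and `noisy_centroid` are hypothetical.

```python
import random


def laplace(rng, scale):
    """Laplace(0, scale) sample: difference of two exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_centroid(members, dim, epsilon, rng):
    """Differentially private mean of points with coordinates in [0, 1].

    Half of epsilon is spent on the count (sensitivity 1) and half on the
    vector of coordinate sums (L1 sensitivity dim).
    """
    half = epsilon / 2.0
    noisy_count = len(members) + laplace(rng, 1.0 / half)
    noisy_sums = [sum(p[j] for p in members) + laplace(rng, dim / half)
                  for j in range(dim)]
    count = max(noisy_count, 1.0)  # guard against tiny or negative counts
    # Clipping to [0, 1] is post-processing and does not affect privacy.
    return [min(1.0, max(0.0, s / count)) for s in noisy_sums]


rng = random.Random(0)
cluster = [[0.2, 0.4], [0.4, 0.6], [0.3, 0.5]]  # made-up points
private = noisy_centroid(cluster, dim=2, epsilon=1.0, rng=rng)
nearly_exact = noisy_centroid(cluster, dim=2, epsilon=1e9, rng=rng)
```

A smaller `epsilon` gives stronger privacy and a noisier centroid; a very large `epsilon` recovers the ordinary mean, here (0.3, 0.5). Repeating the update over several k-means iterations consumes privacy budget on each pass.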
Nelson, Joshua. "On K-Means Clustering Using Mahalanobis Distance." Thesis, North Dakota State University, 2012. https://hdl.handle.net/10365/26766.
Li, Yanjun. "High Performance Text Document Clustering." Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1181005422.
Eliasson, Philip, and Niklas Rosén. "Efficient K-means clustering and the importance of seeding." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-134910.
Data clustering means grouping data elements based on some type of similarity between the grouped elements. Clustering has many applications, such as data compression, data mining, pattern recognition, and machine learning, and there are many different clustering methods. This thesis investigates the k-means clustering method and how the choice of initial values (seeds) affects the result. Lloyd's algorithm is used as the baseline and is compared with an improved algorithm that uses kd-trees. Two seeding methods are compared: random selection of seeds and partial clustering.
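The baseline named in the abstract above, Lloyd's algorithm with random seeding, can be sketched as follows. This is an illustrative implementation under simple assumptions (Euclidean distance, seeds drawn uniformly from the data, a hypothetical `lloyd` function and made-up points); it does not reproduce the kd-tree variant or the partial-clustering seeding that the thesis evaluates.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm with random seeding; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, labels = lloyd(points, k=2)
```

On data this well separated the seeds hardly matter; on harder data, different values of `seed` can lead to different local optima, which is the sensitivity to seeding that the thesis studies.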
Kondo, Yumi. "Robustification of the sparse K-means clustering algorithm." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/37093.
Chowuraya, Tawanda. "Online content clustering using variant K-Means Algorithms." Thesis, Cape Peninsula University of Technology, 2019. http://hdl.handle.net/20.500.11838/3089.
We live at a time when a great deal of information is created, and much of it is redundant. There is a huge amount of online information in the form of news articles that discuss similar stories, and the number of articles is projected to grow. This growth makes it difficult for a person to process all that information in order to stay up to date on a subject. A solution is needed that can organize this similar information into specific themes. One such solution comes from machine learning (ML), a branch of artificial intelligence (AI), in the form of clustering algorithms, which group similar information into containers. Once the information is clustered, people can be presented with information on their subject of interest, grouped together, and the information in a group can be further processed into a summary. This research focuses on unsupervised learning. According to the literature, K-Means is one of the most widely used unsupervised clustering algorithms. K-Means is easy to learn, easy to implement and efficient. However, there are many variations of K-Means. The research seeks a variant of K-Means that can cluster duplicate or similar news articles into correct semantic groups with acceptable performance. The research is an experiment. News articles were collected from the internet using gocrawler, a program that takes Uniform Resource Locators (URLs) as an argument and collects a story from the website each URL points to. The URLs are read from a repository. The stories come riddled with adverts and images from the web page; this is referred to as dirty text. The dirty text is sanitized, that is, cleaned by removing the adverts and images. The clean text is stored in a repository and is the input for the algorithm.
The other input is the K value. All K-Means based variants take a K value that defines the number of clusters to be produced. The stories are manually classified and labelled, each with the class to which it belongs, so that the accuracy of machine clustering can be checked. The data collection process itself was not unsupervised, but the algorithms used to cluster are totally unsupervised. A total of 45 stories were collected and 9 manual clusters were identified. Under each manual cluster there are sub-clusters of stories about one specific event. The performance of all the variants is compared to find the one with the best clustering results. Performance was checked by comparing the manual classification with the clustering results from the algorithm. Each K-Means variant is run with the same settings on the same data set of 45 stories. The settings used are:
• Dimensionality of the feature vectors
• Window size
• Maximum distance between the current and predicted word in a sentence
• Minimum word frequency
• Specified range of words to ignore
• Number of threads to train the model
• The training algorithm, either distributed memory (PV-DM) or distributed bag of words (PV-DBOW)
• The initial learning rate, which decreases to the minimum alpha as training progresses
• Number of iterations per cycle
• Final learning rate
• Number of clusters to form
• The number of times the algorithm will be run
• The method used for initialization
The results obtained show that K-Means can perform better than K-Modes. The results are tabulated and presented in graphs in chapter six. Clustering can be improved by incorporating named entity recognition (NER) into the K-Means algorithms. Results can also be improved by a multi-stage clustering technique, in which an initial clustering is done and each resulting cluster group is then clustered further to achieve finer results.
Li, Songzi. "K-groups: A Generalization of K-means by Energy Distance." Bowling Green State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1428583805.
Xie, Qing Yan. "K-Centers Dynamic Clustering Algorithms and Applications." University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1384427644.
He, Gaojie. "Authoritative K-Means for Clustering of Web Search Results." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2010. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-11116.
Jankovsky, Zachary Kyle. "Clustering Analysis of Nuclear Proliferation Resistance Measures." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398354675.
Leisch, Friedrich. "Bagged clustering." SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, 1999. http://epub.wu.ac.at/1272/1/document.pdf.
Series: Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
Zhao, Jianmin. "Optimal Clustering: Genetic Constrained K-Means and Linear Programming Algorithms." VCU Scholars Compass, 2006. http://hdl.handle.net/10156/1583.
Al-Guwaizani, Abdulrahman. "Variable neighbourhood search based heuristic for K-harmonic means clustering." Thesis, Brunel University, 2011. http://bura.brunel.ac.uk/handle/2438/5827.
Salman, Raied. "Contributions to K-Means Clustering and Regression via Classification Algorithms." VCU Scholars Compass, 2012. http://scholarscompass.vcu.edu/etd/2738.
Hong, Sui. "Experiments with K-Means, Fuzzy c-Means and Approaches to Choose K and C." Honors in the Major Thesis, University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1224.
Bachelors
Engineering and Computer Science
Computer Engineering
Hinz, Joel. "Clustering the Web : Comparing Clustering Methods in Swedish." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-95228.
Calender, Christopher R. "Approximate N-Nearest Neighbor Clustering on Distributed Databases Using Iterative Refinement." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092929952.
Thirathon, Nattavude 1980. "Cyclic exchange neighborhood search technique for the K-means clustering problem." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/17981.
Includes bibliographical references (p. 151-152).
Cyclic Exchange is an application of the cyclic transfers neighborhood search technique to the k-means clustering problem. Neighbors of a feasible solution are obtained by moving points between clusters in a cycle. This method attempts to improve local minima obtained by the well-known Lloyd's algorithm. Although the results did not establish the usefulness of Cyclic Exchange, our experiments reveal some insights into k-means clustering and Lloyd's algorithm. While Lloyd's algorithm finds the best local optimum within a thousand iterations for most datasets, it repeatedly finds better local minima after several thousand iterations for some other datasets. For the latter case, Cyclic Exchange also finds better solutions than Lloyd's algorithm. Although we are unable to identify the features that lead Cyclic Exchange to perform better, our results verify the robustness of Lloyd's algorithm in most datasets.
by Nattavude Thirathon.
M.Eng.
Van, Tilburg Ken. "Identifying boosted objects with N-subjettiness and linear k-means clustering." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/65536.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 57-59).
In this thesis, I explore aspects of a new jet shape - N-subjettiness - designed to identify boosted hadronically-decaying objects (with a particular focus on tagging top quarks) at particle accelerators such as the Large Hadron Collider. Combined with an invariant mass cut on jets, N-subjettiness is a powerful discriminating variable for tagging boosted objects such as top quarks and rejecting the fake background of QCD jets with large invariant mass. In a crossover analysis, the N-subjettiness method is found to outperform the common top tagging methods of the BOOST2010 conference, with top tagging efficiencies of 50% and 20% against mistag rates of 4.0% and 0.19%, respectively. The N-subjettiness values are calculated using a new infrared- and collinear-safe minimization procedure which I call the linear k-means clustering algorithm. As a true jet shape with highly effective tagging performances, N-subjettiness has many advantages on the experimental as well as on the theoretical side.
by Ken Van Tilburg.
S.B.
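The abstract above does not define N-subjettiness. The form below is the one commonly given in the N-subjettiness literature and is included as an assumption, not quoted from the thesis: k runs over the constituents of a jet, p_{T,k} is the transverse momentum of constituent k, ΔR_{J,k} is its distance in the rapidity-azimuth plane from candidate subjet axis J, and R_0 is the jet radius.

```latex
\tau_N = \frac{1}{d_0} \sum_{k} p_{T,k} \,
         \min\{\Delta R_{1,k}, \Delta R_{2,k}, \ldots, \Delta R_{N,k}\},
\qquad
d_0 = \sum_{k} p_{T,k} R_0
```

Read this way, choosing the N axes that minimize τ_N resembles the k-means objective, with each constituent assigned to its nearest axis, except that the distance enters linearly rather than squared.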
Soheily-Khah, Saeid. "Generalized k-means-based clustering for temporal data under time warp." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM064/document.
Temporal alignment of multiple time series is an important unresolved problem in many scientific disciplines. Major challenges for an accurate temporal alignment include determining and modeling the common and differential characteristics of classes of time series. This thesis is motivated by recent work on extending dynamic time warping (DTW) to align multiple time series in several applications, including speech recognition, curve matching, micro-array data analysis, temporal segmentation and human motion. However, these DTW-based approaches suffer from several limitations: 1) they address the problem of aligning two time series regardless of the remaining time series; 2) they weight the features of the multiple time series uniformly; 3) the time series are aligned globally, using all observations. The aim of this thesis is to explore a generalized dynamic time warping for time series clustering. The work addresses first the problem of prototype extraction, then the alignment of multiple and multidimensional time series.
Stanforth, Robert William. "Extending K-Means clustering for analysis of quantitative structure activity relationships (QSAR)." Thesis, Birkbeck (University of London), 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.500005.
Malheiros, Larinni. "Detecção de posição e quedas corporais baseado em K-means clustering e Threshold." Repositório Institucional da UnB, 2017. http://repositorio.unb.br/handle/10482/31978.
Falls among the elderly are a public health issue worldwide, and the subject has been a target of research and technological development aimed at mitigating the physical and psychological consequences for these people and their families. In 2017, 15.7% of the elderly in Brazil lived alone, according to [1]. There are several hypotheses to explain this trend, among them the desire for autonomy and the dispersion and fragmentation of families, with many children living far from their parents. In this context, this work presents a device capable of assisting the monitoring of the elderly in their activities, especially domestic ones. The theoretical foundations presented cover all phases of the device's development, from installation of the hardware to development of the algorithms used to process the information. The challenges encountered throughout this work were accuracy and suitability. The device's accuracy is divided into sensitivity and specificity, both of which are parameters used to determine the accuracy of the system. The related challenge was to assess whether the device's accuracy is sufficient to provide the reliability needed for fall and body position detection applications. In addition, the device must suit the physical characteristics of the patient who uses it, since variables such as height, weight and age influence the prediction result. The device's performance is evaluated in several scenarios and in real-world use, and its results are compared with those of the undergraduate work [2]. The methodology uses machine learning to predict static positions (sitting, lying and standing) and thresholds to determine dynamic positions (walking and falling). Information about these positions indicates whether the patient is falling, a situation that the caregiver must handle immediately. The machine learning algorithm used is K-Means Clustering, which yields the static position the patient is in. A series of decision conditions based on thresholds is used to detect dynamic positions such as walking and falling. The MPU6050 sensor is used to collect the information, and a Raspberry Pi is used to process and present the data. The data are presented in an Android and Web application through which caregivers monitor the elderly. As a result of this work, it was observed that detecting static positions with machine learning gives reliable results for the lying position and statistically inferior results for differentiating sitting from standing. Regarding dynamic movements, it was found that they can be differentiated using parameters such as linear regression and the area of the integral between the point of greatest amplitude and the remaining value of the data vector obtained from the MPU6050 sensor.
Groth, Gerson Eduardo. "Attribute field K-means : clustering trajectories with attribute by fitting multiple fields." Biblioteca Digital de Teses e Dissertações da UFRGS, 2016. http://hdl.handle.net/10183/150038.
The amount of high-dimensional trajectory data and its increasing complexity pose a challenge when visualizing and analysing this information. Trajectory visualization must deal with changes in both the space and time dimensions, but the attributes of each trajectory may provide insights into its behavior and important aspects, so they should not be neglected. In this work, we tackle this problem by interpreting multivariate time series as attribute-rich trajectories in a configuration space that encodes an explicit relationship among the time series variables. We propose a novel trajectory-clustering technique called Attribute Field k-means (AFKM). It uses a dynamic configuration space to generate clusters based on attributes and parameters set by the user. Furthermore, by incorporating a sketching-based interface, our approach is capable of finding clusters that approximate the input sketches. In addition, we developed a prototype to explore the trajectories and clusters generated by AFKM in an interactive manner. Our results on synthetic and real time series datasets demonstrate the efficiency and visualization power of our approach.
Dineff, Dimitris. "Clustering using k-means algorithm in multivariate dependent models with factor structure." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-429528.
Ramler, Ivan Peter. "Improved statistical methods for k-means clustering of noisy and directional data." [Ames, Iowa : Iowa State University], 2008.
Ranby, Erik. "A comparison of clustering techniques for short social text messages." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-196735.
The number of social text messages written every day is enormous, and the information in them can be very valuable. Software that can cluster, and thereby analyse, these messages can therefore be useful. This thesis explores several ways of clustering social text messages. Two algorithms, and several configurations of them, were tested and evaluated on the same input data. Based on these evaluations, a comparison was made to answer the question of which configuration is best suited to its purpose. The two clustering algorithms primarily compared are K-means and agglomerative hierarchical clustering. All configurations were run both with and without 3-grams as a complement to single words only. The evaluation methods used were intra-cluster distance, inter-cluster distance and silhouette value. Intra-cluster distance is the distance between data points in the same cluster, while inter-cluster distance is the distance between the different clusters. Silhouette value is another, more general, evaluation method that is often used to estimate the quality of a clustering. The results showed that K-means without 3-grams is preferable if the running-time requirement is not the top priority. On the other hand, if the quality of the clustering is more important than the performance of the algorithm, 3-grams should be used together with either of the two algorithms.
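The silhouette value mentioned in the abstract above combines the intra-cluster and inter-cluster distances into one score per point. The following is a minimal sketch of the standard definition, assuming Euclidean distance, at least two clusters, and points that are not all identical; the function name `mean_silhouette` and the toy data are chosen here for illustration.

```python
def mean_silhouette(points, labels):
    """Average silhouette value over all points (ranges from -1 to 1)."""

    def distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(distance(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(distance(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [[0.0], [1.0], [10.0], [11.0]]
score = mean_silhouette(points, [0, 0, 1, 1])
```

For these two tight, well-separated groups the score is about 0.90. Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.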
Mayer-Jochimsen, Morgan. "Clustering Methods and Their Applications to Adolescent Healthcare Data." Scholarship @ Claremont, 2013. http://scholarship.claremont.edu/scripps_theses/297.
Narreddy, Naga Sambu Reddy, and Tuğrul Durgun. "Clusters (k) Identification without Triangle Inequality : A newly modelled theory." Thesis, Uppsala universitet, Institutionen för informatik och media, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-183608.
Reanier, Richard Eugene. "Refinements to K-means clustering : spatial analysis of the Bateman site, arctic Alaska." Thesis, Connect to this title online; UW restricted, 1992. http://hdl.handle.net/1773/6420.
Camara, Assa. "Využití fuzzy množin ve shlukové analýze se zaměřením na metodu Fuzzy C-means Clustering." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2020. http://www.nusl.cz/ntk/nusl-417051.
Chahine, Firas Safwan. "A Genetic Algorithm that Exchanges Neighboring Centers for Fuzzy c-Means Clustering." NSUWorks, 2012. http://nsuworks.nova.edu/gscis_etd/116.
Karatzoglou, Alexandros, and Ingo Feinerer. "Text Clustering with String Kernels in R." Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 2006. http://epub.wu.ac.at/1002/1/document.pdf.
Series: Research Report Series / Department of Statistics and Mathematics
Zhou, Dunke. "High-dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1338303646.
Krajčír, Martin. "Internetové souřadnicové systémy." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218186.
Dsouza, Jeevan. "Region-based Crossover for Clustering Problems." NSUWorks, 2012. http://nsuworks.nova.edu/gscis_etd/139.
Dunkel, Christopher T. "Person detection and tracking using binocular Lucas-Kanade feature tracking and k-means clustering." Connect to this title online, 2008. http://etd.lib.clemson.edu/documents/1219850371/.
Liu, Meng-Ting, and 劉孟庭. "A study of k-means clustering." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/11948168797031986321.
朝陽科技大學
資訊管理系碩士班
97
Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster have similar traits. Similarity can be assessed by a distance measure or by the number of nearest-neighbor points. Clustering is a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. This thesis has two major parts. The first applies K-means clustering to food image segmentation; the second applies it to general data. In both parts, we modified K-means clustering to help image segmentation. In the first part, we demonstrate that our method can segment a food image into enough clusters for the food grading process. In the second part, we provide a heuristic approach to K-means clustering: initial centers are chosen by our proposed algorithm instead of by random selection, and a statistical approach is then used to choose a suitable number of clusters. The experiments showed that our proposed algorithm can help the clustering process.
Chiu, Chao-Wei, and 邱兆偉. "GPU-Accelerated K-Means Image Clustering." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/02981949068777197122.
國立中興大學
土木工程學系所
102
K-Means clustering has been a widely used approach in unsupervised classification of remotely sensed images. Owing to recent developments in Graphics Processing Units (GPUs), the computing performance and memory bandwidth of GPUs are now much higher than those of Central Processing Units (CPUs). K-Means clustering can therefore be expected to run faster through parallel computing on GPUs. This research aims to develop a GPU-optimized parallel processing approach for fast unsupervised classification of remotely sensed images using C++ and NVIDIA’s CUDA. The basic idea of the traditional K-Means approach was refined with a minimum distance classifier for clustering images. Numerical experiments that cluster 3-band color aerial images, at 1360×1020 pixels and scaled down to 680×510, into a specified number of spectral clusters demonstrate a speed-up of 10 to 20 times for the highly parallel, multi-threaded, multi-core GPU implementation over the traditional CPU-based approach.
Ranjan, Sameer. "Hyperspectral Image Classification Using K-means Clustering." Thesis, 2015. http://ethesis.nitrkl.ac.in/7895/1/624.pdf.
Zheng, Hao-Wen, and 鄭皓文. "Achieve K-Anonymity with k-means Clustering and Differential Privacy." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/ks2kts.
國立中山大學
資訊工程學系研究所
107
De-identification is a technique for protecting personal privacy in public datasets, and many methods exist. All de-identification methods produce a de-identified dataset by clustering the tuples of the dataset, so how to cluster tuples with a better algorithm becomes an important issue. We propose a method with two algorithms: one clusters the tuples of a dataset better than the known method, and the other improves the security of the sensitive attribute, so that the sensitive attribute in our result is better protected than with the known method.
Tsui, Chen-Kai, and 崔承愷. "Tabular K-means Clustering on Remote Sensing Images." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/60138681590550827372.
National Chung Hsing University
Department of Civil Engineering
105
Computer hardware processing capability increases as computer technology evolves. However, the quantity of remotely sensed images grows much faster than hardware capability, as sensor technology evolves toward much higher spatial and spectral resolution. Real-time handling of remote sensing images therefore demands optimization of traditional algorithms for image processing, spectral analysis, and image classification under limited computational capability. This study develops a Tabular K-means approach for clustering remotely sensed multispectral images. The proposed approach employs principal component transformation, peak detection on the two-dimensional (2-D) scatter diagram of the first two principal components to obtain initial seeds, and the Voronoi diagram of these seeds in 2-D spectral space to accelerate unsupervised classification of such images. Experimental results from clustering 7-band Landsat-4 Thematic Mapper (TM) images using Visual C++ programs demonstrate that the proposed Tabular K-means performs much more efficiently than the traditional K-means approach.
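The idea of replacing per-pixel distance computations with a table derived from a Voronoi diagram can be sketched as follows. This is a hedged illustration only: it assumes the first two principal component scores have already been quantized to integers in `[0, size)`, and the names `build_voronoi_table` and `classify_by_table` are hypothetical, not taken from the thesis.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def build_voronoi_table(seeds: List[Point], size: int) -> List[List[int]]:
    """Precompute, for every cell of a size x size grid, the index of the nearest seed.

    The resulting table is a rasterized Voronoi diagram of the seeds.
    table[y][x] holds the seed index for grid cell (x, y).
    """
    table = []
    for y in range(size):
        row = []
        for x in range(size):
            nearest = min(
                range(len(seeds)),
                key=lambda j: (x - seeds[j][0]) ** 2 + (y - seeds[j][1]) ** 2,
            )
            row.append(nearest)
        table.append(row)
    return table


def classify_by_table(points: List[Point], table: List[List[int]]) -> List[int]:
    """Label each quantized 2-D point with a single table lookup, no distances needed."""
    return [table[y][x] for x, y in points]
```

The table costs one pass over the grid to build, after which each pixel is labeled in constant time, which is where a speed-up over recomputing distances for every pixel would come from in this sketch.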
Hsiang-Yen, Lin, and 林相延. "Color-Based K-means Clustering for Image Segmentation." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/7sh6mq.
Hsueh, Meng-Lun, and 薛孟倫. "Wheeze Detection using Modified k-Means Clustering Algorithm." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/25519734434912722746.
National Taiwan University
Graduate Institute of Electrical Engineering
99
The aim of this study is to present the time-frequency characteristics of wheezes in a color-scale spectrogram and to apply the k-means clustering algorithm to detect wheezes. The k-means clustering algorithm groups the data according to the nature of the spectrogram. The first step is to preset the value of k, the number of groups. After experimental testing, k is set to three, corresponding to the three colors of the color-scale spectrogram: red, green, and blue. Wheeze sounds can then be displayed on the spectrogram. However, the k-means clustering algorithm assigns group numbers randomly, so the color corresponding to wheezing is not fixed. Through the color-indexing method, the wheezing color is set to red in accordance with the color index production proportions. In addition, the method is applied to normal respiratory sounds, and the effects of noise reduction are discussed. With the modified k-means clustering algorithm, the signal-to-noise ratio improves by about 2 dB for both wheeze and normal cases. The color index marks wheezing sounds in red on the color spectrogram in a stable and reproducible way, which greatly helps doctors detect wheezes.
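The problem of arbitrary cluster numbering can be illustrated with a small relabeling sketch. It is not the thesis's color-indexing method, whose details the abstract does not give. As an assumption, clusters are ranked by a scalar centroid value (for example mean spectral intensity), and the highest-ranked cluster is always drawn in red; `canonical_colors` is a hypothetical name.

```python
from typing import List, Sequence


def canonical_colors(labels: Sequence[int], centroids: Sequence[float],
                     colors: Sequence[str] = ("red", "green", "blue")) -> List[str]:
    """Map arbitrary cluster numbers to fixed colors.

    Clusters are ranked by centroid value in descending order (an assumed
    criterion), so the same physical cluster gets the same color no matter
    which number k-means happened to give it.
    """
    order = sorted(range(len(centroids)), key=lambda j: centroids[j], reverse=True)
    color_of = {cluster: colors[rank] for rank, cluster in enumerate(order)}
    return [color_of[lab] for lab in labels]
```

Two runs that find the same clusters under different numbering then produce the same coloring, which is the stability property the abstract describes.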
Huang, Yi-Fen, and 黃奕棻. "Using K-means Clustering Algorithms for Anomaly Detection." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/50724702843472906729.
National Taiwan Ocean University
Department of Computer Science and Engineering
97
In recent years, with the enormous growth in the use of computers and the Internet, “network security” has become an extremely important issue. Consequently, intrusion detection systems are used extensively in this field, and firewalls and intrusion detection systems are deployed to defend networks. Traditional detection methods are becoming less effective in the face of unknown and varied network attacks, so many new detection methods based on anomaly detection have recently been proposed to strengthen system defenses. We propose a feature selection method to improve the accuracy and detection rate of an intrusion detection system. This method chooses specific features using correlation coefficient information. In this paper, we use a clustering technique and anomaly detection to build the intrusion detection system. Experiments are performed to evaluate detection, classification, and false alarm rates. The experimental results indicate that the proposed method performs better than other traditional intrusion detection approaches.
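Correlation-based feature selection can be sketched in a few lines. The abstract does not say which correlation measure or selection rule is used, so this example assumes the Pearson coefficient between each feature and a numeric label, keeping the `top_n` features with the largest absolute value; `pearson` and `select_features` are illustrative names.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(rows: List[Sequence[float]], labels: Sequence[float],
                    top_n: int) -> List[int]:
    """Return the sorted indices of the top_n features most correlated with the label."""
    scores = []
    for f in range(len(rows[0])):
        column = [row[f] for row in rows]
        scores.append((abs(pearson(column, labels)), f))
    # Highest absolute correlation first; ties broken by lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(f for _, f in scores[:top_n])
```

The selected columns would then be passed to the clustering step in place of the full feature set.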
Su, Ruey-Chyi, and 蘇睿頎. "Sperm Whale's Voiceprint Classification via K-means Clustering." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/11187908401356738201.
National Taiwan Ocean University
Department of Electrical Engineering
104
The sperm whale, the world's largest odontocete (toothed whale), has a highly developed social system and uses different combinations of pulse sequences for social communication. In this thesis, we focus on two sperm whale voiceprints, Clicks and Codas, and perform feature analysis and classification on them. The source audio files are retrieved from CIBRA (Centro Interdisciplinare di Bioacustica e Ricerche Ambientali) and TCOML (The Cornell Lab of Ornithology Macaulay Library), in which background noises are included except for the bottlenose dolphin’s voices. We use the spectral subtraction method to eliminate background noise, construct spectrograms of the two voiceprints via time-frequency analysis, extract the maximum, minimum, and average values of the inter-click intervals (ICIs) as features, and perform classification with the k-means clustering method. Feature analysis shows that both Clicks and Codas consist of pulse sequences, but their patterns and average ICI values are quite different. Clustering analysis shows that 100% correct classification can be achieved.
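The three ICI features named in the abstract are straightforward to compute once click times are known. The sketch below assumes click onset times in seconds have already been detected from the spectrogram, a step the abstract does not detail; `ici_features` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def ici_features(click_times: Sequence[float]) -> Tuple[float, float, float]:
    """Return (max, min, mean) of the inter-click intervals of one pulse sequence."""
    if len(click_times) < 2:
        raise ValueError("at least two clicks are needed to form an interval")
    times = sorted(click_times)
    # An inter-click interval is the gap between consecutive click onsets.
    icis = [later - earlier for earlier, later in zip(times, times[1:])]
    return max(icis), min(icis), sum(icis) / len(icis)
```

Each recording then becomes one 3-dimensional feature vector that a k-means routine can cluster.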
Yang, Hong-Xiang, and 楊閎翔. "A Modified K-means Algorithm for Sequence Clustering." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/22490180184996174354.
Fu Jen Catholic University
Department of Computer Science and Information Engineering
95
Our research has focused on “content-based retrieval and feature extraction for multimedia databases.” In this paper, we extend that research to build a system that provides clustering services in addition to user-initiated search. We use DCT mapping for feature extraction, and our method covers two cases: equal-length and variable-length sequences. In the equal-length case, we map each sequence to an f-dimensional point in feature space and then cluster the sequences according to whole-sequence similarity, applying hierarchical clustering (single-linkage, average-linkage, complete-linkage) and partitional clustering (K-means). In the variable-length case, we cut each sequence into subsequences with a sliding window and map them to f-dimensional points. We propose a Modified K-means (MK) algorithm that clusters sequences according to partial similarity. We also apply Minimum Bounding Rectangles (MBRs) to obtain an MBR Modified K-means (MMK) algorithm that makes the MK algorithm more efficient. Finally, we implemented our method and carried out experiments.
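The sliding-window and DCT mapping steps can be sketched as below. This is an illustration under assumptions: it uses the unnormalized DCT-II and keeps the first `f` coefficients, while the abstract does not specify the DCT variant, normalization, window width, or step. The function names are hypothetical, and the Modified K-means and MBR steps are not reproduced here.

```python
import math
from typing import List, Sequence


def dct_features(seq: Sequence[float], f: int) -> List[float]:
    """First f coefficients of the unnormalized DCT-II of seq (requires f <= len(seq))."""
    n = len(seq)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(seq))
        for k in range(f)
    ]


def sliding_windows(seq: Sequence[float], width: int, step: int = 1) -> List[List[float]]:
    """Cut seq into overlapping subsequences of the given width."""
    return [list(seq[s:s + width]) for s in range(0, len(seq) - width + 1, step)]


def subsequence_points(seq: Sequence[float], width: int, f: int,
                       step: int = 1) -> List[List[float]]:
    """Map every sliding-window subsequence to an f-dimensional feature point."""
    return [dct_features(window, f) for window in sliding_windows(seq, width, step)]
```

Because every window has the same width, sequences of different lengths all yield points in the same f-dimensional space, differing only in how many points each contributes.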
NEGI, ROHIT. "K-MEANS CLUSTERING ALGORITHM ON MAP REDUCE ARCHITECTURE." Thesis, 2016. http://dspace.dtu.ac.in:8080/jspui/handle/repository/14749.