Theses on the topic "Clustering analysi"

Browse the top 50 theses for your research on the topic "Clustering analysi".

1

Zreik, Rawya. "Analyse statistique des réseaux et applications aux sciences humaines". Thesis, Paris 1, 2016. http://www.theses.fr/2016PA01E061/document.

Abstract:
Since the pioneering work of Moreno (1934), network analysis has become a strong discipline that is no longer limited to sociology and is now applied to fields as varied as biology, geography and history. The growing interest in network analysis is explained on the one hand by the strong presence of this type of data in today's digital world and, on the other hand, by recent progress in the modeling and processing of such data. Indeed, computer scientists and statisticians have focused their efforts on network data for more than a decade, proposing numerous techniques for analyzing them. Among these techniques are clustering methods, which in particular make it possible to uncover a hidden group structure in the network.
Over the last two decades, network structure analysis has grown rapidly, with networks being built and used in many fields, such as communication networks, financial transaction networks, gene regulatory networks, disease transmission networks and mobile telephone networks. Social networks are now commonly used to represent the interactions between groups of people; for instance, we ourselves, our professional colleagues, our friends and our family are often part of online networks such as Facebook, Twitter or email. In a network, many factors can exert influence or make analyses easier to understand. Among these, two are important: the time factor and the network context. The former involves the evolution of connections between nodes over time. The network context can be characterized by different types of information, such as text messages (email, tweets, Facebook, posts, etc.) exchanged between nodes, categorical information on the nodes (age, gender, hobbies, status, etc.), interaction frequencies (e.g., number of emails sent or comments posted), and so on. Taking these factors into consideration makes it possible to capture increasingly complex and hidden information from the data. The aim of this thesis is to define new random graph models that take the two factors mentioned above into account, in order to advance the analysis of network structure and allow the extraction of hidden information from the data. These models aim at clustering the vertices of a network according to their connection profiles and network structures, which are either static or dynamically evolving. The starting point of this work is the stochastic block model (SBM). This is a mixture model for graphs that was originally developed in the social sciences. It assumes that the vertices of a network are spread over different classes, so that the probability of an edge between two vertices depends only on the classes they belong to.
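A minimal Python sketch of the SBM assumption described above (the probability of an edge depends only on the classes of its two endpoints), assuming an undirected graph without self-loops. `sample_sbm` is a hypothetical helper; this is the basic static SBM, not the dynamic or contextual models proposed in the thesis.

```python
import random


def sample_sbm(block_sizes, probs, seed=None):
    """Sample an undirected graph without self-loops from a stochastic block model.

    block_sizes[k] is the number of vertices in class k, and probs[k][l] is the
    probability of an edge between a vertex of class k and a vertex of class l.
    """
    rng = random.Random(seed)
    # Class label of every vertex, listed block by block.
    labels = [k for k, size in enumerate(block_sizes) for _ in range(size)]
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # The edge probability depends only on the classes of i and j.
            if rng.random() < probs[labels[i]][labels[j]]:
                edges.append((i, j))
    return labels, edges


# Two classes of 5 vertices each: dense inside a class, sparse between classes.
labels, edges = sample_sbm([5, 5], [[0.9, 0.05], [0.05, 0.9]], seed=1)
within = sum(labels[i] == labels[j] for i, j in edges)
print(f"{len(edges)} edges, {within} of them within a class")
```

Fitting an SBM runs the other way: the class labels are latent and must be inferred from the observed edges.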
2

Karim, Ehsanul, Sri Phani Venkata Siva Krishna Madani and Feng Yun. "Fuzzy Clustering Analysis". Thesis, Blekinge Tekniska Högskola, Sektionen för ingenjörsvetenskap, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2165.

Abstract:
The objective of this thesis is to discuss the use of fuzzy logic in pattern recognition. There are different fuzzy approaches to recognizing patterns and structure in data, and the fuzzy approach chosen to process the data depends entirely on the type of data. Pattern recognition, as we know, involves various mathematical transforms that render the pattern or structure with desired properties, such as the identification of a probabilistic model that explains the process generating the data, clarity, and so on. With this basic school of thought we plunge into the world of fuzzy logic for pattern recognition. Fuzzy logic, like any other mathematical field, has its own set of principles, types, representations and uses. Hence our work focuses primarily on exploring the ways in which fuzzy logic is applied to pattern recognition and on the results obtained. These are discussed in the sections that follow. Pattern recognition is the collection of all approaches that understand, represent and process the data as segments and features by using fuzzy sets. The representation and processing depend on the selected fuzzy technique and on the problem to be solved. In the broadest sense, pattern recognition is any form of information processing for which both the input and the output can be many different kinds of data: medical records, aerial photos, market trends, library catalogs, galactic positions, fingerprints, psychological profiles, cash flows, chemical constituents, demographic features, stock options, military decisions. Most pattern recognition techniques involve treating the data as a variable and applying standard processing techniques to it.
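The abstract does not name a specific fuzzy technique. As an illustration only, the sketch below assumes fuzzy c-means, a standard fuzzy clustering method in which every point has a degree of membership in every cluster. `fuzzy_c_means` and its defaults (fuzzifier `m=2`, 100 iterations) are illustrative choices, not taken from the thesis.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0, eps=1e-12):
    """Fuzzy c-means: returns (centers, memberships).

    memberships[i][k] is the degree to which points[i] belongs to cluster k;
    each row sums to 1. m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + eps for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Each center is the mean of all points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / wsum
                                 for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        for i, p in enumerate(points):
            dists = [max(math.dist(p, ck), eps) for ck in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return centers, u


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(pts, c=2)
for p, row in zip(pts, memberships):
    print(p, [round(x, 3) for x in row])
```

Unlike a hard k-means assignment, a point lying between two clusters receives a substantial membership in both.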
3

Al-Razgan, Muna Saleh. "Weighted clustering ensembles". Fairfax, VA: George Mason University, 2008. http://hdl.handle.net/1920/3212.

Abstract:
Thesis (Ph.D.)--George Mason University, 2008.
Vita: p. 134. Thesis director: Carlotta Domeniconi. Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Information Technology. Title from PDF t.p. (viewed Oct. 14, 2008). Includes bibliographical references (p. 128-133). Also issued in print.
4

Leisch, Friedrich. "Bagged clustering". SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, 1999. http://epub.wu.ac.at/1272/1/document.pdf.

Abstract:
A new ensemble method for cluster analysis is introduced, which can be interpreted in two different ways: as a complexity-reducing preprocessing stage for hierarchical clustering and as a combination procedure for several partitioning results. The basic idea is to locate and combine structurally stable cluster centers and/or prototypes. Random effects of the training set are reduced by repeatedly training on resampled sets (bootstrap samples). We discuss the algorithm from both a theoretical and an applied point of view and demonstrate it on several data sets. (author's abstract)
Series: Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
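A rough Python sketch of the idea in the abstract, assuming plain k-means as the base partitioning method and single linkage for the hierarchical step over the collected centers. The helper names and parameters (`k_base`, `k_final`, `n_boot`) are hypothetical and may differ from the procedure in the working paper.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means; returns the k centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: math.dist(p, centers[j]))].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers


def single_linkage(items, n_groups):
    """Merge items by increasing pairwise distance until n_groups remain; returns labels."""
    label = list(range(len(items)))
    pairs = sorted((math.dist(items[a], items[b]), a, b)
                   for a in range(len(items)) for b in range(a + 1, len(items)))
    groups = len(items)
    for _, a, b in pairs:
        if groups <= n_groups:
            break
        la, lb = label[a], label[b]
        if la != lb:
            label = [la if x == lb else x for x in label]
            groups -= 1
    return label


def bagged_clustering(points, k_base, k_final, n_boot=10, seed=0):
    """Bagged-clustering sketch: returns one group label per point."""
    rng = random.Random(seed)
    centers = []
    for _ in range(n_boot):
        # Bootstrap sample: len(points) draws with replacement.
        sample = [rng.choice(points) for _ in points]
        centers.extend(kmeans(sample, k_base, rng))
    # Hierarchical step on the collected centers only.
    center_label = single_linkage(centers, k_final)
    # Each original point takes the group of its nearest center.
    return [center_label[min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))]
            for p in points]
```

Because the hierarchical step only sees `n_boot * k_base` centers rather than all data points, its cost no longer depends on the number of points. This is the complexity reduction the abstract refers to.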
5

Gupta, Pramod. "Robust clustering algorithms". Thesis, Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/39553.

Abstract:
One of the most widely used techniques for data clustering is agglomerative clustering. Such algorithms have long been used across many different fields, ranging from computational biology to social sciences to computer vision, in part because they are simple and their output is easy to interpret. However, many of these algorithms lack any performance guarantees when the data are noisy, incomplete or contain outliers, which is the case for most real-world data. It is well known that standard linkage algorithms perform extremely poorly in the presence of noise. In this work we propose two new robust algorithms for bottom-up agglomerative clustering and give formal theoretical guarantees for their robustness. We show that our algorithms can be used to cluster accurately in cases where the data satisfy a number of natural properties and where the traditional agglomerative algorithms fail. We also extend our algorithms to an inductive setting with similar guarantees, in which we randomly choose a small subset of points from a much larger instance space, generate a hierarchy over this sample, and then insert the rest of the points into it to generate a hierarchy over the entire instance space. We then carry out a systematic experimental analysis of various linkage algorithms, compare their performance on a variety of real-world data sets, and show that our algorithms handle various forms of noise much better than other hierarchical algorithms.
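The weakness of standard linkage under noise can be reproduced with a naive single-linkage implementation. This is a textbook algorithm, not one of the robust algorithms proposed in the thesis, and `agglomerative` is a hypothetical helper working on 1-D points. A chain of noise points between two well-separated groups, plus one outlier, causes the two groups to merge while the outlier becomes a cluster of its own.

```python
from itertools import combinations


def agglomerative(points, n_clusters, linkage="single"):
    """Naive bottom-up agglomerative clustering of 1-D points.

    Starts with one cluster per point and repeatedly merges the closest pair
    until n_clusters remain. linkage is "single", "complete" or "average".
    """
    clusters = [[p] for p in points]

    def dist(a, b):
        d = [abs(x - y) for x in a for y in b]
        if linkage == "single":
            return min(d)
        if linkage == "complete":
            return max(d)
        return sum(d) / len(d)  # average linkage

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # j > i, so index i stays valid
        del clusters[j]
    return [sorted(c) for c in clusters]


group_a, group_b = [0.0, 0.5, 1.0], [10.0, 10.5, 11.0]
print(agglomerative(group_a + group_b, 2))  # the two groups are recovered
# A chain of noise points between the groups plus one far outlier:
noise = [float(x) for x in range(2, 10)] + [30.0]
print(agglomerative(group_a + group_b + noise, 2))  # groups merged, outlier isolated
```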
6

Xu, Tianbing. "Nonparametric evolutionary clustering". Diss., Online access via UMI, 2009.

7

Shortreed, Susan. "Learning in spectral clustering". Thesis, Connect to this title online; UW restricted, 2006. http://hdl.handle.net/1773/8977.

8

Ptitsyn, Andrey. "New algorithms for EST clustering". Thesis, University of the Western Cape, 2000. http://etd.uwc.ac.za/index.php?module=etd&amp.

Abstract:
The expressed sequence tag (EST) database is a rich and fast-growing source of data for gene expression analysis and drug discovery. Clustering of raw EST data is a necessary step for further analysis and one of the most challenging problems of modern computational biology.
9

Karimi, Kambiz. "Clustering analysis of residential loads". Kansas State University, 2016. http://hdl.handle.net/2097/32616.

Abstract:
Master of Science
Department of Electrical and Computer Engineering
Anil Pahwa
Understanding electricity consumer behavior at different times of the year and throughout the day is very important for utilities. Although electricity consumers pay a fixed, predetermined amount of money for using electric energy, market wholesale prices vary hourly during the day. This analysis is intended to examine the overall behavior of consumers in different seasons of the year and compare it with market wholesale prices. Specifically, the coincidence of load peaks with wholesale price peaks is analyzed. This analysis used data from 101 homes in Austin, TX, gathered and stored by Pecan Street Inc. These data were first used to determine the average seasonal load profiles of all houses. Second, the houses were categorized into three clusters based on similarities in their load profiles, using the k-means clustering method. Finally, the average seasonal profiles of each cluster were compared with the wholesale market prices taken from the Electric Reliability Council of Texas (ERCOT). The data obtained for the houses were in 15-min intervals, so they were first converted to average hourly profiles. All the data were then used to determine average seasonal profiles for each house in each season (winter, spring, summer and fall). We decided to use three clusters, and all houses were then categorized into one of these three clusters using k-means clustering. Similarly, electricity prices taken from ERCOT, which were also on a 15-min basis, were converted to hourly averages and then to seasonal averages. Through clustering analysis we found that a small percentage of consumers did not change their pattern of electricity usage, while the majority of users changed their usage pattern once from one season to another. This change in usage patterns mostly depends on level of income, type of heating and cooling systems used, and other electric appliances used.
Comparing the ERCOT prices with the average seasonal electricity profiles of each cluster, we found that winter and spring are critical for utilities: in these seasons the ERCOT price peaks in the morning while the peak loads occur in the evening. In summer and fall, on the other hand, the ERCOT price and load demand peak at almost the same time, with a one- or two-hour difference. This analysis can help utilities and other authorities make better electricity usage policies so that they can shift some of the load from peak times to other times.
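The preprocessing and clustering steps described in the abstract can be sketched as follows. The sketch assumes that each season's data for a house is a list of days with 96 readings at 15-min resolution, and that a house is represented by its four seasonal hourly profiles concatenated. The data layout and the helpers `hourly_profile`, `house_features` and `kmeans` are assumptions, not the thesis code.

```python
import math
import random

SEASONS = ("winter", "spring", "summer", "fall")


def hourly_profile(days):
    """Average days of 15-min readings (96 values per day) into a 24-value hourly profile."""
    profile = []
    for h in range(24):
        values = [day[q] for day in days for q in range(4 * h, 4 * h + 4)]
        profile.append(sum(values) / len(values))
    return profile


def house_features(readings):
    """Concatenate the four seasonal hourly profiles of one house (4 x 24 values)."""
    return [x for season in SEASONS for x in hourly_profile(readings[season])]


def kmeans(vectors, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous center if the cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Given a hypothetical list `houses` of such per-season dictionaries, `kmeans([house_features(h) for h in houses], k=3)` would return one of three cluster labels per house. k-means only finds a local optimum, so it is usually rerun with several seeds and the lowest-cost result kept.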
10

Farmani, Mohammad Reza. "Clustering analysis using Swarm Intelligence". Doctoral thesis, Università degli Studi di Cagliari, 2016. http://hdl.handle.net/11584/266871.

Abstract:
This thesis is concerned with the application of swarm intelligence methods to the clustering analysis of datasets. The main objectives of the thesis are to:
∙ Take advantage of a novel evolutionary algorithm, called artificial bee colony, to improve the capability of K-means to find globally optimal clusters in nonlinear partitional clustering problems.
∙ Consider partitional clustering as an optimization problem and apply an improved ant-based algorithm, named Opposition-Based API (after the name of the Pachycondyla apicalis ants), to the automatic grouping of large unlabeled datasets.
∙ Define partitional clustering as a multiobjective optimization problem. The aim is to obtain well-separated, connected and compact clusters; for this purpose, two objective functions have been defined based on the concepts of data connectivity and cohesion. These functions are the core of an efficient multiobjective particle swarm optimization algorithm, which has been devised for and applied to the automatic grouping of large unlabeled datasets.
To this end, the thesis is divided into five main parts:
∙ The first part, Chapter 1, introduces the state of the art of swarm-intelligence-based clustering methods.
∙ The second part, Chapter 2, presents clustering analysis combining the artificial bee colony algorithm with the K-means technique.
∙ The third part, Chapter 3, presents clustering analysis using the opposition-based API algorithm.
∙ The fourth part, Chapter 4, presents multiobjective clustering analysis using particle swarm optimization.
∙ Finally, the fifth part, Chapter 5, concludes the thesis and addresses the future directions and open issues of this research.
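As a simplified illustration of treating partitional clustering as an optimization problem solved by a swarm, the sketch below uses a standard single-objective particle swarm optimizer. Each particle encodes `k` cluster centers, and the cost is the within-cluster sum of squared distances. This is a swapped-in simplification: the thesis uses a multiobjective PSO with connectivity and cohesion objectives, and the helper names and parameter values (`w`, `c1`, `c2`) here are illustrative assumptions.

```python
import math
import random


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def pso_clustering(points, k, n_particles=15, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Single-objective PSO over k centers; returns (centers, cost, history of best cost)."""
    rng = random.Random(seed)
    dim = len(points[0])

    def decode(x):
        # Flat position vector -> list of k center tuples.
        return [tuple(x[j * dim:(j + 1) * dim]) for j in range(k)]

    # Each particle starts at k randomly chosen data points, with zero velocity.
    pos = [[v for p in rng.sample(points, k) for v in p] for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [x[:] for x in pos]
    pbest_cost = [sse(points, decode(x)) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            cost = sse(points, decode(pos[i]))
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
        history.append(gbest_cost)
    return decode(gbest), gbest_cost, history
```

The swarm's best cost can only decrease over iterations, and `history` records it so convergence is easy to inspect.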
11

Cole, Rowena Marie. "Clustering with genetic algorithms". University of Western Australia. Dept. of Computer Science, 1998. http://theses.library.uwa.edu.au/adt-WU2003.0008.

Abstract:
Clustering is the search for those partitions that reflect the structure of an object set. Traditional clustering algorithms search only a small subset of all possible clusterings (the solution space), and consequently there is no guarantee that the solution found will be optimal. We report here on the application of Genetic Algorithms (GAs), stochastic search algorithms touted as effective search methods for large and complex spaces, to the problem of clustering. GAs that have been made applicable to the problem of clustering (by adapting the representation and fitness function, and developing suitable evolutionary operators) are known as Genetic Clustering Algorithms (GCAs). There are two parts to our investigation of GCAs. First, we look at clustering into a given number of clusters. The performance of GCAs on three generated data sets, analysed using 4320 differing combinations of adaptations, establishes their efficacy. The choice of adaptations and parameter settings is data-set dependent, but comparison between results using generated and real data sets indicates that performance is consistent for similar data sets with the same number of objects, clusters and attributes, and a similar distribution of objects. Generally, group-number representations are better suited to the clustering problem, as are dynamic scaling, elite selection and high mutation rates. Independent generalised models fitted to the correctness and timing results for each of the generated data sets produced accurate predictions of the performance of GCAs on similar real data sets. While GCAs can be successfully adapted to clustering, and the method produces results as accurate and correct as traditional methods, our findings indicate that, given a criterion based on simple distance metrics, GCAs provide no advantages over traditional methods. Second, we investigate the potential of genetic algorithms for the more general clustering problem, where the number of clusters is unknown.
We show that only simple modifications to the adapted GCAs are needed. We have developed a merging operator which, with elite selection, is employed to evolve an initial population with a large number of clusters toward better clusterings. With regard to accuracy and correctness, these GCAs are more successful than optimisation methods such as simulated annealing. However, such GCAs can become trapped in local minima in the same manner as traditional hierarchical methods. Such trapping is characterised by the situation where good (k-1)-clusterings do not result from our merge operator acting on good k-clusterings. A marked improvement in the algorithm is observed with the addition of a local heuristic.
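A small sketch of a genetic clustering algorithm for a given number of clusters, using the group-number representation (gene i holds the cluster label of object i), elite selection and a fairly high mutation rate, which the abstract reports as good choices. The fitness (within-cluster sum of squares), the one-point crossover, the truncation selection and all parameter values are assumptions; dynamic scaling and the merge operator are not shown.

```python
import math
import random


def within_sse(points, labels, k):
    """Within-cluster sum of squared distances to the cluster centroids."""
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:  # an empty cluster contributes nothing
            centroid = [sum(col) / len(members) for col in zip(*members)]
            total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def genetic_clustering(points, k, pop_size=20, generations=50,
                       mutation_rate=0.1, n_elite=2, seed=0):
    """GA with group-number encoding; returns (best labels, best cost per generation)."""
    rng = random.Random(seed)
    n = len(points)
    # Group-number representation: gene i is the cluster label of point i.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda ind: within_sse(points, ind, k))
        history.append(within_sse(points, pop[0], k))
        children = [ind[:] for ind in pop[:n_elite]]  # elite selection
        while len(children) < pop_size:
            # Truncation selection: both parents come from the better half.
            a, b = rng.sample(pop[:pop_size // 2], 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]  # one-point crossover
            # Mutation: each gene is reassigned to a random group with some probability.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = min(pop, key=lambda ind: within_sse(points, ind, k))
    return best, history
```

Because the elite are copied unchanged, the best cost never increases from one generation to the next.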
12

Zhang, Yiqun. "Advances in categorical data clustering". HKBU Institutional Repository, 2019. https://repository.hkbu.edu.hk/etd_oa/658.

Abstract:
Categorical data are common in various research areas, and clustering is a prevalent technique used to analyse them. However, two challenging problems are encountered in categorical data clustering analysis. The first is that most categorical data distance metrics were actually proposed for nominal data (i.e., a categorical data set that comprises only nominal attributes), ignoring the fact that ordinal attributes are also common in various categorical data sets. As a result, these nominal data distance metrics cannot account for the order information of ordinal attributes and may thus inappropriately measure the distances for ordinal data (i.e., a categorical data set that comprises only ordinal attributes) and mixed categorical data (i.e., a categorical data set that comprises both ordinal and nominal attributes). The second problem is that most hierarchical clustering approaches were actually designed for numerical data and have very high computation costs, that is, time complexity O(N^2) for a data set with N data objects. These issues have presented huge obstacles to the clustering analysis of categorical data. To address the ordinal data distance measurement problem, we studied the characteristics of the ordered possible values (also called 'categories' in this thesis) of ordinal attributes and proposed a novel ordinal data distance metric, which we call the Entropy-Based Distance Metric (EBDM), to quantify the distances between ordinal categories. The EBDM adopts cumulative entropy as a measure of the amount of information in the ordinal categories and simulates the thinking process of changing one's mind between two ordered choices to quantify the distances according to the amount of information in the ordinal categories. Both the order relationship and the statistical information of the ordinal categories are considered by the EBDM for more appropriate distance measurement.
Experimental results illustrate the superiority of the proposed EBDM in ordinal data clustering. In addition to designing an ordinal data distance metric, we further propose a unified categorical data distance metric that is suitable for distance measurement of all three types of categorical data (i.e., ordinal data, nominal data and mixed categorical data). The extended version uniformly defines distances and attribute weights for both ordinal and nominal attributes, so that the distances measured for the two types of attributes in mixed categorical data can be directly combined to obtain the overall distances between data objects with no information loss. Extensive experiments on all three types of categorical data sets demonstrate the effectiveness of the unified distance metric in clustering analysis of categorical data. To address the hierarchical clustering problem of large-scale categorical data, we propose a fast hierarchical clustering framework called Growing Multi-layer Topology Training (GMTT). The most significant merit of this framework is its ability to reduce the time complexity of most existing hierarchical clustering frameworks (i.e., O(N^2)) to O(N^1.5) without sacrificing the quality (i.e., clustering accuracy and hierarchical details) of the constructed hierarchy. By design, the GMTT framework is applicable to categorical data clustering simply by adopting a categorical data distance metric. To make the GMTT framework suitable for processing streaming categorical data, we also provide an incremental version of GMTT that can dynamically incorporate new inputs into the hierarchy via local updating. Theoretical analysis proves that the GMTT frameworks have time complexity O(N^1.5). Extensive experiments show the efficacy of the GMTT frameworks and demonstrate that they achieve more competitive categorical data clustering performance when adopting the proposed unified distance metric.
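The first problem stated in this abstract can be illustrated with two simple distances on a single ordinal attribute. The overlap (Hamming-style) distance commonly used for nominal data treats "low" vs "medium" exactly like "low" vs "high", whereas a rank-based distance uses the order. This is only an illustration of the problem. It is not the EBDM, whose cumulative-entropy formulation is not given in the abstract, and both helper names are hypothetical.

```python
def overlap_distance(a, b):
    """Nominal (overlap) distance: 0 if the categories are equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def rank_distance(a, b, categories):
    """Order-aware distance: normalised gap between positions in an ordered category list."""
    return abs(categories.index(a) - categories.index(b)) / (len(categories) - 1)


LEVELS = ["low", "medium", "high"]
print(overlap_distance("low", "medium"), overlap_distance("low", "high"))  # 1.0 1.0
print(rank_distance("low", "medium", LEVELS), rank_distance("low", "high", LEVELS))  # 0.5 1.0
```

The rank distance assumes equally spaced categories. Using cumulative entropy instead lets the EBDM derive category spacing from the data.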
13

Chan, Alton Kam Fai. "Hyperplane based efficient clustering and searching". View abstract or full-text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20CHANA.

14

Razafindramanana, Octavio. "Low-dimensional data analysis and clustering by means of Delaunay triangulation". Thesis, Tours, 2014. http://www.theses.fr/2014TOUR4033/document.

Abstract:
This thesis aims at proposing and discussing several solutions to the problem of low-dimensional point cloud analysis and clustering. These solutions are based on the analysis of the Delaunay triangulation. Two types of approaches are presented and discussed. The first follows a classical three-step approach: 1) the construction of a proximity graph that embeds topological information, 2) the construction of statistical information from this graph and 3) the removal of useless elements with respect to this information. The impact of different simplicial complex-based measures, i.e. measures not based only on a graph, is discussed. Evaluation is made with regard to point cloud clustering quality along with handwritten character recognition rates. The second type of approach consists of one-step approaches that derive the clustering along with the construction of the triangulation.
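The three-step approach can be sketched as below. A true Delaunay triangulation needs a computational geometry library, so this stdlib-only sketch swaps in the Gabriel graph, a subgraph of the Delaunay triangulation. The statistic (mean plus `t` standard deviations of edge length) and the threshold `t` are hypothetical choices, and the simplicial-complex measures studied in the thesis are not reproduced.

```python
import math
import statistics


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def gabriel_edges(points):
    """Edge (i, j) is kept unless some other point lies strictly inside the circle with diameter ij."""
    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dij = sq_dist(points[i], points[j])
            if all(sq_dist(points[i], points[r]) + sq_dist(points[j], points[r]) >= dij
                   for r in range(n) if r not in (i, j)):
                edges.append((i, j))
    return edges


def proximity_graph_clusters(points, t=1.0):
    """Three-step sketch; returns a component id (root index) per point."""
    # Step 1: proximity graph carrying topological information.
    edges = gabriel_edges(points)
    # Step 2: statistical information from the graph (edge-length mean and spread).
    lengths = [math.dist(points[i], points[j]) for i, j in edges]
    cutoff = statistics.mean(lengths) + t * statistics.pstdev(lengths)
    # Step 3: drop edges longer than the cutoff; clusters are the connected components.
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), length in zip(edges, lengths):
        if length <= cutoff:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]
```

The naive Gabriel test is cubic in the number of points. With SciPy available, the edges of `scipy.spatial.Delaunay(points).simplices` could replace `gabriel_edges` in step 1.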
15

Cui, Yingjie. "A study on privacy-preserving clustering". Click to view the E-thesis via HKUTO, 2009. http://sunzi.lib.hku.hk/hkuto/record/B4357225X.

16

Kübler, Bernhard Christian. "Risk classification by means of clustering". Frankfurt, M. Berlin Bern Bruxelles New York, NY Oxford Wien Lang, 2009. http://d-nb.info/998737291/04.

17

Chang, Soong Uk. "Clustering with mixed variables". [St. Lucia, Qld.], 2005. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe19086.pdf.

18

Zhou, Dunke. "High-dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products". The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1338303646.

19

Lee, King-for Foris. "Clustering uncertain data using Voronoi diagram". Click to view the E-thesis via HKUTO, 2009. http://sunzi.lib.hku.hk/hkuto/record/B43224131.

20

Albarakati, Rayan. "Density Based Data Clustering". CSUSB ScholarWorks, 2015. https://scholarworks.lib.csusb.edu/etd/134.

Abstract:
Data clustering is a data analysis technique that groups data based on a measure of similarity. When data are well clustered, the similarities between objects in the same group are high, while the similarities between objects in different groups are low. Data clustering is widely applied in a variety of areas such as bioinformatics, image segmentation and market research. This project conducted an in-depth study of data clustering with a focus on density-based clustering methods. The recent density-based CFSFDP algorithm (clustering by fast search and find of density peaks) is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. This method has been examined, experimented with and improved. Three methods (KNN-based, Gaussian kernel-based and iterative Gaussian kernel-based) are applied in this project to improve CFSFDP density-based clustering. The methods are applied to four benchmark datasets, and the results are analyzed and compared.
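A compact sketch of the CFSFDP idea described above. It computes a local density rho for each point (here with a Gaussian kernel, one of the variants mentioned) and a distance delta to the nearest point of higher density. Points where both are large become centers, and every other point joins the cluster of its nearest higher-density neighbor. Selecting centers automatically by the product rho * delta and the `bandwidth` value are assumptions for illustration; the original method is usually presented with centers picked from a decision graph of rho against delta.

```python
import math


def density_peaks(points, n_centers, bandwidth=1.0):
    """Density-peak clustering sketch: returns one cluster label per point."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i: Gaussian-kernel sum over all other points.
    rho = [sum(math.exp(-(d[i][j] / bandwidth) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Visit points from the densest to the least dense.
    order = sorted(range(n), key=lambda i: -rho[i])
    # delta_i: distance to the nearest point of higher density
    # (for the densest point, its largest distance to any point).
    delta = [0.0] * n
    nearest_higher = [None] * n
    delta[order[0]] = max(d[order[0]])
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda j: d[i][j])
        delta[i], nearest_higher[i] = d[i][j], j
    # Centers: the densest point plus the points with the largest rho * delta.
    others = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + others[:n_centers - 1]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Every other point joins the cluster of its nearest higher-density neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

Points are assigned in decreasing density, so each point's higher-density neighbor already has a label when the point is reached.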
21

McClelland, Robyn L. "Regression based variable clustering for data reduction". Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/9611.

22

Madureira, Erikson Manuel Geraldo Vieira de. "Análise de mercado : clustering". Master's thesis, Instituto Superior de Economia e Gestão, 2016. http://hdl.handle.net/10400.5/13122.

Abstract:
Master's in Economic and Business Decision (Mestrado em Decisão Económica e Empresarial)
This work describes the activities carried out during an internship at the company Quidgest. Since the company needed to study its various business areas, it was decided to extract and identify the information contained in the company's database. To this end, we used a data analysis process known as Knowledge Discovery in Databases (KDD). The biggest challenge in using this process was the large amount of information accumulated by the company, which intensified from 2013 onward. Of the phases of the KDD process, the most relevant is data mining, in which the characterizing variables required for the analysis are studied. Cluster analysis was chosen as the data mining technique so that the analysis would be efficient and effective and would yield easy-to-read results. After the development of the KDD process, it was decided that the data mining phase could be automated to facilitate future work carried out by the company. To automate this phase, cluster analysis techniques were used and a user-centered VBA/Excel program was developed. The program was tested on a concrete company case, which consisted of determining which current customers contributed most to the company's evolution during the years 2013 to 2015. The application of the program revealed useful information that was analyzed and interpreted.
23

Vohra, Neeru Rani. "Three dimensional statistical graphs, visual cues and clustering". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ56213.pdf.

24

Cui, Yingjie, and 崔英杰. « A study on privacy-preserving clustering ». Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B4357225X.

25

Xiong, Yimin. « Time series clustering using ARMA models ». Thesis (M.Phil.), Hong Kong University of Science and Technology, 2004. http://library.ust.hk/cgi/db/thesis.pl?COMP%202004%20XIONG.

Notes:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 49-55). Also available in electronic version. Access restricted to campus users.
26

Tantrum, Jeremy. « Model based and hybrid clustering of large datasets ». Thesis, 2003. http://hdl.handle.net/1773/8933 (online access restricted to UW users).

27

Zhou, Hong. « Visual clustering in parallel coordinates and graphs ». 2009. http://library.ust.hk/cgi/db/thesis.pl?CSED%202009%20ZHOU.

28

梁德貞 and Tak-ching Leung. « Correspondence analysis and clustering with applications to site-species occurrence ». Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1991. http://hub.hku.hk/bib/B31209889.

29

Leung, Tak-ching. « Correspondence analysis and clustering with applications to site-species occurrence ». [Hong Kong] : University of Hong Kong, 1991. http://sunzi.lib.hku.hk/hkuto/record.jsp?B13039519.

30

Lee, King-for Foris, and 李敬科. « Clustering uncertain data using Voronoi diagram ». Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B43224131.

31

Szeto, Lap Keung. « Clustering analysis of microarray gene expression data ». Thesis (M.Phil.), City University of Hong Kong, 2005. http://libweb.cityu.edu.hk/cgi-bin/ezdb/thesis.pl?mphil-it-b19885817a.pdf.

Notes:
Thesis (M.Phil.)--City University of Hong Kong, 2005.
"Submitted to Department of Computer Engineering and Information Technology in partial fulfillment of the requirements for the degree of Master of Philosophy". Includes bibliographical references (leaves 70-79).
32

Ray, Shubhankar. « Nonparametric Bayesian analysis of some clustering problems ». Texas A&M University, 2006. http://hdl.handle.net/1969.1/4251.

Abstract:
Nonparametric Bayesian models have been researched extensively in the past 10 years following the work of Escobar and West (1995) on sampling schemes for Dirichlet processes. The infinite mixture representation of the Dirichlet process makes it useful for clustering problems where the number of clusters is unknown. We develop nonparametric Bayesian models for two different clustering problems, namely functional and graphical clustering. We propose a nonparametric Bayes wavelet model for clustering of functional or longitudinal data. The wavelet modelling is aimed at the resolution of global and local features during clustering. The model also allows the elicitation of prior belief about the regularity of the functions and has the ability to adapt to a wide range of functional regularity. Posterior inference is carried out by Gibbs sampling with conjugate priors for fast computation. We use simulated as well as real datasets to illustrate the suitability of the approach over other alternatives. The functional clustering model is extended to analyze splice microarray data. New microarray technologies probe consecutive segments along genes to observe alternative splicing (AS) mechanisms that produce multiple proteins from a single gene. Clues regarding the number of splice forms can be obtained by clustering the functional expression profiles from different tissues. The analysis was carried out on the Rosetta dataset (Johnson et al., 2003) to obtain a splice variant by tissue distribution for all the 10,000 genes. We were able to identify a number of splice forms that appear to be unique to cancer. We propose a Bayesian model for partitioning graphs depicting dependencies in a collection of objects. After suitable transformations and modelling techniques, the problem of graph cutting can be approached by nonparametric Bayes clustering. 
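The abstract relies on the fact that the infinite mixture representation of a Dirichlet process leaves the number of clusters open. The sketch below is a minimal, hedged illustration only. It is not the wavelet model or the Gibbs sampler of the thesis. It samples the Chinese restaurant process, which is the partition distribution induced by a Dirichlet process with concentration parameter `alpha`. The function name `crp_partition` is hypothetical.

```python
import random


def crp_partition(n, alpha, seed=None):
    """Sample a partition of n items from a Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha), so the number of
    clusters is not fixed in advance.
    """
    if n < 0 or alpha <= 0:
        raise ValueError("n must be >= 0 and alpha must be > 0")
    rng = random.Random(seed)
    labels = []  # cluster label of each item
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n):
        # Unnormalized weights: one per existing cluster, plus alpha for a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts
```

Larger `alpha` values tend to produce more clusters. With `alpha` fixed, the expected number of clusters grows roughly logarithmically with `n`.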
We draw motivation from a recent work (Dhillon, 2001) showing the equivalence of kernel k-means clustering and certain graph cutting algorithms. It is shown that loss functions similar to the kernel k-means naturally arise in this model, and the minimization of associated posterior risk comprises an effective graph cutting strategy. We present here results from the analysis of two microarray datasets, namely the melanoma dataset (Bittner et al., 2000) and the sarcoma dataset (Nykter et al., 2006).
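The equivalence cited from Dhillon concerns the kernel k-means objective. Below is a minimal sketch of kernel k-means on a precomputed Gram matrix. It assumes an RBF kernel and a round-robin initialization; both choices, and the names `rbf_kernel` and `kernel_kmeans`, are illustrative and not taken from the thesis. Distances to cluster means are computed entirely from kernel entries, which is the property that the cited graph-cut equivalence builds on.

```python
import math


def rbf_kernel(points, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) (an assumed kernel choice)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
            for p in points]


def kernel_kmeans(K, k, max_iter=100):
    """Lloyd-style kernel k-means on a precomputed n x n Gram matrix K."""
    n = len(K)
    # Assumed deterministic initialization: round-robin assignment of points to clusters.
    labels = [i % k for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Squared norm of each cluster mean in feature space: (1/|c|^2) * sum_{j,l in c} K[j][l]
        mean_norm = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                     for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                # ||phi(x_i) - mean_c||^2 = K_ii - (2/|c|) * sum_{j in c} K_ij + ||mean_c||^2
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + mean_norm[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels
```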
33

Weijermars, Wilhelmina Adriana Maria. « Analysis of urban traffic patterns using clustering ». Enschede : University of Twente [Host], 2007. http://doc.utwente.nl/57837.

34

Bennett, Brian Todd. « Locating Potential Aspect Interference Using Clustering Analysis ». NSUWorks, 2015. http://nsuworks.nova.edu/gscis_etd/50.

Abstract:
Software design continues to evolve from the structured programming paradigm of the 1970s and 1980s and the object-oriented programming (OOP) paradigm of the 1980s and 1990s. The functional decomposition design methodology used in these paradigms reduced the prominence of non-functional requirements, which resulted in scattered and tangled code to address non-functional elements. Aspect-oriented programming (AOP) allowed the removal of crosscutting concerns scattered throughout class code into single modules known as aspects. Aspectization resulted in increased modularity in class code, but introduced new types of problems that did not exist in OOP. One such problem was aspect interference, in which aspects meddled with the data flow or control flow of a program. Research has developed various solutions for detecting and addressing aspect interference using formal design and specification methods, and by programming techniques that specify aspect precedence. Such explicit specifications required practitioners to have a complete understanding of possible aspect interference in an AOP system under development. However, as system size increased, understanding of possible aspect interference could decrease. Therefore, practitioners needed a way to increase their understanding of possible aspect interference within a program. This study used clustering analysis to locate potential aspect interference within an aspect-oriented program under development, using k-means partitional clustering. Vector space models, using two newly defined metrics, interference potential (IP) and interference causality potential (ICP), and an existing metric, coupling on advice execution (CAE), provided input to the clustering algorithms. Resulting clusters were analyzed via an internal strategy using the R-Squared, Dunn, Davies-Bouldin, and SD indexes. The process was evaluated on both a smaller scale AOP system (AspectTetris), and a larger scale AOP system (AJHotDraw). 
By seeding potential interference problems into these programs and comparing results using visualizations, this study found that clustering analysis provided a viable way for detecting interference problems in aspect-oriented software. The ICP model was best at detecting interference problems, while the IP model produced results that were more sporadic. The CAE clustering models were not effective in pinpointing potential aspect interference problems. This was the first known study to use clustering analysis techniques specifically for locating aspect interference.
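The evaluation loop described above runs k-means partitional clustering and then scores the result with internal validity indexes. It can be sketched as follows. This is a hedged, generic example with two simplifications. It uses plain Euclidean k-means on raw points rather than the IP, ICP, or CAE vector space models of the study. It also computes only the Davies-Bouldin index, where lower values are better. The function names are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; initial centroids are k distinct points chosen at random."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster becomes empty.
            new.append([sum(xs) / len(members) for xs in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def davies_bouldin(points, labels, centroids):
    """Mean over clusters of max_{j != i} (s_i + s_j) / d(c_i, c_j); requires k >= 2."""
    k = len(centroids)
    scatter = []  # s_i = average distance of cluster members to their centroid
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(math.dist(p, centroids[c]) for p in members) / len(members) if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k
```

Running `kmeans` for several values of `k` and keeping the partition with the lowest Davies-Bouldin score is one simple way to use an internal index. The study combined several indexes.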
35

Petrov, Anton Igorevich. « RNA 3D Motifs : Identification, Clustering, and Analysis ». Bowling Green State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1333929629.

36

Talasu, Dharneesh. « Efficient fMRI Analysis and Clustering on GPUs ». The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1322077186.

37

Jankovsky, Zachary Kyle. « Clustering Analysis of Nuclear Proliferation Resistance Measures ». The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398354675.

38

Bhusal, Prem. « Scalable Clustering for Immune Repertoire Sequence Analysis ». Wright State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=wright1558631347622374.

39

Yuan, Ding. « Heuristic subset clustering for consideration set analysis ». Diss., University of Iowa, 2007. http://ir.uiowa.edu/etd/137.

40

Speer, Nora. « Funktionelles Clustering von Genen mit der Gene Ontology ». Berlin : Logos-Verl, 2006. http://deposit.d-nb.de/cgi-bin/dokserv?id=2875270&prov=M&dok_var=1&dok_ext=htm.

41

Woo, Kam Tim. « Applications of clustering techniques on communication systems ». 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20WOO.

42

Hossain, Mahmud Shahriar. « Exploratory Data Analysis using Clusters and Stories ». Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28085.

Abstract:
Exploratory data analysis aims to study datasets through iterative, investigative, and visual analytic algorithms. Because the growing volume of unstructured data is difficult to manage and access, exploratory analysis of datasets has become harder than ever and has attracted the interest of data mining researchers. In this dissertation, we study new algorithms for exploratory analysis of data collections using clusters and stories. Clustering brings together similar entities, whereas stories connect dissimilar objects. The former helps organize datasets into regions of interest, and the latter explores latent information by connecting the dots between disjoint instances. This dissertation focuses on five research aspects that demonstrate the applicability and usefulness of clusters and stories as exploratory data analysis tools. In the area of clustering, we investigate whether clustering algorithms can be automatically "alternatized" and how they can be guided toward alternative results using flexible constraints expressed as "scatter-gather" operations. We demonstrate these ideas in many application domains, including the study of the bat biosonar system and the design of sustainable products. In the area of storytelling, we develop algorithms that generate stories using distance, clique, and syntactic constraints. We explore the use of storytelling for studying document collections in the biomedical literature and in the intelligence analysis domain.
Ph. D.
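The distance constraint used for storytelling can be illustrated with a small sketch. The idea is to find the shortest chain of documents from a start document to an end document in which every consecutive pair lies within a Jaccard distance threshold. This is an assumption-laden simplification: it uses breadth-first search over term sets and omits the clique and syntactic constraints. The names `jaccard_distance` and `story` are hypothetical.

```python
from collections import deque


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two term sets (0.0 for two empty sets)."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def story(docs, start, end, max_step):
    """Shortest chain start -> ... -> end in which every hop has Jaccard distance <= max_step.

    docs maps a document id to its set of terms; returns a list of ids, or None if no chain exists.
    """
    prev = {start: None}  # breadth-first search tree: node -> predecessor
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == end:
            # Walk the predecessor links back to the start to recover the chain.
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in docs:
            if v not in prev and jaccard_distance(docs[u], docs[v]) <= max_step:
                prev[v] = u
                queue.append(v)
    return None
```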
43

Eriksson, Håkan. « Clustering Generic Log Files Under Limited Data Assumptions ». Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189642.

Abstract:
Complex computer systems are often prone to anomalous or erroneous behavior, which can lead to costly downtime while the systems are diagnosed and repaired. Log files are one source of information for diagnosing such errors and anomalies, and they are often generated in vast and diverse amounts. However, their size and semi-structured nature generally make manual analysis infeasible, so some automation is desirable to sift through the log files and locate the source of the anomalies or errors. This project aimed to develop a generic algorithm that clusters diverse log files in accordance with domain expertise. The results show that the developed algorithm agrees well with manual clustering, even under more relaxed assumptions about the data.
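The abstract does not describe the algorithm itself. As a common baseline only, and not the method of the thesis, log lines can be grouped by a template obtained by masking variable-looking tokens. The masking rules and the names `template` and `cluster_by_template` below are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical masking rules: hexadecimal identifiers first, then any whitespace-delimited
# token that contains a digit (timestamps, counters, ids).
_HEX = re.compile(r"\b0x[0-9a-fA-F]+\b")
_NUM = re.compile(r"\S*\d\S*")


def template(line):
    """Replace variable-looking tokens with placeholders to obtain the line's template."""
    line = _HEX.sub("<HEX>", line)
    return _NUM.sub("<NUM>", line)


def cluster_by_template(lines):
    """Group log lines whose masked templates are identical."""
    groups = defaultdict(list)
    for line in lines:
        groups[template(line.strip())].append(line)
    return dict(groups)
```

Whole-token masking is crude. A token such as `ipv4` is masked as well, which is one reason domain expertise matters in practice.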
44

Li, Yanjun. « High Performance Text Document Clustering ». Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1181005422.

45

Strehl, Alexander. « Relationship-based clustering and cluster ensembles for high-dimensional data mining ». Thesis, 2002. Full text (PDF) from UMI/Dissertation Abstracts International: http://wwwlib.umi.com/cr/utexas/fullcit?p3088578.

46

Konda, Swetha Reddy. « Classification of software components based on clustering ». Morgantown, W. Va. : [West Virginia University Libraries], 2007. https://eidr.wvu.edu/etd/documentdata.eTD?documentid=5510.

Notes:
Thesis (M.S.)--West Virginia University, 2007.
Title from document title page. Document formatted into pages; contains vi, 59 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 57-59).
47

Zhang, Kai. « Kernel-based clustering and low rank approximation ». 2008. http://library.ust.hk/cgi/db/thesis.pl?CSED%202008%20ZHANG.

48

Bohn, Angela, Stefan Theußl, Ingo Feinerer, Kurt Hornik, Patrick Mair and Norbert Walchhofer. « Combining Weighted Centrality and Network Clustering ». Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 2009. http://epub.wu.ac.at/1466/1/document.pdf.

Abstract:
In Social Network Analysis (SNA), centrality measures focus on activity (degree), information access (betweenness), distance to all other nodes (closeness), or popularity (PageRank). We introduce a new measure that quantifies the distance of nodes to the network center. Called weighted distance to nearest center (WDNC), it is based on edge-weighted closeness (EWC), a weighted version of closeness, and it combines elements of weighted centrality and clustering. The WDNC is tested on two e-mail networks of the community around R, one of the most important open-source programs for statistical computing and graphics. We find a relationship between the WDNC and the formal organization of the R community.
Series: Research Report Series / Department of Statistics and Mathematics
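The report describes edge-weighted closeness only as a weighted version of closeness. The sketch below makes two assumptions. It uses the standard definition: the number of other reachable nodes divided by the sum of weighted shortest-path distances to them. It also converts e-mail counts to edge lengths as 1/weight. The exact EWC and WDNC definitions in the report may differ, and the function names are hypothetical.

```python
import heapq


def to_lengths(weights):
    """Assumed conversion: strong ties (large weights) become short edges, length = 1 / weight."""
    return {u: {v: 1.0 / w for v, w in nbrs.items()} for u, nbrs in weights.items()}


def dijkstra(graph, source):
    """Shortest-path lengths from source; graph maps node -> {neighbour: edge length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path to u was already processed
        for v, length in graph.get(u, {}).items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def weighted_closeness(graph, node):
    """(number of other reachable nodes) / (sum of weighted distances to them)."""
    dist = dijkstra(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0
```

The node with the highest weighted closeness is a natural candidate for a "center". How the report turns centers into WDNC values is not reproduced here.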
49

Campagni, Renza. « Data Mining Models for Student Databases ». Doctoral thesis, 2013. http://hdl.handle.net/2158/803882.

Abstract:
This thesis presents a data mining methodology for analyzing the careers of university students, where a career is the ordered sequence of exams taken by an individual student. We present several models based on clustering, classification, and sequential pattern techniques in order to understand and improve student performance and exam scheduling. We introduce an ideal career as the career of an ideal student who takes each examination immediately after the end of the corresponding course, without delays, and we compare the career of a generic student with the ideal one using these techniques. Finally, we apply the methodology to a real case study; the results show that the more closely students follow the order given by the ideal career, the better they perform in terms of graduation time and final grade. We also analyze student careers from another point of view, namely that of each course: by analyzing the distribution of students with respect to the delay with which they take an examination, we discover characteristics common to two or more courses. This is done in terms of Poisson distributions.
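Two ingredients of this methodology can be sketched under explicit assumptions. The first is a distance between a student's exam order and the ideal career. Here it is taken to be the number of exam pairs taken in the opposite order (the Kendall tau distance), a hypothetical choice rather than the thesis's clustering-based comparison. The second is a Poisson fit of exam delays, whose maximum-likelihood rate is the sample mean. Delays are assumed to be non-negative integers, for example counted in exam sessions. All function names are hypothetical.

```python
import math
from itertools import combinations


def order_inversions(career, ideal):
    """Count exam pairs taken in the opposite order to the ideal career (Kendall tau distance).

    Exams that do not appear in the ideal career are ignored.
    """
    pos = {exam: i for i, exam in enumerate(ideal)}
    seq = [pos[e] for e in career if e in pos]
    return sum(1 for a, b in combinations(seq, 2) if a > b)


def fit_poisson(delays):
    """Maximum-likelihood Poisson rate for non-negative integer delays: the sample mean."""
    return sum(delays) / len(delays)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)
```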
50

Vaira, Raffaele. « Human Behaviour Analysis in Indoor and Outdoor Environments and Clustering of Travelled Trajectories ». Doctoral thesis, 2021. http://hdl.handle.net/11566/291056.

Abstract:
In recent years, the advent of less invasive and increasingly inexpensive technologies for detecting the actions people perform in various fields has led to a huge growth of interest in the analysis of human behavior. In particular, the large amount of data obtained through these technologies, together with the greater computing power available, has made it possible to analyze human behavior, both indoors and outdoors, in ever greater detail. This thesis is situated in this context. Using different types of sensors for data collection, we focused on the analysis of pedestrian behavior both outdoors (in particular in a natural park) and indoors. For the indoor analysis, the retail setting was considered: a series of strategies was developed for collecting data in this context and for analyzing consumer behavior at different levels of detail. Again starting from trajectory data, shopper habits were analyzed at the store level, the shelf level, and finally the individual level, the last through sentiment analysis. The entire analysis was conducted on real case studies and therefore on real data.