Dissertations / Theses on the topic 'Clustering based on correlation'

Consult the top 50 dissertations / theses for your research on the topic 'Clustering based on correlation.'

1

Rosén, Fredrik. "Correlation based clustering of the Stockholm Stock Exchange." Thesis, Stockholm University, School of Business, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-6500.

Full text
Abstract:

This thesis presents a topological classification of stocks traded on the Stockholm Stock Exchange based solely on the co-movements between individual stocks. The working hypothesis is that an ultrametric space is an appropriate space for linking stocks together. The hierarchical structure is obtained from the matrix of correlation coefficients computed between all pairs of stocks included in the OMXS 30 portfolio, using daily logarithmic returns. The dynamics of the system are investigated by studying the distribution and time dependence of the correlation coefficients. Average linkage clustering is proposed as an alternative to the conventional single linkage clustering. The empirical investigation shows that the Minimum Spanning Tree (the graphical representation of the clustering procedure) describes the reciprocal arrangement of the stocks included in the investigated portfolio in a way that also makes sense from an economic point of view. Average linkage clustering results in five main clusters, consisting of Machinery, Bank, Telecom, Paper & Forest and Security companies. Most groups are homogeneous with respect to their sector and often also with respect to their sub-industry, as specified by the GICS classification standard. For example, the Bank cluster consists of the Commercial Bank companies FöreningsSparbanken, SEB, Handelsbanken and Nordea. However, there are also examples where companies form a cluster without belonging to the same sector. One example is the Security cluster, consisting of ASSA (Building Products) and Securitas (Diversified Commercial & Professional Services). Even though they belong to different industries, both are active in the security area: ASSA is a manufacturer and supplier of locking solutions, while Securitas focuses on guarding solutions, security systems and cash handling. The empirical results show that it is possible to obtain a meaningful taxonomy based solely on the co-movements between individual stocks and the fundamental ultrametric assumption, without any presumptions about the companies' business activities. The obtained clusters indicate that common economic factors can affect certain groups of stocks, irrespective of their GICS industry classification. The outcome of the investigation is of fundamental importance for, e.g., asset classification and portfolio optimization, where the co-movement between assets is of vital importance.
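A minimal sketch of this kind of correlation-based taxonomy, assuming a matrix of daily log returns is available; the distance d = sqrt(2(1 − ρ)) is the standard transform in this literature, and the random data and the five-cluster cut are placeholders rather than the thesis's exact settings:

```python
# Sketch of correlation-based stock clustering, assuming a (days x stocks) matrix
# of daily log returns; the random matrix below is a placeholder for OMXS 30 data
# and the transform d = sqrt(2*(1 - rho)) is the usual choice in this literature.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
log_returns = rng.normal(size=(250, 30))              # placeholder returns matrix

rho = np.corrcoef(log_returns, rowvar=False)          # stock-by-stock correlations
dist = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None)) # correlation -> distance
np.fill_diagonal(dist, 0.0)

# Average-linkage hierarchy (the alternative linkage proposed in the thesis)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=5, criterion="maxclust")       # e.g. five main clusters

# Minimum spanning tree: the graphical counterpart of single-linkage clustering
mst = minimum_spanning_tree(dist).toarray()
print(labels)
print("MST edges:", int((mst > 0).sum()))             # 29 edges linking 30 stocks
```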

APA, Harvard, Vancouver, ISO, and other styles
2

Pettersson, Christoffer. "Investigating the Correlation Between Marketing Emails and Receivers Using Unsupervised Machine Learning on Limited Data : A comprehensive study using state of the art methods for text clustering and natural language processing." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189147.

Full text
Abstract:
The goal of this project is to investigate any correlation between marketing emails and their receivers using machine learning and only a limited amount of initial data. The data consist of roughly 1,200 emails and 98,000 receivers of these. Initially, the emails are grouped together based on their content using text clustering. They contain no information regarding prior labeling or categorization, which creates a need for an unsupervised learning approach using solely the raw text-based content as data. The project investigates state-of-the-art concepts like bag-of-words for calculating term importance and the gap statistic for determining an optimal number of clusters. The data are vectorized using term frequency-inverse document frequency to determine the importance of terms relative to the document and to all documents combined. An inherent problem of this approach is high dimensionality, which is reduced using latent semantic analysis in conjunction with singular value decomposition. Once the resulting clusters have been obtained, the most frequently occurring terms for each cluster are analyzed and compared. Due to the absence of initial labeling, an alternative approach is required to evaluate the clusters' validity. To do this, the receivers of all emails in each cluster who actively opened an email are collected and investigated. Each receiver has different attributes regarding their purpose of using the service and some personal information. Once these were gathered and analyzed, it could be concluded that it is possible to find distinguishable connections between the resulting email clusters and their receivers, but only to a limited extent. Receivers from the same cluster showed similar attributes which were distinguishable from the receivers of other clusters. Hence, the resulting email clusters and their receivers are specific enough to distinguish themselves from each other but too general to handle more detailed information. With more data, this could become a useful tool for determining which users of a service should receive a particular email to increase the conversion rate and thereby reach out to more relevant people based on previous trends.
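The pipeline the abstract outlines (TF-IDF vectorisation, LSA via truncated SVD, then k-means) can be sketched with scikit-learn as below; the toy emails and the fixed cluster count stand in for the real corpus and for the gap-statistic model selection used in the thesis:

```python
# Minimal sketch: TF-IDF vectorisation, LSA (truncated SVD) for dimensionality
# reduction, then k-means. The email corpus and cluster count are placeholders;
# the thesis selects the number of clusters with the gap statistic.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.pipeline import make_pipeline
from sklearn.cluster import KMeans

emails = ["spring sale on shoes", "invoice for your order",
          "weekly newsletter: new arrivals", "your order has shipped"]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(emails)

# LSA: SVD on the TF-IDF matrix, followed by re-normalisation
lsa = make_pipeline(TruncatedSVD(n_components=2, random_state=0),
                    Normalizer(copy=False))
X_reduced = lsa.fit_transform(X)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_reduced)
print(km.labels_)
```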
APA, Harvard, Vancouver, ISO, and other styles
3

Zimek, Arthur. "Correlation Clustering." Diss., lmu, 2008. http://nbn-resolving.de/urn:nbn:de:bvb:19-87361.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

To, Thang Long Information Technology & Electrical Engineering Australian Defence Force Academy UNSW. "Video object segmentation using phase-based detection of moving object boundaries." Awarded by: University of New South Wales - Australian Defence Force Academy. School of Information Technology and Electrical Engineering, 2005. http://handle.unsw.edu.au/1959.4/38705.

Full text
Abstract:
A video sequence often contains a number of objects. For each object, the motion of its projection on the video frames is affected by its movement in 3-D space, as well as the movement of the camera. Video object segmentation refers to the task of delineating and distinguishing different objects that exist in a series of video frames. Segmentation of moving objects from a two-dimensional video is difficult due to the lack of depth information at the boundaries between different objects. As the motion incoherency of a region is intrinsically linked to the presence of such boundaries and vice versa, a failure to recognise a discontinuity in the motion field, or the use of an incorrect motion, often leads directly to errors in the segmentation result. In addition, many defects in a segmentation mask are also located in the vicinity of moving object boundaries, due to the unreliability of motion estimation in these regions. The approach to segmentation in this work comprises three stages. In the first part, a phase-based method is devised for detection of moving object boundaries. This detection scheme is based on the characteristics of a phase-matched difference image, and is shown to be sensitive to even small disruptions to a coherent motion field. In the second part, a spatio-temporal approach for object segmentation is introduced, which involves a spatial segmentation in the detected boundary region, followed by a motion-based region-merging operation using three temporally adjacent video frames. In the third stage, a multiple-frame approach for stabilisation of object masks is introduced to alleviate the defects which may have existed earlier in a local segmentation, and to improve upon the temporal consistency of object boundaries in the segmentation masks along a sequence. The feasibility of the proposed work is demonstrated at each stage through examples carried out on a number of real video sequences. In the presence of the motion of another object, the phase-based boundary detection method is shown to be much more sensitive than direct measures such as the sum-of-squared error on a motion-compensated difference image. The three-frame segmentation scheme also compares favourably with a recently proposed method initiated from a non-selective spatial segmentation. In addition, improvements in the quality of the object masks after the stabilisation stage are also observed both quantitatively and visually. The final segmentation result is then used in an experimental object-based video compression framework, which also shows improvements in efficiency over a contemporary video coding method.
APA, Harvard, Vancouver, ISO, and other styles
5

Ren, Jinchang. "Semantic content analysis for effective video segmentation, summarisation and retrieval." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4251.

Full text
Abstract:
This thesis focuses on four main research themes, namely shot boundary detection, fast frame alignment, activity-driven video summarisation, and highlights-based video annotation and retrieval. A number of novel algorithms have been proposed to address these issues, which can be highlighted as follows. Firstly, accurate and robust shot boundary detection is achieved through modelling of cuts into sub-categories and appearance-based modelling of several gradual transitions, along with some novel features extracted from compressed video. Secondly, fast and robust frame alignment is achieved via the proposed subspace phase correlation (SPC) and an improved sub-pixel strategy. The SPC is proved to be insensitive to zero-mean noise, and its gradient-based extension is even robust to non-zero-mean noise and can be used to deal with non-overlapped regions for robust image registration. Thirdly, hierarchical modelling of rush videos using formal language techniques is proposed, which can guide the modelling and removal of several kinds of junk frames as well as adaptive clustering of retakes. With an extracted activity-level measurement, shots and sub-shots are detected for content-adaptive video summarisation. Fourthly, highlights-based video annotation and retrieval is achieved, in which statistical modelling of skin pixel colours, knowledge-based shot detection, and improved determination of camera motion patterns are employed. Within these proposed techniques, one important principle is to integrate various kinds of feature evidence and to incorporate prior knowledge in modelling the given problems. A high-level hierarchical representation is extracted from the original linear structure for effective management and content-based retrieval of video data. As most of the work is implemented in the compressed domain, one additional benefit is the achieved high efficiency, which will be useful for many online applications.
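Because fast frame alignment via phase correlation is one of the four themes, a short sketch of classic phase correlation (estimating an integer translation between two frames) may be useful; the thesis's subspace variant (SPC) and its sub-pixel refinement are not reproduced here:

```python
# Classic phase correlation for estimating an integer translation between two
# frames; this is the textbook method that subspace phase correlation builds on.
import numpy as np

def phase_correlation(frame_a, frame_b):
    """Integer (row, col) shift by which frame_a is displaced relative to frame_b."""
    F1 = np.fft.fft2(frame_a)
    F2 = np.fft.fft2(frame_b)
    cross_power = F1 * np.conj(F2)
    cross_power /= np.abs(cross_power) + 1e-12       # keep only the phase
    corr = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices to signed shifts (handle wrap-around)
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

a = np.zeros((64, 64)); a[20:30, 20:30] = 1.0
b = np.roll(a, shift=(5, -3), axis=(0, 1))           # a shifted by (5, -3)
print(phase_correlation(b, a))                        # -> (5, -3)
```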
APA, Harvard, Vancouver, ISO, and other styles
6

Batet, Sanromà Montserrat. "Ontology based semantic clustering." Doctoral thesis, Universitat Rovira i Virgili, 2011. http://hdl.handle.net/10803/31913.

Full text
Abstract:
Clustering algorithms have focused on the management of numerical and categorical data. However, in recent years, textual information has grown in importance. Proper processing of this kind of information within data mining methods requires an interpretation of its meaning at a semantic level. In this work, a clustering method aimed at interpreting numerical, categorical and textual data in an integrated manner is presented. Textual data are interpreted by means of semantic similarity measures. These measures calculate the likeness between words by exploiting one or several knowledge sources. In this work we also propose two new ways of computing semantic similarity based on 1) the exploitation of the taxonomical knowledge available in one or several ontologies and 2) the estimation of the information distribution of terms on the Web. Results show that a proper interpretation of textual data at a semantic level improves clustering results and eases the interpretability of the classifications.
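As an illustration of taxonomy-based semantic similarity of the kind described above, the sketch below uses WordNet (through NLTK) as a stand-in ontology and the Wu-Palmer measure as one path/depth-based similarity; this is not the thesis's own measure, only the general idea:

```python
# Illustrative taxonomy-based semantic similarity using WordNet via NLTK as a
# stand-in ontology and the Wu-Palmer measure. Requires: nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def term_similarity(word_a, word_b):
    """Best Wu-Palmer similarity over all noun-sense pairs of the two terms."""
    best = 0.0
    for sa in wn.synsets(word_a, pos=wn.NOUN):
        for sb in wn.synsets(word_b, pos=wn.NOUN):
            best = max(best, sa.wup_similarity(sb) or 0.0)
    return best

print(term_similarity("hospital", "clinic"))   # high: close in the taxonomy
print(term_similarity("hospital", "guitar"))   # low: distant concepts
```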
APA, Harvard, Vancouver, ISO, and other styles
7

Luo, Yongfeng. "Range-Based Graph Clustering." University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014606422.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Fuentes, Garcia Ruth S. "Bayesian model-based clustering." Thesis, University of Bath, 2004. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.412350.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Albarakati, Rayan. "Density Based Data Clustering." CSUSB ScholarWorks, 2015. https://scholarworks.lib.csusb.edu/etd/134.

Full text
Abstract:
Data clustering is a data analysis technique that groups data based on a measure of similarity. When data are well clustered, the similarities between the objects in the same group are high, while the similarities between objects in different groups are low. The data clustering technique is widely applied in a variety of areas such as bioinformatics, image segmentation and market research. This project conducted an in-depth study on data clustering with a focus on density-based clustering methods. The recent density-based CFSFDP algorithm (clustering by fast search and find of density peaks) is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively larger distance from points with higher densities. This method has been examined, experimented with, and improved. Three methods (KNN-based, Gaussian kernel-based and iterative Gaussian kernel-based) are applied in this project to improve CFSFDP density-based clustering. The methods are applied to four milestone datasets and the results are analyzed and compared.
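A compact sketch of the CFSFDP idea summarised above, on toy two-dimensional data: each point receives a local density ρ and a distance δ to its nearest denser point, and points with large ρδ are taken as cluster centres. The cutoff distance, the number of centres and the data are illustrative choices:

```python
# Density-peaks (CFSFDP) sketch: rho = neighbours within d_c, delta = distance
# to the nearest point of higher density; peaks with large rho*delta are centres.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])

D = cdist(X, X)
d_c = np.percentile(D[D > 0], 2)                  # cutoff distance (heuristic)
rho = (D < d_c).sum(axis=1) - 1                   # local density, excluding self

order = np.argsort(-rho)                          # indices, densest first
delta = np.empty(len(X))
nearest_denser = np.full(len(X), -1)
delta[order[0]] = D[order[0]].max()               # convention for the densest point
for rank, i in enumerate(order[1:], start=1):
    denser = order[:rank]
    j = denser[np.argmin(D[i, denser])]
    delta[i], nearest_denser[i] = D[i, j], j

centers = np.argsort(-(rho * delta))[:2]          # two peaks for this toy data
labels = np.full(len(X), -1)
labels[centers] = np.arange(len(centers))
for i in order:                                   # assign in decreasing density
    if labels[i] < 0:
        j = nearest_denser[i]
        labels[i] = labels[j] if j >= 0 else 0    # fallback if densest point is no centre
print(np.bincount(labels))
```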
APA, Harvard, Vancouver, ISO, and other styles
10

Faria, Rodrigo Augusto Dias. "Human skin segmentation using correlation rules on dynamic color clustering." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-01102018-101814/.

Full text
Abstract:
Human skin is made of a stack of different layers, each of which reflects a portion of impinging light after the pigments lying in that layer have absorbed a certain amount of it. The main pigments responsible for skin color are melanin and hemoglobin. Skin segmentation plays an important role in a wide range of image processing and computer vision applications. In short, there are three major approaches to skin segmentation: rule-based, machine learning and hybrid. They differ in terms of accuracy and computational efficiency. Generally, machine learning and hybrid approaches outperform the rule-based methods but require a large and representative training dataset and, sometimes, costly classification time as well, which can be a deal breaker for real-time applications. In this work, we propose an improvement, in three distinct versions, of a novel method for rule-based skin segmentation that works in the YCbCr color space. Our motivation is based on the hypotheses that: (1) the original rule can be complemented and, (2) human skin pixels do not appear isolated, i.e. neighborhood operations are taken into consideration. The method is a combination of several correlation rules based on these hypotheses. Such rules evaluate combinations of chrominance (Cb, Cr) values to identify the skin pixels, depending on the shape and size of dynamically generated skin color clusters. The method is very efficient in terms of computational effort as well as robust on very complex images.
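To make the chrominance-based test concrete, the sketch below converts RGB to YCbCr (BT.601) and applies a fixed rectangular (Cb, Cr) rule; the static thresholds are common textbook values and stand in for the thesis's dynamically generated, correlation-rule-based clusters:

```python
# Illustrative fixed-box skin rule in YCbCr; the conversion is BT.601 full range
# and the thresholds are common textbook values, not the thesis's dynamic rules.
import numpy as np

def rgb_to_ycbcr(rgb):
    """rgb: float array in [0, 255], shape (..., 3) -> YCbCr (BT.601)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb_image):
    ycbcr = rgb_to_ycbcr(rgb_image.astype(float))
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    # Static rectangular rule (illustrative only)
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

img = np.random.randint(0, 256, size=(4, 4, 3))
print(skin_mask(img))
```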
APA, Harvard, Vancouver, ISO, and other styles
11

Xu, Tianbing. "Nonparametric evolutionary clustering." Diss., Online access via UMI:, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
12

Jarjour, Riad. "Clustering financial time series for volatility modeling." Diss., University of Iowa, 2018. https://ir.uiowa.edu/etd/6439.

Full text
Abstract:
The dynamic conditional correlation (DCC) model and its variants have been widely used in modeling the volatility of multivariate time series, with applications in portfolio construction and risk management. While popular for its simplicity, the DCC uses only two parameters to model the correlation dynamics, regardless of the number of assets. The flexible dynamic conditional correlation (FDCC) model attempts to remedy this by grouping the stocks into various clusters, each with its own set of parameters. However, it assumes the grouping is known a priori. In this thesis we develop a systematic method to determine the number of groups to use as well as how to allocate the assets to groups. We show through simulation that the method does well in identifying the groups, and apply the method to real data, showing its performance. We also develop and apply a Bayesian approach to this same problem. Furthermore, we propose an instantaneous measure of correlation that can be used in many volatility models, and in fact show that it outperforms the popular sample Pearson correlation coefficient for small sample sizes, thus opening the door to applications in fields other than finance.
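For reference, the scalar DCC(1,1) correlation recursion that uses a single (a, b) pair for all assets, the restriction that the FDCC and the thesis's clustering approach relax, can be sketched as follows (parameter values and data are placeholders):

```python
# Sketch of the DCC(1,1) correlation recursion for standardized residuals eps_t:
#   Q_t = (1 - a - b) * S + a * eps_{t-1} eps_{t-1}' + b * Q_{t-1}
#   R_t = diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2}
import numpy as np

def dcc_correlations(eps, a=0.05, b=0.90):
    """eps: (T, N) standardized residuals; returns the (T, N, N) path of R_t."""
    T, N = eps.shape
    S = np.corrcoef(eps, rowvar=False)       # unconditional correlation target
    Q = S.copy()
    R_path = np.empty((T, N, N))
    for t in range(T):
        if t > 0:
            e = eps[t - 1][:, None]
            Q = (1 - a - b) * S + a * (e @ e.T) + b * Q
        d = 1.0 / np.sqrt(np.diag(Q))
        R_path[t] = Q * np.outer(d, d)       # rescale Q_t to a correlation matrix
    return R_path

eps = np.random.default_rng(2).normal(size=(500, 4))
print(dcc_correlations(eps)[-1].round(2))
```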
APA, Harvard, Vancouver, ISO, and other styles
13

Erdem, Cosku. "Density Based Clustering Using Mathematical Morphology." Master's thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/12608264/index.pdf.

Full text
Abstract:
Improvements in technology enable us to store large amounts of data in warehouses. In parallel, the need for processing this vast amount of raw data and translating it into interpretable information also increases. A commonly used solution method for the described problem in data mining is clustering. We propose the "Density Based Clustering Using Mathematical Morphology" (DBCM) algorithm as an effective clustering method for extracting arbitrarily shaped clusters from noisy numerical data in a reasonable time. This algorithm is predicated on the analogy between images and data warehouses. It applies grayscale morphology, an image processing technique, to multidimensional data. In this study we evaluated the performance of the proposed algorithm on both synthetic and real data and observed that the algorithm produces successful and interpretable results with appropriate parameters. In addition, we found the computational complexity to be linear in the number of data points for low-dimensional data and exponential in the number of dimensions for high-dimensional data, mainly due to the morphology operations.
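A rough sketch of the image/data-warehouse analogy behind this kind of method: bin the data onto a grid, treat the counts as a grayscale image, smooth them with grayscale morphology, and read clusters off the connected components of a thresholded result. Grid size, structuring element and threshold below are illustrative, not DBCM's actual choices:

```python
# Grayscale morphology on binned 2-D data: histogram -> grey closing ->
# threshold -> connected components as clusters (illustrative parameters).
import numpy as np
from scipy.ndimage import grey_closing, label

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2, 0.4, (300, 2)), rng.normal(2, 0.4, (300, 2)),
               rng.uniform(-4, 4, (50, 2))])               # two blobs + noise

counts, xedges, yedges = np.histogram2d(X[:, 0], X[:, 1], bins=40)
smoothed = grey_closing(counts, size=(3, 3))               # grayscale morphology
mask = smoothed >= 2                                       # drop sparse (noisy) cells
labeled_cells, n_clusters = label(mask)                    # connected components
print("clusters found:", n_clusters)
```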
APA, Harvard, Vancouver, ISO, and other styles
14

Malsiner-Walli, Gertraud, Daniela Pauger, and Helga Wagner. "Effect fusion using model-based clustering." Sage, 2018. http://dx.doi.org/10.1177/1471082X17739058.

Full text
Abstract:
In social and economic studies many of the collected variables are measured on a nominal scale, often with a large number of categories. The definition of categories can be ambiguous and different classification schemes using either a finer or a coarser grid are possible. Categorization has an impact when such a variable is included as covariate in a regression model: a too fine grid will result in imprecise estimates of the corresponding effects, whereas with a too coarse grid important effects will be missed, resulting in biased effect estimates and poor predictive performance. To achieve an automatic grouping of the levels of a categorical covariate with essentially the same effect, we adopt a Bayesian approach and specify the prior on the level effects as a location mixture of spiky Normal components. Model-based clustering of the effects during MCMC sampling makes it possible to simultaneously detect categories which have essentially the same effect size and identify variables with no effect at all. Fusion of level effects is induced by a prior on the mixture weights which encourages empty components. The properties of this approach are investigated in simulation studies. Finally, the method is applied to analyse effects of high-dimensional categorical predictors on income in Austria.
APA, Harvard, Vancouver, ISO, and other styles
15

Rand, McFadden Renata. "Aspect Mining Using Model-Based Clustering." NSUWorks, 2011. http://nsuworks.nova.edu/gscis_etd/281.

Full text
Abstract:
Legacy systems contain critical and complex business code that has been in use for a long time. This code is difficult to understand, maintain, and evolve, in large part due to crosscutting concerns: software system features, such as persistence, logging, and error handling, whose implementation is spread across multiple modules. Aspect-oriented techniques separate crosscutting concerns from the base code, using separate modules called aspects and, thus, simplifying the legacy code. Aspect mining techniques identify aspect candidates so that the legacy code can be refactored into aspects. This study investigated an automated aspect mining method in which a vector-space model clustering approach was used with model-based clustering. The vector-space model clustering approach has been researched for aspect mining using a number of different heuristic clustering methods and producing mixed results. Prior to this study, this model had not been researched with model-based algorithms, even though they have grown in popularity because they lend themselves to statistical analysis and show results that are as good as or better than heuristic clustering methods. This study investigated the effectiveness of model-based clustering for identifying aspects when compared against heuristic methods, such as k-means clustering and agglomerative hierarchical clustering, using six different vector-space models. The study's results indicated that model-based clustering can, in fact, be more effective than heuristic methods and showed good promise for aspect mining. In general, model-based algorithms performed better in not spreading the methods of the concerns across the multiple clusters but did not perform as well in not mixing multiple concerns in the same cluster. Model-based algorithms were also significantly better at partitioning the data such that, given an ordered list of clusters, fewer clusters and methods would need to be analyzed to find all the concerns. In addition, model-based algorithms automatically determined the optimal number of clusters, which was a great advantage over heuristic-based algorithms. Lastly, the study found that the new vector-space models performed better, relative to aspect mining, than previously defined vector-space models.
APA, Harvard, Vancouver, ISO, and other styles
16

Dsouza, Jeevan. "Region-based Crossover for Clustering Problems." NSUWorks, 2012. http://nsuworks.nova.edu/gscis_etd/139.

Full text
Abstract:
Data clustering, which partitions data points into clusters, has many useful applications in economics, science and engineering. Data clustering algorithms can be partitional or hierarchical. The k-means algorithm is the most widely used partitional clustering algorithm because of its simplicity and efficiency. One problem with the k-means algorithm is that the quality of partitions produced is highly dependent on the initial selection of centers. This problem has been tackled using genetic algorithms (GA) where a set of centers is encoded into an individual of a population and solutions are generated using evolutionary operators such as crossover, mutation and selection. Of the many GA methods, the region-based genetic algorithm (RBGA) has proven to be an effective technique when the centroid was used as the representative object of a cluster (ROC) and the Euclidean distance was used as the distance metric. The RBGA uses a region-based crossover operator that exchanges subsets of centers that belong to a region of space rather than exchanging random centers. The rationale is that subsets of centers that occupy a given region of space tend to serve as building blocks. Exchanging such centers preserves and propagates high-quality partial solutions. This research aims at assessing the RBGA with a variety of ROCs and distance metrics. The RBGA was tested along with other GA methods, on four benchmark datasets using four distance metrics, varied number of centers, and centroids and medoids as ROCs. The results obtained showed the superior performance of the RBGA across all datasets and sets of parameters, indicating that region-based crossover may prove an effective strategy across a broad range of clustering problems.
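A toy sketch of a region-based crossover of the kind described above: two parents exchange the cluster centres that fall inside a randomly chosen axis-aligned region, so spatially coherent building blocks are preserved. The region sampling and the handling of unequal centre counts are simplified relative to the RBGA:

```python
# Region-based crossover for centre-based clustering (simplified sketch):
# parents swap the centres that lie inside a random axis-aligned region.
import numpy as np

def region_crossover(parent_a, parent_b, rng):
    """parent_a, parent_b: (k, d) arrays of cluster centres."""
    d = parent_a.shape[1]
    lo = np.minimum(parent_a.min(axis=0), parent_b.min(axis=0))
    hi = np.maximum(parent_a.max(axis=0), parent_b.max(axis=0))
    corners = rng.uniform(lo, hi, size=(2, d))
    region_lo, region_hi = corners.min(axis=0), corners.max(axis=0)

    in_a = np.all((parent_a >= region_lo) & (parent_a <= region_hi), axis=1)
    in_b = np.all((parent_b >= region_lo) & (parent_b <= region_hi), axis=1)

    child_a = np.vstack([parent_a[~in_a], parent_b[in_b]])  # a outside + b inside
    child_b = np.vstack([parent_b[~in_b], parent_a[in_a]])  # b outside + a inside
    return child_a, child_b

rng = np.random.default_rng(4)
pa, pb = rng.uniform(0, 10, (5, 2)), rng.uniform(0, 10, (5, 2))
ca, cb = region_crossover(pa, pb, rng)
print(ca.shape, cb.shape)
```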
APA, Harvard, Vancouver, ISO, and other styles
17

Wei, Wutao. "Model Based Clustering Algorithms with Applications." Thesis, Purdue University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10830711.

Full text
Abstract:

In the predictive area of machine learning, unsupervised learning is applied when labels for the data are unavailable, laborious to obtain, or available only in limited proportion. Based on the special properties of the data, we can build models by understanding those properties and making some reasonable assumptions. In this thesis, we introduce three practical problems and discuss them in detail. This thesis produces three papers, as follows: Wei, Wutao, et al., "A Non-parametric Hidden Markov Clustering Model with Applications to Time Varying User Activity Analysis," ICMLA 2015; Wei, Wutao, et al., "Dynamic Bayesian Predictive Model for Box Office Forecasting," IEEE Big Data 2017; Wei, Wutao, Bowei Xi, and Murat Kantarcioglu, "Adversarial Clustering: A Grid Based Clustering Algorithm Against Active Adversaries," submitted.

User Profiling Clustering: Activity data of individual users on social media are easily accessible in this big data era. However, proper modeling strategies for user profiles have not been well developed in the literature. Existing methods or models usually have two limitations. The first limitation is that most methods target the population rather than individual users, and the second is that they cannot model non-stationary time-varying patterns. Different users in general demonstrate different activity modes on social media. Therefore, one population model may fail to characterize activities of individual users. Furthermore, online social media are dynamic and ever evolving, so are users’ activities. Dynamic models are needed to properly model users’ activities. In this paper, we introduce a non-parametric hidden Markov model to characterize the time-varying activities of social media users. In addition, based on the proposed model, we develop a clustering method to group users with similar activity patterns.

Adversarial Clustering: Nowadays more and more data are gathered for detecting and preventing cyber-attacks. Unique to the cyber security applications, data analytics techniques have to deal with active adversaries that try to deceive the data analytics models and avoid being detected. The existence of such adversarial behavior motivates the development of robust and resilient adversarial learning techniques for various tasks. In the past most of the work focused on adversarial classification techniques, which assumed the existence of a reasonably large amount of carefully labeled data instances. However, in real practice, labeling the data instances often requires costly and time-consuming human expertise and becomes a significant bottleneck. Meanwhile, a large number of unlabeled instances can also be used to understand the adversaries' behavior. To address the above mentioned challenges, we develop a novel grid based adversarial clustering algorithm. Our adversarial clustering algorithm is able to identify the core normal regions, and to draw defensive walls around the core positions of the normal objects utilizing game theoretic ideas. Our algorithm also identifies sub-clusters of attack objects, the overlapping areas within clusters, and outliers which may be potential anomalies.

Dynamic Bayesian Update for Profiling Clustering: The movie industry has become one of the most important consumer businesses, and it is increasingly competitive. For a movie producer, production and marketing carry large costs; for the owner of a movie theater, how to allocate a limited number of screens among the movies currently in theaters is also a problem. However, current models in the movie industry can only give an estimate of the opening week. We improve the dynamic linear model with a Bayesian framework. By using this updating method, we are also able to update on streaming adversarial data and make defensive recommendations for the defensive systems.

APA, Harvard, Vancouver, ISO, and other styles
18

Chan, Alton Kam Fai. "Hyperplane based efficient clustering and searching /." View abstract or full-text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20CHANA.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Malsiner-Walli, Gertraud, Sylvia Frühwirth-Schnatter, and Bettina Grün. "Model-based clustering based on sparse finite Gaussian mixtures." Springer, 2016. http://dx.doi.org/10.1007/s11222-014-9500-2.

Full text
Abstract:
In the framework of Bayesian model-based clustering based on a finite mixture of Gaussian distributions, we present a joint approach to estimate the number of mixture components and identify cluster-relevant variables simultaneously as well as to obtain an identified model. Our approach consists in specifying sparse hierarchical priors on the mixture weights and component means. In a deliberately overfitting mixture model the sparse prior on the weights empties superfluous components during MCMC. A straightforward estimator for the true number of components is given by the most frequent number of non-empty components visited during MCMC sampling. Specifying a shrinkage prior, namely the normal gamma prior, on the component means leads to improved parameter estimates as well as identification of cluster-relevant variables. After estimating the mixture model using MCMC methods based on data augmentation and Gibbs sampling, an identified model is obtained by relabeling the MCMC output in the point process representation of the draws. This is performed using K-centroids cluster analysis based on the Mahalanobis distance. We evaluate our proposed strategy in a simulation setup with artificial data and by applying it to benchmark data sets. (authors' abstract)
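An off-the-shelf analogue of the "deliberately overfitting mixture whose sparse prior empties superfluous components" is scikit-learn's variational BayesianGaussianMixture with a small Dirichlet concentration; this is not the authors' MCMC sampler with normal gamma shrinkage priors, only the same overall idea:

```python
# Variational analogue of a sparse finite Gaussian mixture: fit more components
# than needed with a small Dirichlet concentration and count the non-empty ones.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(m, 1, (200, 2)) for m in (-4, 0, 4)])   # 3 true components

bgm = BayesianGaussianMixture(
    n_components=10,                                   # deliberately too many
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=0.01,                   # sparse prior on the weights
    max_iter=500, random_state=0).fit(X)

active = np.sum(bgm.weights_ > 0.01)                   # non-empty components
print("estimated number of components:", active)
```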
APA, Harvard, Vancouver, ISO, and other styles
20

Braune, Christian [Verfasser]. "Skeleton-based validation for density-based clustering / Christian Braune." Magdeburg : Universitätsbibliothek Otto-von-Guericke-Universität, 2018. http://d-nb.info/1220035653/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Durkalec, Anna. "Properties and evolution of galaxy clustering at 2." Thesis, Aix-Marseille, 2014. http://www.theses.fr/2014AIXM4758/document.

Full text
Abstract:
This thesis focuses on the study of the properties and evolution of galaxy clustering for galaxies in the redshift range 22. I was able to measure the spatial distribution of a general galaxy population at redshift z~3 for the first time with high accuracy. I quantified the galaxy clustering by estimating and modelling the projected (real-space) two-point correlation function for a general population of 3022 galaxies. I extended the clustering measurements to luminosity and stellar mass-selected sub-samples. My results show that the clustering strength of the general galaxy population does not change significantly from redshift z~3.5 to z~2.5, but in both redshift ranges more luminous and more massive galaxies are more clustered than less luminous (massive) ones. Using the halo occupation distribution (HOD) formalism I measured an average host halo mass at redshift z~3 significantly lower than the observed average halo masses at low redshift. I concluded that the observed star-forming population of galaxies at z~3 might have evolved into the massive and bright (Mr<-21.5) galaxy population at redshift z=0. Also, I interpret the clustering measurements in terms of a linear large-scale galaxy bias. I find it to be significantly higher than the bias of intermediate and low redshift galaxies. Finally, I computed the stellar-to-halo mass ratio (SHMR) and the integrated star formation efficiency (ISFE) to study the efficiency of star formation and stellar mass assembly. I find that the integrated star formation efficiency is quite high at ~16% for the average galaxies at z~3.
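For context, the generic Landy-Szalay estimator of the two-point correlation function behind such clustering measurements can be sketched on toy data as below; the thesis's projected (real-space) correlation function and HOD modelling are not reproduced:

```python
# Toy Landy-Szalay estimator: xi(r) = (DD - 2*DR + RR) / RR with normalised
# pair counts in distance bins, using a clustered toy sample and a random catalogue.
import numpy as np
from scipy.spatial.distance import pdist, cdist

rng = np.random.default_rng(6)
data = rng.normal(0, 1, (500, 2))                 # centrally concentrated "galaxies"
rand = rng.uniform(-4, 4, (2000, 2))              # unclustered random catalogue
bins = np.linspace(0.1, 2.0, 11)

def norm_counts(dists, n_pairs):
    return np.histogram(dists, bins=bins)[0] / n_pairs

nd, nr = len(data), len(rand)
DD = norm_counts(pdist(data), nd * (nd - 1) / 2)
RR = norm_counts(pdist(rand), nr * (nr - 1) / 2)
DR = norm_counts(cdist(data, rand).ravel(), nd * nr)

xi = (DD - 2 * DR + RR) / RR
print(np.round(xi, 2))
```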
APA, Harvard, Vancouver, ISO, and other styles
22

Wahid, Dewan Ferdous. "Random models and heuristic algorithms for correlation clustering problems on signed social networks." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/61438.

Full text
Abstract:
In the social sciences, signed directed networks are used to represent the mutual friendship and foe attitudes among the members of a social group. Recent studies show that different real-world properties (e.g. preferential attachment, copying, etc.) can be observed in web-based social networks. In this thesis, we study the positive/negative in/out-degree distributions in three online signed directed social networks. We observe that all signed directed degree distributions in the web-based social networks with multiple edge possibilities (in both directions) follow a power law with exponents in the range 2.0 ≤ γ ≤ 3.5. We present three random models, which capture the preferential attachment and copying properties, for web-based signed directed social networks. The signed directed degree distributions in the networks simulated by the proposed random models also indicate a power-law trait with an exponent in the range 2.0 ≤ γ ≤ 3.5. We also present a heuristic algorithm for Correlation Clustering (CC), which is a class of community detection problem on signed networks. The CC problem can be defined as follows: for a given signed network, find an optimal partition of the vertices such that the edges inside a group are positive and the edges between two groups are negative. The algorithm is based on relaxing the integer linear programming formulation of the minimum-disagreement CC problem and rounding the approximate ultrametric distance matrix using a given threshold. The experimental results show that, in the random signed G(n,e,p) network, the runtime of this algorithm is nearly independent for the cases e ≥ 0.4 and p ≤ 0.6, where e and p are the probabilities of connecting two vertices by an edge and of an edge being positive, respectively. However, this algorithm does not give a convincing account of how the minimum disagreement varies with the chosen threshold. We also apply this algorithm to the International National Bilateral Trade Growth Network derived from bilateral trading data for 2011-2015 from the International Trade Center (ITC) to identify groups of countries with positive average trade growth.
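The objective the abstract refers to, counting the "disagreements" of a candidate partition on a signed graph, can be written down in a few lines; the thesis's LP-relaxation and ultrametric-rounding algorithm itself is not reproduced here:

```python
# Correlation-clustering objective: a disagreement is a negative edge inside a
# cluster or a positive edge between clusters.
def disagreements(signed_edges, labels):
    """signed_edges: iterable of (u, v, sign) with sign in {+1, -1};
    labels: dict mapping vertex -> cluster id."""
    bad = 0
    for u, v, sign in signed_edges:
        same = labels[u] == labels[v]
        if (sign < 0 and same) or (sign > 0 and not same):
            bad += 1
    return bad

edges = [(0, 1, +1), (1, 2, +1), (0, 2, -1), (2, 3, -1), (3, 4, +1)]
print(disagreements(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}))  # -> 1 (the 0-2 edge)
```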
APA, Harvard, Vancouver, ISO, and other styles
23

Mata, Raman Deep. "Correlation based landmine detection technique /." free to MU campus, to others for purchase, 2004. http://wwwlib.umi.com/cr/mo/fullcit?p1426084.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Zhou, Dunke. "High-dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1338303646.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Holzapfel, Klaus. "Density-based clustering in large-scale networks." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=979979943.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Slaaen, Roger Antoniussen. "Clustering based localization for wireless sensor networks." Online access for everyone, 2006. http://www.dissertations.wsu.edu/Thesis/Spring2006/R%5FSlaaen%5F050406.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Shekar, B. "A Knowledge-Based Approach To Pattern Clustering." Thesis, Indian Institute of Science, 1988. http://hdl.handle.net/2005/86.

Full text
Abstract:
The primary objective of this thesis is to develop a methodology for clustering of objects based on their functionality typified by the notion of concept. We begin by giving a formal definition of concept. By assigning a functional interpretation to the underlying concept, we demonstrate the applicability of the functionally interpreted concept for clustering objects. This functional interpretation leads us to identifying two classes of concepts, namely, the Necessary class and the Quality-Improvement class. Next, we categorize the functional cohesiveness among objects into three different classes. Further, we axiomatize the restrictions imposed, on the execution of functions of objects, by the non-availability of sufficient resources. To facilitate describing functional clusters in a succinct manner, we define connectives that capture the imposed restrictions. Also we justify the adequacy of these connectives for describing functional clusters. We then propose a suitable data structure to represent the functionally interpreted concept, and develop an algorithm to perform this axiomatic functional partitioning of objects. We illustrate the functional partitioning of objects through a real-world example. We formally establish the invariance of the resulting cluster descriptions, with respect to the order in which the given set of objects is examined. This invariance would facilitate parallel implementations of the proposed methodology. We then analyze different functional cluster configurations from a structural viewpoint. In doing so, we identify the presence of a specific property among certain cluster configurations. We also state a sufficient condition for the presence of this property in any cluster. A separate class of concepts, namely the Concept Transformer class, displaying certain properties, is identified and studied in detail. We also demonstrate its applicability to functional clustering. Finally, we examine a knowledge-based pattern synthesis problem from a functional angle as a significant application of the functional interpretation of concept and associated data structures. Here, we show that a concept, from the functional view-point, can be viewed as the synthesis of various other concepts; the synthesis is an outcome of a knowledge-based goal-directed pattern-matching activity. The proposed methodology has the potential to cluster objects that imply functions by virtue of their physical properties.
APA, Harvard, Vancouver, ISO, and other styles
28

Sucasas, V. "Environmental-based smart clustering for mobile networks." Thesis, University of Surrey, 2016. http://epubs.surrey.ac.uk/811628/.

Full text
Abstract:
Nowadays there is a plethora of wireless handsets in the market, such as smartphones, tablets, laptops and wearable devices, that together with the future emerging scenarios of vehicle-to-vehicle communications and smart city infrastructure will populate urban environments with a broad diversity of multi-standard wireless devices. This increase in the density and diversity of mobile devices has been the driver for collaborative protocols that can deliver effective communications. Cooperation is a technology that has the potential to provide energy-efficient and scalable communications, where nodes play an important role in coordinating local traffic and act as gateways to the core network. Despite the diverse power requirements of multi-standard wireless interfaces and the different channel characteristics, support for energy-efficient communications, where relay nodes can be selected with lower energy requirements or with higher-order modulation opportunities, is still expected. In this framework, clustering is a widely accepted technique that allows nodes to create and join virtual cooperative groups and to select a clusterhead that can provide a high-speed and energy-efficient backhaul link to the mobile network. The vast majority of existing clustering techniques assume that the collection of nodes that form a cluster is either static or has very low relative velocity. However, in practice nodes or devices are constantly on the move, providing the impetus for mobility-aware clustering techniques that elect a subset of nodes with a common mobility pattern. In this context, mobility-aware clustering based on geolocation is an active field of research due to the increasing interest in vehicular communication technology. However, clustering has a wide range of applications where GPS information is not always available. This requires a new design of clustering algorithms that do not depend on GPS coordinates. This challenge has fostered a new vision of clustering based on cognition, where nodes form mobile clusters that can adapt on demand to the scenario characteristics. This thesis investigates cluster formation exploiting the notion of the wisdom of crowds, where the nodes are aware of the surrounding mobility patterns and can adapt the cluster formation strategy to suit the current mobility trends. Moreover, this thesis also provides a novel analytical model for cluster lifetime that is used to validate our simulation results. Another dimension to the clustering problem is how to exploit available spectral opportunities for cluster formation in a secure manner. Cognitive radio, and more concretely cooperative spectrum sensing, is evaluated in this thesis as a solution for data channel assignment in mobile clusters. In this scenario, we focus on the security concerns of cooperative spectrum sensing. Namely, we address spectrum sensing data falsification and incumbent emulation attacks, and propose energy-efficient security mechanisms based on lightweight cryptography to address these threats.
APA, Harvard, Vancouver, ISO, and other styles
29

Frühwirth, Rudolf, Korbinian Eckstein, and Sylvia Frühwirth-Schnatter. "Vertex finding by sparse model-based clustering." IOP Publishing, 2016. http://epub.wu.ac.at/6173/1/jop.pdf.

Full text
Abstract:
The application of sparse model-based clustering to the problem of primary vertex finding is discussed. The observed z-positions of the charged primary tracks in a bunch crossing are modeled by a Gaussian mixture. The mixture parameters are estimated via Markov Chain Monte Carlo (MCMC). Sparsity is achieved by an appropriate prior on the mixture weights. The results are shown and compared to clustering by the expectation-maximization (EM) algorithm.
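A toy version of vertex finding as one-dimensional mixture modelling is sketched below, using scikit-learn's EM fit, i.e. the baseline the paper compares against rather than its sparse MCMC estimation; the vertex positions and track counts are made up:

```python
# Toy vertex finding: fit a 1-D Gaussian mixture to track z-positions and read
# off component means (vertex positions) and track-to-vertex assignments.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
true_vertices = [-3.1, 0.4, 2.8]                       # toy z-positions
z = np.concatenate([rng.normal(v, 0.05, 60) for v in true_vertices])

gm = GaussianMixture(n_components=3, random_state=0).fit(z.reshape(-1, 1))
print(np.sort(gm.means_.ravel()).round(2))             # recovered vertex positions
print(gm.predict(z.reshape(-1, 1))[:8])                # track-to-vertex assignment
```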
APA, Harvard, Vancouver, ISO, and other styles
30

Liu, Jun. "Model-based clustering algorithms, performance and application." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0030/NQ66280.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Konda, Swetha Reddy. "Classification of software components based on clustering." Morgantown, W. Va. : [West Virginia University Libraries], 2007. https://eidr.wvu.edu/etd/documentdata.eTD?documentid=5510.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Ning, Hoi-Kwan Flora. "Model-based regression clustering with variable selection." Thesis, University of Oxford, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.497059.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

FIGUEIREDO, AURELIO MORAES. "MAPPING SEISMIC EVENTS USING CLUSTERING-BASED METHODOLOGIES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2015. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=26709@1.

Full text
Abstract:
We present clustering-based methodologies used to process 3D seismic data. The approach first replaces the volume voxels by corresponding feature samples representing the local behavior in the seismic trace. After this step, samples are used as entries to clustering procedures, and the resulting cluster maps are used to create a new representation of the original volume data. This strategy finds the global structure of the seismic signal. It strongly reduces the impact of noise and small disagreements found in the voxels of the entry volume. These clustered versions of the input seismic data can then be used in two different applications: to map 3D horizons automatically and to produce visual attribute volumes where seismic faults and any discontinuities present in the data are highlighted. Concerning the horizon mapping, as the method does not use any lateral similarity measure to organize horizon voxels into clusters, the methodology is very robust when mapping difficult cases. It is capable of mapping a great portion of the seismic interfaces present in the data. In the case of the visualization attribute, it is constructed by applying an auto-adaptable function that uses the voxel neighboring information through a specific measurement that globally highlights the fault regions and other discontinuities present in the original volume. We apply the methodologies to real seismic data, mapping even seismic horizons severely interrupted by various discontinuities and presenting visualization attributes where discontinuities are adequately highlighted.
APA, Harvard, Vancouver, ISO, and other styles
34

Murugiah, S. "Bayesian nonparametric clustering based on Dirichlet processes." Thesis, University College London (University of London), 2010. http://discovery.ucl.ac.uk/20467/.

Full text
Abstract:
Following a review of some traditional methods of clustering, we review the Bayesian nonparametric framework for modelling object attribute differences. We focus on Dirichlet Process (DP) mixture models, in which the observed clusters in any particular data set are not viewed as belonging to a fixed set of clusters but rather as representatives of a latent structure in which clusters belong to one of a potentially infinite number of clusters. As more information about attribute differences is revealed, the number of inferred clusters is allowed to grow. We begin by studying DP mixture models for normal data and show how to adapt one of the most widely used conditional methods for computation to improve sampling efficiency. This scheme is then generalized, followed by an application to discrete data. The DP’s dispersion parameter is a critical parameter controlling the number of clusters. We propose a framework for the specification of the hyperparameters for this parameter, using a percentile based method. This research was motivated by the analysis of product trials at the magazine Which?, where brand attributes are usually assessed on a 5-point preference scale by experts or by a random selection of Which? subscribers. We conclude with a simulation study, where we replicate some of the standard trials at Which? and compare the performance of our DP mixture models against various other popular frequentist and Bayesian multiple comparison routines adapted for clustering.
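The role of the DP's dispersion (concentration) parameter can be illustrated with a small simulation of the Chinese restaurant process implied by a Dirichlet process prior: larger values of the parameter produce more clusters as observations arrive. The values of alpha and n below are illustrative only:

```python
# Chinese restaurant process simulation: the concentration alpha controls how
# quickly new clusters appear as observations are added.
import numpy as np

def crp_cluster_count(n, alpha, rng):
    """Number of occupied clusters after seating n customers under CRP(alpha)."""
    counts = []                                   # customers per existing cluster
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        choice = rng.choice(len(probs), p=probs)
        if choice == len(counts):
            counts.append(1)                      # open a new cluster
        else:
            counts[choice] += 1
    return len(counts)

rng = np.random.default_rng(8)
for alpha in (0.1, 1.0, 10.0):
    ks = [crp_cluster_count(200, alpha, rng) for _ in range(20)]
    print(alpha, round(np.mean(ks), 1))           # more clusters for larger alpha
```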
APA, Harvard, Vancouver, ISO, and other styles
35

Coretto, Pietro. "The noise component in model-based clustering." Thesis, University College London (University of London), 2008. http://discovery.ucl.ac.uk/1445219/.

Full text
Abstract:
Model-based cluster analysis is a statistical tool used to investigate group-structures in data. Finite mixtures of Gaussian distributions are a popular device used to model elliptical shaped clusters. Estimation of mixtures of Gaussians is usually based on the maximum likelihood method. However, for a wide class of finite mixtures, including Gaussians, maximum likelihood estimates are not robust. This implies that a small proportion of outliers in the data could lead to poor estimates and clustering. One way to deal with this is to add a "noise component", i.e. a mixture component that models the outliers. In this thesis we explore this approach based on three contributions. First, Fraley and Raftery (1993) propose a Gaussian mixture model with the addition of a uniform noise component with support on the data range. We generalize this approach by introducing a model, which is a finite mixture of location-scale distributions mixed with a finite number of uniforms supported on disjoint subsets of the data range. We study identifiability and maximum likelihood estimation, and provide a computational procedure based on the EM algorithm. Second, Hennig (2004) proposed a sort of model in which the noise component is represented by a fixed improper density, which is a constant on the real line. He shows that the resulting estimates are robust to extreme outliers. We define a maximum likelihood type estimator for such a model and study its asymptotic behaviour. We also provide a method for choosing the improper constant density, and a computational procedure based on the EM algorithm. The third contribution is an extensive simulation study in which we measure the performance of the previous two methods and certain other robust methodologies proposed in the literature.
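A bare-bones sketch of the E- and M-step structure for a univariate Gaussian mixture augmented with a uniform noise component on the data range, the kind of device discussed above; initialisation, convergence checks and the thesis's improper-density variant are omitted:

```python
# EM for k Gaussians plus one uniform "noise component" on the data range (1-D).
import numpy as np
from scipy.stats import norm

def em_gauss_plus_uniform(x, k=2, n_iter=50):
    lo, hi = x.min(), x.max()
    mu = np.quantile(x, np.linspace(0.25, 0.75, k))   # crude deterministic init
    sigma = np.full(k, x.std())
    w = np.full(k + 1, 1.0 / (k + 1))                 # last weight = noise component
    for _ in range(n_iter):
        # E-step: responsibilities for the k Gaussians and the uniform component
        dens = np.column_stack([norm.pdf(x, mu[j], sigma[j]) for j in range(k)]
                               + [np.full_like(x, 1.0 / (hi - lo))])
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means and standard deviations
        nk = resp.sum(axis=0)
        w = nk / len(x)
        for j in range(k):
            mu[j] = np.sum(resp[:, j] * x) / nk[j]
            sigma[j] = np.sqrt(np.sum(resp[:, j] * (x - mu[j]) ** 2) / nk[j]) + 1e-6
    return mu, sigma, w

rng = np.random.default_rng(9)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(6, 1, 300),
                    rng.uniform(-10, 20, 40)])        # two clusters plus outliers
mu, sigma, w = em_gauss_plus_uniform(x)
print(mu.round(2), w.round(2))                        # last weight ~ outlier share
```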
APA, Harvard, Vancouver, ISO, and other styles
36

Akula, Ravi Kiran. "Botnet Detection Using Graph Based Feature Clustering." Thesis, Mississippi State University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10751733.

Full text
Abstract:

Detecting botnets in a network is crucial because bot activities impact numerous areas such as security, finance, health care, and law enforcement. Most existing rule- and flow-based detection methods may not be capable of detecting bot activities efficiently. Hence, designing a robust botnet-detection method is of high significance. In this study, we propose a botnet-detection methodology based on graph-based features. A Self-Organizing Map is applied to establish clusters of nodes in the network based on these features. Our method is capable of isolating bots in small clusters while containing most normal nodes in the large clusters. A filtering procedure is also developed to further enhance the algorithm's efficiency by removing inactive nodes from bot detection. The methodology is verified using the real-world CTU-13 and ISCX botnet datasets and benchmarked against classification-based detection methods. The results show that our proposed method can efficiently detect the bots despite their varying behaviors.
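The clustering step can be pictured with a small self-organising map. The sketch below trains a tiny SOM in NumPy on synthetic node features (stand-ins for the graph-based features used in the study; the feature definitions, grid size and learning schedule are assumptions, not the thesis's configuration) and shows how a minority of anomalous nodes ends up in separate map units.

```python
# Minimal SOM on synthetic per-node features such as [in-degree, out-degree, clustering coeff.].
import numpy as np

def train_som(X, grid=(4, 4), n_iter=2000, lr0=0.5, sigma0=1.5, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(grid[0] * grid[1], X.shape[1]))        # codebook vectors
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], float)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))             # best matching unit
        frac = t / n_iter
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 1e-3
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                          # pull the neighbourhood toward x
    return W

def assign(X, W):
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 3)),                      # "normal" nodes
               rng.normal(5, 1, (10, 3))])                      # "bot-like" nodes
labels = assign(X, train_som(X))
print(np.bincount(labels))   # the bot-like nodes should occupy small, separate map units
```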

APA, Harvard, Vancouver, ISO, and other styles
37

Kim, Yeongwoo. "Dynamic GAN-based Clustering in Federated Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285576.

Full text
Abstract:
As the era of Industry 4.0 arises, the number of devices that are connected to a network has increased. The devices continuously generate data that carries various information, from power consumption to the configuration of the devices. Since the data hold the raw information about each local node in the network, manipulating this information has the potential to benefit the network through different methods. However, due to the large amount of non-IID data generated in each node, manual operations to process the data and tune the methods become challenging. To overcome this challenge, there have been attempts to apply automated methods that build accurate machine learning models from a subset of the collected data, or that cluster network nodes by leveraging clustering algorithms and use machine learning models within each cluster. However, conventional clustering algorithms are imperfect in a distributed and dynamic network because of the risk to data privacy, the non-dynamic clusters, and the fixed number of clusters. These limitations of the clustering algorithms degrade the performance of the machine learning models because the clusters may become obsolete over time. Therefore, this thesis proposes a three-phase clustering algorithm for dynamic environments that leverages 1) GAN-based clustering, 2) cluster calibration, and 3) divisive clustering in federated learning. GAN-based clustering preserves data privacy because it eliminates the need to share raw data across the network in order to create clusters. Cluster calibration adds dynamics to the fixed clusters by continuously updating them, and benefits the methods that manage the network. Moreover, divisive clustering explores different numbers of clusters by iteratively selecting a cluster and dividing it into multiple clusters. As a result, we create clusters for dynamic environments and improve the performance of the machine learning models within each cluster.
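The cluster-calibration phase can be sketched as a periodic reassignment of clients to the cluster whose model best explains their local data. In the toy code below a per-cluster centroid distance stands in for a GAN discriminator score, and the function names (`score`, `calibrate_clusters`) and data are hypothetical; this is only a schematic of the idea, not the thesis's federated implementation.

```python
# Hypothetical sketch of cluster calibration: move each client to its best-scoring cluster.
import numpy as np

def score(cluster_model, client_data):
    """Lower is better; a stand-in for e.g. a discriminator loss on the client's local data."""
    return float(np.mean((client_data - cluster_model) ** 2))

def calibrate_clusters(client_data, cluster_models, assignment):
    """One calibration round over all clients; returns the new assignment and the number moved."""
    new_assignment = {}
    for cid, data in client_data.items():
        scores = {k: score(m, data) for k, m in cluster_models.items()}
        new_assignment[cid] = min(scores, key=scores.get)
    moved = sum(new_assignment[c] != assignment.get(c) for c in new_assignment)
    return new_assignment, moved

rng = np.random.default_rng(0)
clients = {f"node{i}": rng.normal(loc=(0 if i < 5 else 4), size=(50, 2)) for i in range(10)}
models = {0: np.zeros(2), 1: np.full(2, 4.0)}      # two cluster "models" (toy centroids)
assignment, moved = calibrate_clusters(clients, models, {})
print(assignment, "reassigned:", moved)
```

Running such a round repeatedly is what keeps the clusters from becoming obsolete as the clients' data drifts.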
APA, Harvard, Vancouver, ISO, and other styles
38

Zhang, Kai. "Kernel-based clustering and low rank approximation /." View abstract or full-text, 2008. http://library.ust.hk/cgi/db/thesis.pl?CSED%202008%20ZHANG.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

McClelland, Robyn L. "Regression based variable clustering for data reduction /." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/9611.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Davis, Aaron Samuel. "Bisecting Document Clustering Using Model-Based Methods /." Diss., CLICK HERE for online access, 2010. http://contentdm.lib.byu.edu/ETD/image/etd3332.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

"Modeling multivariate financial time series based on correlation clustering." 2008. http://library.cuhk.edu.hk/record=b5896838.

Full text
Abstract:
Zhou, Tu.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2008.
Includes bibliographical references (leaves 61-70).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.0
Chapter 1.1 --- Motivation and Objective --- p.0
Chapter 1.2 --- Major Contribution --- p.2
Chapter 1.3 --- Thesis Organization --- p.4
Chapter 2 --- Measurement of Relationship between financial time series --- p.5
Chapter 2.1 --- Linear Correlation --- p.5
Chapter 2.1.1 --- Pearson Correlation Coefficient --- p.6
Chapter 2.1.2 --- Rank Correlation --- p.6
Chapter 2.2 --- Mutual Information --- p.7
Chapter 2.2.1 --- Approaches of Mutual Information Estimation --- p.10
Chapter 2.3 --- Copula --- p.12
Chapter 2.4 --- Analysis from Experimental Data --- p.14
Chapter 2.4.1 --- Experiment 1: Nonlinearity --- p.14
Chapter 2.4.2 --- Experiment 2: Sensitivity of Outliers --- p.16
Chapter 2.4.3 --- Experiment 3: Transformation Invariance --- p.20
Chapter 2.5 --- Chapter Summary --- p.23
Chapter 3 --- Clustered Dynamic Conditional Correlation Model --- p.26
Chapter 3.1 --- Background Review --- p.26
Chapter 3.1.1 --- GARCH Model --- p.26
Chapter 3.1.2 --- Multivariate GARCH model --- p.29
Chapter 3.2 --- DCC Multivariate GARCH Models --- p.31
Chapter 3.2.1 --- DCC GARCH Model --- p.31
Chapter 3.2.2 --- Generalized DCC GARCH Model --- p.32
Chapter 3.2.3 --- Block-DCC GARCH Model --- p.32
Chapter 3.3 --- Clustered DCC GARCH Model --- p.34
Chapter 3.3.1 --- Minimum Distance Estimation (MDE) --- p.36
Chapter 3.3.2 --- Clustered DCC (CDCC) based on MDE --- p.37
Chapter 3.4 --- Clustering Method Selection --- p.40
Chapter 3.5 --- Model Estimation and Testing Method --- p.42
Chapter 3.5.1 --- Maximum Likelihood Estimation --- p.42
Chapter 3.5.2 --- Box-Pierce Statistic Test --- p.44
Chapter 3.6 --- Chapter Summary --- p.44
Chapter 4 --- Experimental Result and Applications on CDCC --- p.46
Chapter 4.1 --- Model Comparison and Analysis --- p.46
Chapter 4.2 --- Portfolio Selection Application --- p.50
Chapter 4.3 --- Value at Risk Application --- p.52
Chapter 4.4 --- Chapter Summary --- p.55
Chapter 5 --- Conclusion --- p.57
Bibliography --- p.61
APA, Harvard, Vancouver, ISO, and other styles
42

Chen, Lien-Chin, and 陳連進. "A Correlation-Based Approach for Validating Gene Expression Clustering." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/r32u5a.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Computer Science and Information Engineering (Master's and PhD Program)
ROC academic year 90
This research explores various correlation-based clustering validation methods that are suitable for gene expression analysis. In biological analysis, clustering algorithms are often used first to partition genes into groups exhibiting similar patterns of variation in expression level; clustering validation methods are then applied to evaluate the validity of the clustering results. However, most similarity measurements used in existing clustering analyses belong to the distance-based category. In fact, a biologist aims to cluster together genes that show a similar expression tendency rather than the same expression values. This motivates the use of correlation-based clustering and validation indices in this study. In this thesis, an automatic clustering validation system is presented to guide the user in choosing a suitable validation index for cluster analysis. We developed a volumetric-clouds-type cluster generator to synthesize various datasets, and a number of correlation-based validation indices were evaluated for measuring the quality of clustering results. The system can therefore effectively suggest the best validation index for the different types of datasets given by users.
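A correlation-based validity index of the general kind discussed here can be illustrated in a few lines: compare the average within-cluster Pearson correlation of gene expression profiles with the average between-cluster correlation. The index and the synthetic data below are illustrative assumptions, not the specific indices evaluated in the thesis.

```python
# Toy correlation-based cluster validity index for gene expression data.
import numpy as np

def correlation_validity(expr, labels):
    """expr: genes x conditions matrix; labels: cluster id per gene."""
    C = np.corrcoef(expr)                          # gene-by-gene Pearson correlations
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = C[same & off_diag].mean()             # average correlation inside clusters
    between = C[~same].mean()                      # average correlation across clusters
    return within - between                        # larger => tighter, better-separated clusters

rng = np.random.default_rng(0)
base = rng.normal(size=(2, 20))                    # two underlying expression patterns
expr = np.vstack([base[0] + 0.3 * rng.normal(size=(25, 20)),
                  base[1] + 0.3 * rng.normal(size=(25, 20))])
labels = np.array([0] * 25 + [1] * 25)
print(round(correlation_validity(expr, labels), 3))
```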
APA, Harvard, Vancouver, ISO, and other styles
43

Wang, Qian-Hao, and 王千豪. "Document Clustering based on Approximate Word Pattern Matching and Correlation of Co-occurrence." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/mf3227.

Full text
Abstract:
Master's thesis
Tatung University
Department of Information Management
ROC academic year 95
Because users often search for related texts on the Internet to read or browse, this research aims to group large numbers of texts rapidly and accurately through thematic document clustering, so that users can absorb them efficiently while reading and convert them into genuinely useful information. The research covers feature extraction, feature strength measurement, document-feature vector space modelling, and clustering analysis on the document-document space. Feature extraction is based on approximate word pattern matching, whose strength is evaluated by the correlation of co-occurrence, which takes into account the approximation tolerance and the distance between the components of the pattern. We then extend the tf-idf concept of the vector space model from information retrieval to establish a document-feature vector space model using the correlation of co-occurrence and idf (pwf-idf or pa-idf). To perform effective clustering, the document-document vector space is generated from the similarity between all pairs of documents, where the similarity is calculated from the document-feature vector space model. Finally, a simple and effective clustering method that recursively merges data with high similarity is presented. Experimental analysis shows that the proposed method outperforms the approach of Yang & Yu, which uses word bi-grams as features, first clusters words so that the documents containing them group into concept clusters, and then combines concept clusters with high document overlap into the final document clusters. This research verifies that approximate word pattern matching can extract more common features from documents, and that the proposed document clustering model can also avoid the error propagation that results from multi-stage clustering.
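The overall pipeline can be approximated with a plain tf-idf weighting and cosine similarity in place of the approximate-word-pattern features and pwf-idf/pa-idf weights used in the thesis, followed by a greedy merge of document pairs whose similarity exceeds a threshold. The sketch below is that simplified stand-in; the corpus and the merge threshold are made up.

```python
# Simplified stand-in: tf-idf + cosine similarity, then greedy merging of similar documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "stock market correlation clustering of equities",
    "clustering stocks by correlation of daily returns",
    "deep learning for image segmentation",
    "convolutional networks for medical image segmentation",
]
S = cosine_similarity(TfidfVectorizer().fit_transform(docs))

clusters = [{i} for i in range(len(docs))]
threshold = 0.15                                  # assumed merge threshold
merged = True
while merged:
    merged = False
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            # average pairwise similarity between the two candidate clusters
            sim = sum(S[i, j] for i in clusters[a] for j in clusters[b]) \
                  / (len(clusters[a]) * len(clusters[b]))
            if sim >= threshold:
                clusters[a] |= clusters.pop(b)    # recursive merge of highly similar groups
                merged = True
                break
        if merged:
            break
print(clusters)
```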
APA, Harvard, Vancouver, ISO, and other styles
44

Chiu, Yi-Wen, and 邱伊文. "The Study of the Correlation on the Ability of Othello and Reading-habits through Bee-based Clustering Analysis." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/27258486255908780869.

Full text
Abstract:
Master's thesis
Ling Tung University
Executive Master of Business Administration (EMBA) Program
ROC academic year 102
This study uses Bee-based Clustering (BBC) to analyse questionnaires on the correlation between students' reading habits and the game of Othello; more specifically, it examines the relationship between the reading habits and the reasoning ability of elementary school children. The research can provide a useful reference for teachers and staff in educational administration. A series of questionnaires was administered to fifth-grade elementary school students, yielding 163 samples of which 161 were valid, and the school subject records of these 161 students were also collected. The results show that ability in mathematics is strongly correlated with the winning rate in Othello, and that a preference for reading is also linked to a higher winning rate in Othello.
APA, Harvard, Vancouver, ISO, and other styles
45

Zimek, Arthur [Verfasser]. "Correlation clustering / vorgelegt von Arthur Zimek." 2008. http://d-nb.info/989874494/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

(10725786), James Michael Amstutz. "Cluster-Based Analysis Of Retinitis Pigmentosa Candidate Modifiers Using Drosophila Eye Size And Gene Expression Data." Thesis, 2021.

Find full text
Abstract:

The goal of this thesis is to algorithmically identify candidate modifiers for retinitis pigmentosa (RP) to help improve therapy and predictions for this genetic disorder, which may lead to a complete loss of vision. A study by Chow et al. (2016) focused on the genetic contributors to RP by trying to recognize a correlation between genetic modifiers and phenotypic variation in female Drosophila melanogaster, or fruit flies. In comparison to the genome-wide association analysis carried out in Chow et al.'s research, this study proposes using a K-Means clustering algorithm on RNA expression data to better understand which genes best exhibit characteristics of the RP degenerative model. Validating this algorithm's effectiveness in identifying suspected genes takes priority over their classification.

This study investigates the linear relationship between Drosophila eye size and genetic expression to gather statistically significant, strongly correlated genes from the clusters with abnormally high or low eye sizes. The clustering algorithm is implemented in the R scripting language, and supplemental information details the steps of this computational process. Running the mean eye size and genetic expression data of 18,140 female Drosophila genes and 171 strains through the proposed algorithm in its four variations helped identify 140 suspected candidate modifiers for retinal degeneration. Although none of the top candidate genes found in this study matched Chow’s candidates, they were all statistically significant and strongly correlated, with several showing links to RP. These results may continue to improve as more of the 140 suspected genes are annotated using identical or comparative approaches.
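The two computational steps described above (clustering genes on their expression across strains and testing each gene's linear relationship with mean eye size) can be sketched as follows. The thesis implements its pipeline in R; this is a Python approximation on synthetic data, and the cluster count, significance cut-off and correlation cut-off are assumptions rather than the study's settings.

```python
# Sketch: K-means on expression profiles plus per-gene Pearson correlation with eye size.
import numpy as np
from scipy.stats import pearsonr
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_genes, n_strains = 500, 171
eye_size = rng.normal(100, 10, n_strains)                 # mean eye size per strain
expr = rng.normal(size=(n_genes, n_strains))              # expression matrix: genes x strains
expr[:20] += 0.05 * eye_size                              # plant 20 eye-size-linked genes

clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(expr)

candidates = []
for g in range(n_genes):
    r, p = pearsonr(expr[g], eye_size)
    if p < 0.05 and abs(r) > 0.3:                         # assumed cut-offs, not the thesis's
        candidates.append((g, clusters[g], round(r, 2)))
print(len(candidates), candidates[:5])
```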

APA, Harvard, Vancouver, ISO, and other styles
47

Chen, Kuan-Chi, and 陳冠奇. "Combine Fuzzy Clustering and Correlation Coefficient for Medical Image Analysis." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/20086993754837383229.

Full text
Abstract:
Master's thesis
Lunghwa University of Science and Technology
Master's Program, Department of Information Management
ROC academic year 102
Automatic visual detection techniques have been widely applied in the medical field in recent years. Advances in image analysis technology have allowed medical images to provide more accurate references for physicians to use while making diagnoses. However, despite the rapid development of image analysis technology, because body structure, organ size and organ position differ among patients, the image information may lead to misjudgements caused by human negligence and noise. This study applied image analysis and detection to CT images of patients with heart disease. In the proposed analytical framework, the correlation coefficient was used for detection. The results showed that the correlation coefficient could be applied both to the enhanced gray-scale image and to the CT color image, and that both reached a good analysis effect. Finally, a neighborhood intuitionistic fuzzy clustering algorithm was integrated for comparison, in order to identify the image types that are suitable for different kinds of image analysis. This study is expected to provide a more accurate reference for physicians to use in making diagnoses.
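One common way to use the correlation coefficient for detection in a gray-scale image is normalized cross-correlation between a small template and every image patch. The sketch below does this on a synthetic image; it is a generic illustration, not the thesis's CT analysis, and the neighborhood intuitionistic fuzzy clustering comparison is not reproduced.

```python
# Pearson correlation between a template and each image patch (normalized cross-correlation).
import numpy as np

def corr_map(image, template):
    th, tw = template.shape
    t = (template - template.mean()) / (template.std() + 1e-9)
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + th, j:j + tw]
            p = (patch - patch.mean()) / (patch.std() + 1e-9)
            out[i, j] = (p * t).mean()         # correlation coefficient of patch vs template
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
template = rng.normal(size=(8, 8))
image[30:38, 20:28] = template                 # embed the structure we want to detect
scores = corr_map(image, template)
print(np.unravel_index(scores.argmax(), scores.shape))   # expected near (30, 20)
```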
APA, Harvard, Vancouver, ISO, and other styles
48

Tsai, Kun-Hsiu, and 蔡坤修. "On the document clustering based on dynamical term clustering." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/74344328487512323211.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Information Management
ROC academic year 91
With the rapid growth of the World Wide Web, more and more information is accessible online, and this explosion of information has resulted in an information overload problem: people have no time to read everything and must decide which information is relevant. Document clustering is an important technology for addressing this problem. In this work we focus on Chinese document clustering, where several difficulties remain to be solved, such as Chinese sentence segmentation, high dimensionality, and an unknown number of clusters. We propose new methods to address these problems. The first step in our Chinese document clustering system is to segment sentences into meaningful words. To overcome the shortcomings of the traditional Chinese sentence segmentation process, we propose a new method that combines segmentation with a thesaurus and compound word detection; our experiments show that this method yields a better clustering result. In the clustering phase, we design a dynamic term clustering method based on the SOM technique and propose a hierarchical, growing clustering structure to cluster the term vectors. Unlike traditional clustering methods that use document vectors, our approach yields an efficient clustering process and provides a much friendlier browsing interface.
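The idea of clustering term vectors first and then describing documents through those term clusters can be shown with a much simpler stand-in than the hierarchical, growing SOM used in the thesis: K-means on the rows of a term-document matrix, followed by a per-document profile over the term clusters. The corpus and cluster count below are arbitrary.

```python
# Stand-in for dynamic term clustering: cluster term vectors, then profile documents by term cluster.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

docs = [
    "stock correlation clustering market returns",
    "market returns and stock correlation",
    "protein gene expression clustering",
    "gene expression analysis of proteins",
]
X = CountVectorizer().fit_transform(docs).toarray()      # documents x terms
term_vectors = X.T                                       # terms x documents: one vector per term

k = 2
term_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(term_vectors)

# Document profile = how many of its term occurrences fall in each term cluster.
doc_profiles = np.zeros((len(docs), k))
for t, label in enumerate(term_labels):
    doc_profiles[:, label] += X[:, t]
print(doc_profiles)                                      # documents with similar profiles group together
```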
APA, Harvard, Vancouver, ISO, and other styles
49

HCWEI and 魏宏全. "Cross-Correlation-based." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/91848855387696610481.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Communication Engineering
ROC academic year 90
In adaptive acoustic echo cancellation, double-talk can cause the echo canceller to fail to track the room impulse response. In this thesis, the cross-correlations between i) the microphone input and the estimated echo, and ii) the microphone input and the AEC error are used to judge whether double-talk arises. We derive the theoretical cross-correlations, detection thresholds, and detection delays. For practical non-stationary speech signals, we propose a variant-threshold method to detect double-talk more efficiently. To distinguish an echo-path change from double-talk, we also propose a modified cross-correlation method. Computer simulations validate our derivations and the proposed methods.
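The basic decision rule can be illustrated as follows: estimate the normalized cross-correlation between the microphone signal and the estimated echo over successive frames and declare double-talk when it drops below a threshold. The signals, frame length and threshold in the sketch are assumptions, and the thesis's variant-threshold and modified cross-correlation refinements are not reproduced.

```python
# Frame-wise normalized cross-correlation between microphone and estimated echo.
import numpy as np

def doubletalk_flags(mic, est_echo, frame=256, threshold=0.7):
    flags = []
    for start in range(0, len(mic) - frame + 1, frame):
        m = mic[start:start + frame]
        e = est_echo[start:start + frame]
        rho = np.corrcoef(m, e)[0, 1]            # normalized cross-correlation at lag 0
        flags.append(rho < threshold)            # low correlation => near-end talker active
    return np.array(flags)

rng = np.random.default_rng(0)
far_end_echo = rng.normal(size=4096)
near_end = np.zeros(4096)
near_end[2048:] = 2.0 * rng.normal(size=2048)    # double-talk starts in the second half
mic = far_end_echo + near_end
print(doubletalk_flags(mic, far_end_echo))       # flags should turn True from frame 8 onward
```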
APA, Harvard, Vancouver, ISO, and other styles
50

Shao, Qing. "Estimating the number of clusters in regression clustering /." 2004. http://wwwlib.umi.com/cr/yorku/fullcit?pNQ99236.

Full text
Abstract:
Thesis (Ph.D.)--York University, 2004. Graduate Programme in Mathematics & Statistics.
Typescript. Includes bibliographical references (leaves 114-124). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://wwwlib.umi.com/cr/yorku/fullcit?pNQ99236
APA, Harvard, Vancouver, ISO, and other styles
