Dissertations / Theses on the topic 'Density clustering'

To see the other types of publications on this topic, follow the link: Density clustering.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Density clustering.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Albarakati, Rayan. "Density Based Data Clustering." CSUSB ScholarWorks, 2015. https://scholarworks.lib.csusb.edu/etd/134.

Full text
Abstract:
Data clustering is a data analysis technique that groups data based on a measure of similarity. When data is well clustered, the similarities between objects in the same group are high, while the similarities between objects in different groups are low. Data clustering is widely applied in areas such as bioinformatics, image segmentation and market research. This project conducted an in-depth study of data clustering with a focus on density-based clustering methods. The recent density-based CFSFDP algorithm (clustering by fast search and find of density peaks) is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. This method has been examined, experimented with, and improved. Three methods (KNN-based, Gaussian kernel-based and iterative Gaussian kernel-based) are applied in this project to improve CFSFDP density-based clustering. The methods are applied to four benchmark datasets and the results are analyzed and compared.
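
The decision-graph idea behind CFSFDP described above can be illustrated with a short sketch. This is not the project's code: the cutoff-kernel density, the parameter dc and the function name are assumptions, and the final assignment step is only noted in a comment.

```python
import numpy as np

def density_peaks(X, dc):
    """Compute the two CFSFDP quantities for every point: local density rho
    (number of neighbours within the cutoff dc) and delta, the distance to
    the nearest point of higher density."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    rho = (d < dc).sum(axis=1) - 1          # exclude the point itself
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        # the global density maximum gets its largest distance by convention
        delta[i] = d[i].max() if higher.size == 0 else d[i, higher].min()
    return rho, delta

# Points with both large rho and large delta are taken as cluster centers;
# every remaining point is then assigned to the cluster of its nearest
# neighbour of higher density (omitted here for brevity).
```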
APA, Harvard, Vancouver, ISO, and other styles
2

Erdem, Cosku. "Density Based Clustering Using Mathematical Morphology." Master's thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/12608264/index.pdf.

Full text
Abstract:
Improvements in technology enable us to store large amounts of data in warehouses. In parallel, the need to process this vast amount of raw data and translate it into interpretable information also increases. A commonly used solution to this problem in data mining is clustering. We propose the "Density Based Clustering Using Mathematical Morphology" (DBCM) algorithm as an effective clustering method for extracting arbitrarily shaped clusters from noisy numerical data in a reasonable time. The algorithm is predicated on the analogy between images and data warehouses: it applies grayscale morphology, an image processing technique, to multidimensional data. In this study we evaluated the performance of the proposed algorithm on both synthetic and real data and observed that the algorithm produces successful and interpretable results with appropriate parameters. In addition, we found the computational complexity to be linear in the number of data points for low-dimensional data and exponential in the number of dimensions for high-dimensional data, mainly due to the morphology operations.
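
A toy illustration of the image/data-warehouse analogy described above may help. This is not the DBCM implementation: it uses binary rather than grayscale morphology, works in two dimensions only, and the grid resolution and structuring element are assumptions.

```python
import numpy as np
from scipy import ndimage

def morphology_clusters(X, bins=50, min_count=1):
    """Quantise 2-D data onto a grid, close small gaps between occupied cells
    with a morphological closing, and label connected components as clusters."""
    hist, _, _ = np.histogram2d(X[:, 0], X[:, 1], bins=bins)
    occupied = hist >= min_count                      # "pixels" holding data points
    closed = ndimage.binary_closing(occupied, structure=np.ones((3, 3)))
    labels, n_clusters = ndimage.label(closed)        # connected components = clusters
    return labels, n_clusters
```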
APA, Harvard, Vancouver, ISO, and other styles
3

Holzapfel, Klaus. "Density-based clustering in large-scale networks." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=979979943.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Park, Ju-Hyun Dunson David B. "Bayesian density regression and predictor-dependent clustering." Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2008. http://dc.lib.unc.edu/u?/etd,1821.

Full text
Abstract:
Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2008.
Title from electronic title page (viewed Dec. 11, 2008). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics, School of Public Health." Discipline: Biostatistics; Department/School: Public Health.
APA, Harvard, Vancouver, ISO, and other styles
5

Kröger, Peer. "Coping With New Challenges for Density-Based Clustering." Diss., lmu, 2004. http://nbn-resolving.de/urn:nbn:de:bvb:19-23966.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Mai, Son. "Density-based algorithms for active and anytime clustering." Diss., Ludwig-Maximilians-Universität München, 2014. http://nbn-resolving.de/urn:nbn:de:bvb:19-175337.

Full text
Abstract:
Data intensive applications like biology, medicine, and neuroscience require effective and efficient data mining technologies. Advanced data acquisition methods produce data of constantly increasing volume and complexity. As a consequence, the need for new data mining technologies to deal with complex data has emerged during the last decades. In this thesis, we focus on the data mining task of clustering, in which objects are separated into different groups (clusters) such that objects inside a cluster are more similar to each other than to objects in different clusters. Particularly, we consider density-based clustering algorithms and their applications in biomedicine. The core idea of the density-based clustering algorithm DBSCAN is that each object within a cluster must have a certain number of other objects inside its neighborhood. Compared with other clustering algorithms, DBSCAN has many attractive benefits, e.g., it can detect clusters with arbitrary shape and is robust to outliers. Thus, DBSCAN has attracted a lot of research interest during the last decades, with many extensions and applications. In the first part of this thesis, we aim at developing new algorithms based on the DBSCAN paradigm to deal with the new challenges of complex data, particularly expensive distance measures and incomplete availability of the distance matrix. Like many other clustering algorithms, DBSCAN suffers from poor performance when facing expensive distance measures for complex data. To tackle this problem, we propose a new algorithm based on the DBSCAN paradigm, called Anytime Density-based Clustering (A-DBSCAN), that works in an anytime scheme: in contrast to the original batch scheme of DBSCAN, the algorithm A-DBSCAN first produces a quick approximation of the clustering result and then continuously refines the result during its further run. Experts can interrupt the algorithm, examine the results, and choose between (1) stopping the algorithm at any time whenever they are satisfied with the result, to save runtime, and (2) continuing the algorithm to achieve better results. Such an anytime scheme has been proven in the literature to be a very useful technique when dealing with time-consuming problems. We also introduce an extended version of A-DBSCAN, called A-DBSCAN-XS, which is more efficient and effective than A-DBSCAN when dealing with expensive distance measures. Since DBSCAN relies on the cardinality of the neighborhood of objects, it requires the full distance matrix to perform the clustering. For complex data, these distances are usually expensive, time consuming or even impossible to acquire due to high cost, high time complexity, noisy and missing data, etc. Motivated by these potential difficulties of acquiring the distances among objects, we propose another approach for DBSCAN, called Active Density-based Clustering (Act-DBSCAN). Given a budget limitation B, Act-DBSCAN is only allowed to use up to B pairwise distances, ideally producing the same result as if it had the entire distance matrix at hand. The general idea of Act-DBSCAN is that it actively selects the most promising pairs of objects for which to calculate the distances, and tries to approximate the desired clustering result as closely as possible with each distance calculation. This scheme provides an efficient way to reduce the total cost needed to perform the clustering. Thus it limits the potential weakness of DBSCAN when dealing with the distance sparseness problem of complex data.
As a fundamental data clustering algorithm, density-based clustering has many applications in diverse fields. In the second part of this thesis, we focus on an application of density-based clustering in neuroscience: the segmentation of white matter fiber tracts in the human brain acquired from Diffusion Tensor Imaging (DTI). We propose a model that evaluates the similarity between two fibers as a combination of structural similarity and connectivity-related similarity of fiber tracts. Various distance measure techniques from fields like time-sequence mining are adapted to calculate the structural similarity of fibers. Density-based clustering is used as the segmentation algorithm. We show how A-DBSCAN and A-DBSCAN-XS are used as novel solutions for the segmentation of massive fiber datasets and provide unique features to assist experts during the fiber segmentation process.
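
The DBSCAN neighbourhood criterion referred to in the abstract — every point of a cluster must have a minimum number of other points within a radius — is summarised in the following minimal sketch. It is the plain batch algorithm, not the anytime A-DBSCAN or Act-DBSCAN variants, and the usual parameter names eps and min_pts are assumed.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal batch DBSCAN with a fully precomputed Euclidean distance matrix.
    Returns one label per point; -1 marks noise."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbours = [np.where(row <= eps)[0] for row in d]
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue                          # already assigned, or not a core point
        labels[i] = cluster
        seeds = list(neighbours[i])
        while seeds:                          # expand the cluster from its core points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:
                    seeds.extend(neighbours[j])
        cluster += 1
    return labels
```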
APA, Harvard, Vancouver, ISO, and other styles
7

Dixit, Siddharth. "Density Based Clustering using Mutual K-Nearest Neighbors." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447690719.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Tuhin, Rashedul Amin. "Securing GNSS Receivers with a Density-based Clustering Algorithm." Thesis, KTH, Kommunikationsnät, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-182117.

Full text
Abstract:
Global Navigation Satellite Systems (GNSS) are in widespread use around the world for numerous purposes. Although first developed for military purposes, civilian use has by now far surpassed military use. The technology has matured considerably and is still being developed further towards pinpoint accuracy. Alongside these improvements, several vulnerabilities have been discovered by researchers and exploited by attackers. Several countermeasures have been, and are still being, implemented to secure GNSS. Studies show that GNSS-based receivers are still vulnerable to a fundamentally simple yet effective attack known as the replay attack. The replay attack is particularly harmful since the attacker can make the receiver calculate an inaccurate position without breaking the encryption or employing any sophisticated technique. The Multiple Combinations of Satellites and Systems (MCSS) test is a powerful test against replay attacks on GNSS. However, detecting and identifying multiple attacking signals while simultaneously determining the correct position of the receiver remains a challenge. In this study, after implementing the MCSS test, a mechanism to detect attacker-controlled signals is demonstrated. Furthermore, by applying a clustering algorithm to the output of the MCSS test, a method of correctly determining the position and nullifying the adversarial effects is also presented in this report.
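
A speculative sketch of the final step described above: position fixes computed from many satellite/system combinations are clustered, and the largest cluster of mutually consistent fixes is taken as the benign solution. DBSCAN, the 10 m radius and the use of ECEF coordinates are illustrative assumptions, not the method of the thesis.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def consistent_position(fixes_m, radius_m=10.0, min_fixes=4):
    """fixes_m: (n, 3) array of candidate ECEF positions in metres, one per
    MCSS satellite combination. Returns the centroid of the largest cluster,
    treating fixes pulled away by replayed signals as outliers."""
    labels = DBSCAN(eps=radius_m, min_samples=min_fixes).fit_predict(fixes_m)
    valid = labels[labels >= 0]
    if valid.size == 0:
        return None                          # no consistent subset of fixes found
    best = np.bincount(valid).argmax()       # most populated cluster
    return fixes_m[labels == best].mean(axis=0)
```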
APA, Harvard, Vancouver, ISO, and other styles
9

Braune, Christian [Verfasser]. "Skeleton-based validation for density-based clustering / Christian Braune." Magdeburg : Universitätsbibliothek Otto-von-Guericke-Universität, 2018. http://d-nb.info/1220035653/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Lilje, Per Vidar Barth. "Large-scale density and velocity fields in the Universe." Thesis, University of Cambridge, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.254245.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Hinneburg, Alexander. "Density based clustering in large databases using projections and visualizations." [S.l. : s.n.], 2002. http://deposit.ddb.de/cgi-bin/dokserv?idn=967390583.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Bugrien, Jamal B. "Robust approaches to clustering based on density estimation and projection." Thesis, University of Leeds, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.418939.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Santiago, Rafael de. "Efficient modularity density heuristics in graph clustering and their applications." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2017. http://hdl.handle.net/10183/164066.

Full text
Abstract:
Modularity Density Maximization is a graph clustering problem which avoids the resolution-limit degeneracy of the Modularity Maximization problem. This thesis aims at solving larger instances than current Modularity Density heuristics can handle, and at showing how close the obtained solutions are to the expected clustering. Three main contributions arise from this objective. The first consists of theoretical results about the properties of Modularity Density-based prioritizers. The second is the development of eight Modularity Density Maximization heuristics. Our heuristics are compared with optimal results from the literature, and with the GAOD, iMeme-Net, HAIN and BMD heuristics. Our results are also compared with CNM and Louvain, which are Modularity Maximization heuristics that solve instances with thousands of nodes. The tests were carried out using graphs from the "Stanford Large Network Dataset Collection". The experiments have shown that our eight heuristics found solutions for graphs with hundreds of thousands of nodes. Our results have also shown that five of our heuristics surpassed the current state-of-the-art Modularity Density Maximization heuristic solvers for large graphs. The third contribution is the proposal of six column generation methods. These methods use exact and heuristic auxiliary solvers and an initial variable generator. Comparisons among our proposed column generation methods and state-of-the-art algorithms were also carried out. The results showed that: (i) two of our methods surpassed the state-of-the-art algorithms in terms of time, and (ii) our methods proved the optimal value for larger instances than current approaches can tackle. Our results suggest clear improvements to the state of the art for the Modularity Density Maximization problem.
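
For reference, one common formulation of the modularity density objective maximised by these heuristics (following Li et al.; the exact variant used in the thesis may differ) is, for a partition of the vertex set into clusters V_1, ..., V_k of a graph with adjacency matrix A:

\[
D \;=\; \sum_{c=1}^{k} \frac{L(V_c, V_c) - L(V_c, \bar{V}_c)}{|V_c|},
\qquad
L(S, T) \;=\; \sum_{i \in S} \sum_{j \in T} A_{ij},
\]

where L(V_c, V_c) counts twice the number of edges internal to cluster V_c and L(V_c, \bar{V}_c) counts the edges leaving it; the normalisation by |V_c| is what avoids the resolution limit of plain modularity.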
APA, Harvard, Vancouver, ISO, and other styles
14

Eldridge, Justin. "Clustering Consistently." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1512070374903249.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Al-Azab, Fadwa Gamal Mohammed. "An Improved Density-Based Clustering Algorithm Using Gravity and Aging Approaches." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/31994.

Full text
Abstract:
Density-based clustering is one of the well-known approaches to grouping samples according to their densities. In existing density-based clustering algorithms, samples are clustered according to the total number of points within the radius of the defined dense region. This way of determining density, however, provides little knowledge about the similarities among points. Additionally, these algorithms are not flexible enough to deal with dynamic data that changes over time. The current study addresses these challenges by proposing a new approach that incorporates new measures to evaluate attribute similarities while clustering incoming samples, rather than considering only the total number of points within a radius. The new approach is developed based on the notion of Gravity, where incoming samples are clustered according to the force of their neighbouring samples. The Mass (density) of a cluster is measured using various approaches, including the number of neighbouring samples and the Silhouette measure. The neighbouring sample with the highest force is then the one that pulls in the new incoming sample to be part of that cluster. Taking into account the attribute similarities of points provides more information by accurately defining the dense regions around the incoming samples. It also determines the best neighbourhood to which the new sample belongs. In addition, the proposed algorithm introduces a new approach to utilize memory efficiently. It forms clusters with different shapes over time when dealing with dynamic data. This approach, called Aging, enables the proposed algorithm to utilize memory efficiently by removing points that have aged if they do not participate in clustering incoming samples, and consequently changing the shapes of the clusters incrementally. Four experiments are conducted in this study to evaluate the performance of the proposed algorithm. The performance and effectiveness of the proposed algorithm are validated on a synthetic dataset (to visualize the changes of the clusters' shapes over time), as well as real datasets. The experimental results confirm that the proposed algorithm improves on the performance measures, including the Dunn Index and SD Index. The experimental results also demonstrate that the proposed algorithm utilizes less memory, with the ability to form clusters with arbitrary shapes that are changeable over time.
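
A rough sketch of the gravity analogy described above, under loudly stated assumptions: the inverse-square force, the fixed number of candidate neighbours and the precomputed per-point masses (which the thesis derives from neighbour counts or the Silhouette measure) are all illustrative choices, not the algorithm itself.

```python
import numpy as np

def assign_by_gravity(x, points, labels, masses, k=10, eps=1e-9):
    """Assign the incoming sample x to the cluster of the stored point that
    exerts the strongest 'force' on it, with force = mass / distance**2."""
    d = np.linalg.norm(points - x, axis=1)     # distances to all stored points
    nearest = np.argsort(d)[:k]                # restrict to the k nearest
    forces = masses[nearest] / (d[nearest] ** 2 + eps)
    return labels[nearest[np.argmax(forces)]]
```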
APA, Harvard, Vancouver, ISO, and other styles
16

Kannamareddy, Aruna Sai. "Density and partition based clustering on massive threshold bounded data sets." Kansas State University, 2017. http://hdl.handle.net/2097/35467.

Full text
Abstract:
Master of Science
Department of Computing and Information Sciences
William H. Hsu
The project explores the possibility of increasing the efficiency of clusters formed from massive data sets using the threshold blocking algorithm. Clusters thus formed are denser and of higher quality. Clusters formed by individual clustering algorithms alone do not necessarily exclude outliers, and the clusters generated can be complex or improperly distributed over the data set. The threshold blocking algorithm, from recent work by Michael Higgins of the Department of Statistics, on the other hand performs better than existing algorithms at forming dense and distinctive units with a predefined threshold. Developing a hybridized algorithm that applies existing clustering algorithms to re-cluster the units thus formed is part of this project. Clustering on the seeds produced by the threshold blocking algorithm eases the task for the existing algorithm by eliminating the overhead of handling outliers. The clusters thus generated are also more representative of the whole. In addition, since the threshold blocking algorithm is proven to be fast and efficient, we can now derive many more decisions from large data sets in less time. Predicting similar songs from the Million Song Dataset using such a hybridized algorithm is used to evaluate this goal.
APA, Harvard, Vancouver, ISO, and other styles
17

Teodoro, Luís Filipe Alves. "The density and velocity fields of the local universe." Thesis, Durham University, 1999. http://etheses.dur.ac.uk/4550/.

Full text
Abstract:
We present two self-consistent non-parametric models of the local cosmic velocity field based on the density distribution in the PSCz redshift survey of IRAS galaxies. Two independent methods have been applied, both based on the assumptions of gravitational instability and linear biasing. They give remarkably similar results, with no evidence of systematic differences and an r.m.s. discrepancy of only ~70 km s^-1 in each Cartesian velocity component. These uncertainties are consistent with a detailed independent error analysis carried out on mock PSCz catalogues constructed from N-body simulations. The denser sampling provided by the PSCz survey compared to previous IRAS galaxy surveys allows us to reconstruct the velocity field out to larger distances. The most striking feature of the model velocity field is a coherent large-scale streaming motion along a baseline connecting Perseus-Pisces, the Local Supercluster, the Great Attractor, and the Shapley Concentration. We find no evidence for back-infall onto the Great Attractor. Instead, material behind and around the Great Attractor is inferred to be streaming towards the Shapley Concentration, aided by the expansion of two large neighbouring underdense regions. The PSCz model velocities compare well with those predicted from the 1.2-Jy redshift survey of IRAS galaxies and, perhaps surprisingly, with those predicted from the distribution of Abell/ACO clusters, out to 140 h^-1 Mpc. Comparison of the real-space density fields (or, alternatively, the peculiar velocity fields) inferred from the PSCz and cluster catalogues gives a relative (linear) bias parameter between clusters and IRAS galaxies of b_c = 4.4 ± 0.6. In addition, we compare the cumulative bulk flows predicted from the PSCz gravity field with those measured from the MarkIII and SFI catalogues of peculiar velocities. A conservative estimate of β = Ω_0^0.6/b, where b is the bias parameter for IRAS galaxies, gives β = 0.76 ± 0.13 (1-σ), in agreement with other recent determinations. Finally, we perform a detailed comparison of the IRAS PSCz and 1.2-Jy spherical harmonic coefficients of the density and velocity fields in redshift space. Both the monopole terms of the density and velocity fields predicted from the surveys show some inconsistencies. The mismatch in the velocity monopole terms is resolved by masking the 1.2-Jy survey with the PSCz mask and using the galaxies within the PSCz survey with fluxes larger than 1.2 Jy. Davis, Nusser and Willick (1996) found a discrepancy between the IRAS 1.2-Jy survey gravity field and the MarkIII peculiar velocity field. We conclude that the use of the deeper IRAS PSCz catalogue cannot alone resolve this mismatch.
APA, Harvard, Vancouver, ISO, and other styles
18

Guan, C. "Evolutionary and swarm algorithm optimized density-based clustering and classification for data analytics." Thesis, University of Liverpool, 2017. http://livrepository.liverpool.ac.uk/3021212/.

Full text
Abstract:
Clustering is one of the most widely used pattern recognition technologies for data analytics. Density-based clustering is a category of clustering methods which can find arbitrarily shaped clusters. A well-known density-based clustering algorithm is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN has three drawbacks: firstly, the parameters for DBSCAN are hard to set; secondly, the number of clusters cannot be controlled by the users; and thirdly, DBSCAN cannot directly be used as a classifier. To address the drawbacks of DBSCAN, a novel framework, Evolutionary and Swarm Algorithm optimised Density-based Clustering and Classification (ESA-DCC), is proposed. Evolutionary and Swarm Algorithms (ESA) have been applied to optimisation problems in various research fields, including data analytics. Numerous categories of ESAs have been proposed, such as Genetic Algorithms (GAs), Particle Swarm Optimization (PSO), Differential Evolution (DE) and Artificial Bee Colony (ABC). In this thesis, ESA is used to search for the best parameters of density-based clustering and classification in the ESA-DCC framework, addressing the first drawback of DBSCAN. As a method to offset the second drawback, four types of fitness functions are defined to enable users to set the number of clusters as input. A supervised fitness function is defined to use ESA-DCC as a classifier, addressing the third drawback. Four ESA-DCC methods, GA-DCC, PSO-DCC, DE-DCC and ABC-DCC, are developed. The performance of the ESA-DCC methods is compared with K-means and DBSCAN using ten datasets. The experimental results indicate that the proposed ESA-DCC methods can find the optimised parameters in both supervised and unsupervised contexts. The proposed methods are applied in a product recommender system and in image segmentation case studies.
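
The core loop of such a framework — searching DBSCAN's parameter space against a clustering-quality fitness — can be sketched as follows. A plain random search stands in for the GA/PSO/DE/ABC optimisers and the silhouette score stands in for the thesis's fitness functions; both are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

def search_dbscan_params(X, n_trials=200, seed=0):
    """Sample (eps, min_samples) pairs and keep the best-scoring one; an
    evolutionary or swarm optimiser would refine candidates instead of
    drawing them independently."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, -1.0
    for _ in range(n_trials):
        eps = rng.uniform(0.05, 2.0)
        min_samples = int(rng.integers(3, 20))
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        if n_clusters < 2:
            continue                           # fitness undefined for < 2 clusters
        mask = labels != -1                    # score non-noise points only
        score = silhouette_score(X[mask], labels[mask])
        if score > best_score:
            best_params, best_score = (eps, min_samples), score
    return best_params, best_score
```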
APA, Harvard, Vancouver, ISO, and other styles
19

Tour, Samir R. "Parallel Hybrid Clustering using Genetic Programming and Multi-Objective Fitness with Density (PYRAMID)." NSUWorks, 2006. http://nsuworks.nova.edu/gscis_etd/886.

Full text
Abstract:
Clustering is the art of locating patterns in large data sets. It is an active research area that provides value to scientific as well as business applications. There are some challenges that face practical clustering including: identifying clusters of arbitrary shapes, sensitivity to the order of input, dynamic determination of the number of clusters, outlier handling, high dependency on user-defined parameters, processing speed of massive data sets, and the potential to fall into sub-optimal solutions. Many studies that were conducted in the realm of clustering have addressed some of these challenges. This study proposes a new approach, called parallel hybrid clustering using genetic programming and multi-objective fitness with density (PYRAMID), that tackles several of these challenges from a different perspective. PYRAMID employs genetic programming to represent arbitrary cluster shapes and circumvent falling in local optima. It accommodates large data sets and avoids dependency on the order of input by quantizing the data space, i.e., the space on which the data set resides, thus abstracting it into hyper-rectangular cells and creating genetic programming individuals as concatenations of these cells. Thus the cells become the subject of clustering, rather than the data points themselves. PYRAMID also utilizes a density-based multi-objective fitness function to handle outliers. It gathers statistics in a pre-processing step and uses them so not to rely on user-defined parameters. Finally, PYRAMID employs data parallelism in a master-slave model in an attempt to cure the inherent slow performance of evolutionary algorithms and provide speedup. A master processor distributes the clustering data evenly onto multiple slave processors. The slave processors conduct the clustering on their local data sets and report their clustering results back to the master, which consolidates them by merging the partial results into a final clustering solution. This last step also involves determining the number of clusters dynamically and labeling them accordingly. Experiments have demonstrated that, using these features, PYRAMID offers an advantage over some of the existing approaches by tackling the clustering challenges from a different angle.
APA, Harvard, Vancouver, ISO, and other styles
20

Hemerich, Daiane. "Spatio-temporal data mining in palaeogeographic data with a density-based clustering algorithm." Pontifícia Universidade Católica do Rio Grande do Sul, 2014. http://hdl.handle.net/10923/5929.

Full text
Abstract:
The usefulness of data mining and the Knowledge Discovery in Databases (KDD) process has grown in importance as the volume of data stored in large repositories grows. A promising area for knowledge discovery concerns oil prospection, in which the data used differ from both traditional and geographical data. In palaeogeographic data, the temporal dimension is treated according to the geologic time scale, while the spatial dimension is related to georeferenced data, i.e., latitudes and longitudes on the Earth's surface. This approach differs from that of the spatio-temporal data mining algorithms found in the literature, giving rise to the need to evolve existing algorithms to the context of this research. This work presents the development of a solution that employs a density-based spatio-temporal algorithm for mining palaeogeographic data on the Earth's surface. An evolved version of the ST-DBSCAN algorithm was implemented in Java using the Weka API, with improvements carried out in order to allow the data mining algorithm to solve a variety of identified research problems. A set of experiments that validate the proposed implementations of the algorithm is presented in this work. The experiments show that the developed solution allows palaeogeographic data mining by applying appropriate formulas for calculating distances over the Earth's surface while, at the same time, treating the temporal dimension according to the geologic time scale.
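
A minimal sketch of the kind of "appropriate formula for calculating distances over the Earth's surface" mentioned above is the haversine great-circle distance; the exact formula adopted in the thesis is not stated in the abstract, so this is an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points
    given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A spatio-temporal DBSCAN would then use such a metric for the spatial neighbourhood test while comparing the temporal attribute on the geologic time scale.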
APA, Harvard, Vancouver, ISO, and other styles
21

Piekenbrock, Matthew J. "Discovering Intrinsic Points of Interest from Spatial Trajectory Data Sources." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1527160689990512.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Mai, Son Thai [Verfasser], and Christian [Akademischer Betreuer] Böhm. "Density-based algorithms for active and anytime clustering / Son Thai Mai. Betreuer: Christian Böhm." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2014. http://d-nb.info/106000710X/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Wang, Haolei. "Using density-based clustering to improve skeleton embedding in the Pinocchio automatic rigging system." Thesis, Kansas State University, 2012. http://hdl.handle.net/2097/15102.

Full text
Abstract:
Master of Science
Department of Computing and Information Sciences
William H. Hsu
Automatic rigging is a targeting approach that takes a 3-D character mesh and an adapted skeleton and automatically embeds the skeleton into the mesh. Automating the embedding step provides savings over traditional character rigging approaches, which require manual guidance, at the cost of occasional errors in recognizing parts of the mesh and aligning bones of the skeleton with it. In this thesis, I examine the problem of reducing such errors in an auto-rigging system and apply a density-based clustering algorithm to correct errors in a particular system, Pinocchio (Baran & Popovic, 2007). I show how the density-based clustering algorithm DBSCAN (Ester et al., 1996) is able to filter out some impossible vertices to correct errors at character extremities (hair, hands, and feet) and errors resulting from clothing that hides extremities such as legs.
APA, Harvard, Vancouver, ISO, and other styles
24

Johnson, Eric. "Density-Based Clustering of High-Dimensional DNA Fingerprints for Library-Dependent Microbial Source Tracking." DigitalCommons@CalPoly, 2015. https://digitalcommons.calpoly.edu/theses/1511.

Full text
Abstract:
As part of an ongoing multidisciplinary effort at California Polytechnic State University, biologists and computer scientists have developed a new Library-Dependent Microbial Source Tracking method for identifying the host animals causing fecal contamination in local water sources. The Cal Poly Library of Pyroprints (CPLOP) is a database which stores E. coli representations of fecal samples from known hosts acquired from a novel method developed by the biologists called Pyroprinting. The research group considers E. coli samples whose Pyroprints match above a certain threshold to be part of the same bacterial strain. If an environmental sample from an unknown host animal matches one of the strains in CPLOP, then it is likely that the host of the unknown sample is the same species as one of the hosts that the strain was previously found in. The computer science technique for finding groups of related data (ie. strains) in a data set is called clustering. In this thesis, we evaluate the use of density-based clustering for identifying strains in CPLOP. Density-based clustering finds clusters of points which have a minimum number of other points within a given radius. We contribute a clustering algorithm based on the original DBSCAN algorithm which removes points from the search space after they have been seen once. We also present a new method for comparing Pyroprints which is algebraically related to the current method. The method has mathematical properties which make it possible to use Pyroprints in a spatial index we designed especially for Pyroprints, which can be utilized by the DBSCAN algorithm to speed up clustering.
APA, Harvard, Vancouver, ISO, and other styles
25

Maier, Joshua. "PERFORMANCE STUDY OF SOW-AND-GROW: A NEW CLUSTERING ALGORITHM FOR BIG DATA." OpenSIUC, 2020. https://opensiuc.lib.siu.edu/theses/2669.

Full text
Abstract:
DBSCAN is a density-based clustering algorithm known for being able to find irregularly shaped clusters and to handle noise points. For very large sets of data, however, this algorithm becomes inefficient because it must go through each and every point and look at its neighborhood in order to determine the clusters. DBSCAN is also hard to implement in parallel due to the structure of the data and its sequential data access. The Sow-and-Grow algorithm is a parallel, density-based clustering algorithm. It utilizes a concept of growing points in order to find clusters more efficiently, as opposed to going through every point in the dataset in sequential order. We create an initial seed set of variable size based on user input and a dynamic growing-points vector to cluster the data. Our algorithm is designed for shared memory and can be run in parallel using threads. For our experiments, multiple datasets were used with a varying number of points and dimensions. We used these datasets to show the significant speedup the Sow-and-Grow algorithm produces compared to other parallel, density-based clustering algorithms. On some datasets, Sow-and-Grow achieves a speedup of 8 times over another density-based algorithm. We also looked at how changing the number of seeds affects the results in terms of runtime and clusters discovered.
APA, Harvard, Vancouver, ISO, and other styles
26

Courjault-Rade, Vincent. "Ballstering : un algorithme de clustering dédié à de grands échantillons." Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30126/document.

Full text
Abstract:
Ballstering belongs to the family of machine learning methods that aim to group the elements of a dataset into classes without any prior knowledge of the true classes within it. This type of method, of which k-means is one of the most famous representatives, is known as clustering. Recently, a clustering algorithm, "Fast Density Peak Clustering" (FDPC), published in the journal Science, aroused great interest in the scientific community for its innovative aspect and its efficiency on non-concentric distributions. However, this algorithm has such high complexity that it cannot easily be applied to large datasets. Moreover, we identified several weaknesses that can strongly impair the quality of its results, in particular a global parameter dc that is difficult to choose yet has a significant impact on the results. In view of those limitations, we reworked the principal idea of FDPC in a new light and modified it successively to finally create a distinct algorithm that we called Ballstering. The work carried out during these three years can be summarised mainly as the design of this clustering algorithm, derived from FDPC and especially designed to be efficient on large volumes of data. Like its precursor, Ballstering works in two phases: a density estimation phase followed by a clustering phase. Its design is mainly based on a sub-procedure that handles the first phase with a much lower complexity while avoiding, at the same time, the difficult choice of dc, which becomes dynamic, determined according to the local density. We call this sub-procedure ICMDW; it represents a substantial part of our contributions. We also reworked some of the core definitions of FDPC and entirely redesigned the second phase (relying on the tree structure of the results provided by ICMDW), finally producing an algorithm that overcomes all the limitations that we identified in FDPC.
APA, Harvard, Vancouver, ISO, and other styles
27

CASSIANO, KEILA MARA. "TIME SERIES ANALYSIS USING SINGULAR SPECTRUM ANALYSIS (SSA) AND BASED DENSITY CLUSTERING OF THE COMPONENTS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24787@1.

Full text
Abstract:
This thesis proposes using DBSCAN (Density Based Spatial Clustering of Applications with Noise) to separate the noise components of eigentriples in the grouping stage of the Singular Spectrum Analysis (SSA) of time series. DBSCAN is a modern method (revised in 2013) that is adept at identifying noise through regions of lower density. The hierarchical clustering method was until now the latest innovation for noise separation in the SSA approach, implemented in the R-SSA package. However, it is repeatedly noted in the literature that the hierarchical clustering method is very sensitive to noise, is unable to separate it correctly, should not be used on clusters with varying densities, and does not work well when clustering time series with different trends. By contrast, density-based clustering methods are effective in separating noise from the data and are designed to work well on data of varying densities. This work shows the better efficiency of DBSCAN over the other methods already used in this stage of SSA, because it allows a considerable reduction of noise and provides better forecasting. The result is supported by experimental evaluations carried out on simulated stationary and non-stationary series. The proposed combination of methodologies was also applied successfully to forecasting a real wind speed series.
APA, Harvard, Vancouver, ISO, and other styles
28

Song, Juhee. "Bootstrapping in a high dimensional but very low sample size problem." Texas A&M University, 2003. http://hdl.handle.net/1969.1/3853.

Full text
Abstract:
High Dimension, Low Sample Size (HDLSS) problems have received much attention recently in many areas of science. Analysis of microarray experiments is one such area. Numerous studies are ongoing to investigate the behavior of genes by measuring the abundance of mRNA (messenger ribonucleic acid), i.e., gene expression. The HDLSS data investigated in this dissertation consist of a large number of data sets, each of which has only a few observations. We assume a statistical model in which measurements from the same subject have the same expected value and variance. All subjects have the same distribution up to location and scale. Information from all subjects is shared in estimating this common distribution. Our interest is in testing the hypothesis that the mean of measurements from a given subject is 0. Commonly used tests of this hypothesis, the t-test, sign test and traditional bootstrapping, do not necessarily provide reliable results since there are only a few observations for each data set. We motivate a mixture model having C clusters and 3C parameters to overcome the small sample size problem. Standardized data are pooled after assigning each data set to one of the mixture components. To get reasonable initial parameter estimates when density estimation methods are applied, we apply clustering methods including agglomerative clustering and K-means. The Bayesian Information Criterion (BIC) and a new criterion, WMCV (Weighted Mean of within Cluster Variance estimates), are used to choose an optimal number of clusters. Density estimation methods including a maximum likelihood unimodal density estimator and kernel density estimation are used to estimate the unknown density. Once the density is estimated, a bootstrapping algorithm that selects samples from the estimated density is used to approximate the distribution of test statistics. The t-statistic and an empirical likelihood ratio statistic are used, since their distributions are completely determined by the distribution common to all subjects. A method to control the false discovery rate is used to perform simultaneous tests on all small data sets. Simulated data sets and a set of cDNA (complementary deoxyribonucleic acid) microarray experiment data are analyzed by the proposed methods.
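
A compact sketch of the final bootstrapping step described above, under stated assumptions: scipy's Gaussian KDE stands in for the dissertation's density estimators, the mixture-assignment and unimodal-MLE steps are omitted, and only the t-statistic is shown.

```python
import numpy as np
from scipy import stats

def bootstrap_t_null(pooled_standardized, m, n_boot=10000):
    """Approximate the null distribution of the one-sample t-statistic for
    data sets of size m by resampling from a kernel density estimate fitted
    to the pooled, standardised observations."""
    kde = stats.gaussian_kde(pooled_standardized)
    tstats = np.empty(n_boot)
    for b in range(n_boot):
        sample = kde.resample(m)[0]            # m draws from the estimated density
        tstats[b] = sample.mean() / (sample.std(ddof=1) / np.sqrt(m))
    return tstats   # compare each data set's observed t against these quantiles
```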
APA, Harvard, Vancouver, ISO, and other styles
29

Juan, Rovira Enric. "Analytic derivation of non-linear dark matter clustering from the filtering of the primordial density field." Doctoral thesis, Universitat de Barcelona, 2016. http://hdl.handle.net/10803/395192.

Full text
Abstract:
In this Thesis, we show how the properties of dark matter halos can be directly derived from the proper filtering of the primordial density field through the use of the CUSP formalism (ConflUent System of Peak trajectories). Although the CUSP formalism was first proposed in 1995, it was still incomplete. In the present Thesis we give a general overview of the formalism, explaining its theoretical grounds and how it can be used to derive the typical properties of relaxed dark matter halos. We prove the existence of a one-to-one correspondence between halos and peaks despite the ellipsoidal collapse of peaks, and we also show that halos formed through major mergers and through accretion have the same properties, dependent on the properties of the respective progenitor peaks at the largest scale. We also apply the CUSP formalism to study the growth of halos, showing that they grow inside-out. We also derive practical analytical expressions for the mass-concentration-shape relations of the NFW and Einasto profiles over the whole mass and redshift range. Finally, we have applied the CUSP formalism to study the halo mass and multiplicity functions, and their dependence on the exact mass definition used. We have shown that the FoF(0.2) halo finding algorithm is equivalent to the spherical virial overdensity one, which explains the privileged linking length equal to 0.2; we have shown why the virial radii of halos are close to the top-hat radii described by the spherical collapse model, and why the halo mass function is so close to the Press-Schechter form; and, lastly, we have explained why the halo multiplicity function is closely universal in the two equivalent cases mentioned before.
APA, Harvard, Vancouver, ISO, and other styles
30

Wang, Xing. "Time Dependent Kernel Density Estimation: A New Parameter Estimation Algorithm, Applications in Time Series Classification and Clustering." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6425.

Full text
Abstract:
The Time Dependent Kernel Density Estimation (TDKDE) developed by Harvey & Oryshchenko (2012) is a kernel density estimation adjusted by the Exponentially Weighted Moving Average (EWMA) weighting scheme. The Maximum Likelihood Estimation (MLE) procedure for estimating the parameters proposed by Harvey & Oryshchenko (2012) is easy to apply but has two inherent problems. In this study, we evaluate the performance of the probability density estimation in terms of the uniformity of Probability Integral Transforms (PITs) on various kernel functions combined with different preset numbers. Furthermore, we develop a new estimation algorithm, which can be conducted using Artificial Neural Networks, to eliminate the inherent problems of the MLE method and to improve the estimation performance as well. Based on the new estimation algorithm, we develop the TDKDE-based Random Forests time series classification algorithm, which is significantly superior to the commonly used statistical feature-based Random Forests method as well as the Kernel Density Estimation (KDE)-based Random Forests approach. Furthermore, the proposed TDKDE-based Self-Organizing Map (SOM) clustering algorithm is demonstrated to be superior to the widely used Discrete-Wavelet-Transform (DWT)-based SOM method in terms of the Adjusted Rand Index (ARI).
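
A small sketch of a time-dependent kernel density estimate with EWMA weights, read generically from the description above (the Gaussian kernel, the bandwidth h and the discount factor omega are assumptions, and the parameter-estimation step that the thesis actually studies is not shown):

```python
import numpy as np

def tdkde(x_grid, obs, omega, h):
    """Density estimate after observing obs[0..T-1]: a kernel density in which
    the weight of observation t decays geometrically with its age,
    w_t proportional to omega**(T-1-t)."""
    T = len(obs)
    w = omega ** np.arange(T - 1, -1, -1)      # most recent observation weighted most
    w = w / w.sum()
    u = (x_grid[:, None] - obs[None, :]) / h   # kernel arguments, one column per obs
    K = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)
    return K @ w                               # weighted kernel density on the grid
```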
APA, Harvard, Vancouver, ISO, and other styles
31

Romild, Ulla. "Essays on Distance Based (Non-Euclidean) Tests for Spatial Clustering in Inhomogeneous Populations: Adjusting for the Inhomogeneity through the Distance Used." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-6829.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Sebbar, Mehdi. "On unsupervised learning in high dimension." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLG003/document.

Full text
Abstract:
In this thesis, we discuss two topics, high-dimensional clustering on the one hand and estimation of mixing densities on the other. The first chapter is an introduction to clustering. We present various popular methods and we focus on one of the main models of our work which is the mixture of Gaussians. We also discuss the problems with high-dimensional estimation (Section 1.3) and the difficulty of estimating the number of clusters (Section 1.1.4). In what follows, we present briefly the concepts discussed in this manuscript. Consider a mixture of $K$ Gaussians in $RR^p$. One of the common approaches to estimate the parameters is to use the maximum likelihood estimator. Since this problem is not convex, we can not guarantee the convergence of classical methods such as gradient descent or Newton's algorithm. However, by exploiting the biconvexity of the negative log-likelihood, the iterative 'Expectation-Maximization' (EM) procedure described in Section 1.2.1 can be used. Unfortunately, this method is not well suited to meet the challenges posed by the high dimension. In addition, it is necessary to know the number of clusters in order to use it. Chapter 2 presents three methods that we have developed to try to solve the problems described above. The works presented there have not been thoroughly researched for various reasons. The first method that could be called 'graphical lasso on Gaussian mixtures' consists in estimating the inverse matrices of covariance matrices $Sigma$ (Section 2.1) in the hypothesis that they are parsimonious. We adapt the graphic lasso method of [Friedman et al., 2007] to a component in the case of a mixture and experimentally evaluate this method. The other two methods address the problem of estimating the number of clusters in the mixture. The first is a penalized estimate of the matrix of posterior probabilities $ Tau in RR ^ {n times K} $ whose component $ (i, j) $ is the probability that the $i$-th observation is in the $j$-th cluster. Unfortunately, this method proved to be too expensive in complexity (Section 2.2.1). Finally, the second method considered is to penalize the weight vector $ pi $ in order to make it parsimonious. This method shows promising results (Section 2.2.2). In Chapter 3, we study the maximum likelihood estimator of density of $n$ i.i.d observations, under the assumption that it is well approximated by a mixture with a large number of components. The main focus is on statistical properties with respect to the Kullback-Leibler loss. We establish risk bounds taking the form of sharp oracle inequalities both in deviation and in expectation. A simple consequence of these bounds is that the maximum likelihood estimator attains the optimal rate $((log K)/n)^{1/2}$, up to a possible logarithmic correction, in the problem of convex aggregation when the number $K$ of components is larger than $n^{1/2}$. More importantly, under the additional assumption that the Gram matrix of the components satisfies the compatibility condition, the obtained oracle inequalities yield the optimal rate in the sparsity scenario. That is, if the weight vector is (nearly) $D$-sparse, we get the rate $(Dlog K)/n$. As a natural complement to our oracle inequalities, we introduce the notion of nearly-$D$-sparse aggregation and establish matching lower bounds for this type of aggregation. Finally, in Chapter 4, we propose an algorithm that performs the Kullback-Leibler aggregation of components of a dictionary as discussed in Chapter 3. 
We compare its performance with different methods: the kernel density estimator , the 'Adaptive Danzig' estimator, the SPADES and EM estimator with the BIC criterion. We then propose a method to build the dictionary of densities and study it numerically. This thesis was carried out within the framework of a CIFRE agreement with the company ARTEFACT
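As a hedged illustration of the EM procedure mentioned above (and not of the penalized estimators developed in the thesis), here is a minimal sketch for fitting a $K$-component Gaussian mixture, assuming NumPy and SciPy; the initialisation and regularisation constants are assumptions.

```python
# Minimal EM sketch for a K-component Gaussian mixture (illustrative only).
import numpy as np
from scipy.stats import multivariate_normal

def em_gaussian_mixture(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    means = X[rng.choice(n, size=K, replace=False)]            # initialise at random points
    covs = [np.cov(X.T) + 1e-6 * np.eye(p) for _ in range(K)]
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each observation
        dens = np.column_stack([
            weights[k] * multivariate_normal.pdf(X, means[k], covs[k]) for k in range(K)
        ])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and covariances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = [((resp[:, k, None] * (X - means[k])).T @ (X - means[k])) / nk[k]
                + 1e-6 * np.eye(p) for k in range(K)]
    return weights, means, covs

# usage: weights, means, covs = em_gaussian_mixture(X, K=3)
```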
APA, Harvard, Vancouver, ISO, and other styles
33

Gopalaswamy, Sundeep Lim Alvin S. "Dynamic clustering protocol based on relative speed in mobile ad hoc networks for intelligent vehicles." Auburn, Ala., 2007. http://repo.lib.auburn.edu/2007%20Fall%20Theses/GOPALASWAMY_SUNDEEP_4.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Yenket, Renoo. "Understanding methods for internal and external preference mapping and clustering in sensory analysis." Diss., Kansas State University, 2011. http://hdl.handle.net/2097/8770.

Full text
Abstract:
Doctor of Philosophy
Department of Human Nutrition
Edgar Chambers IV
Preference mapping is a method that provides product development direction by letting developers see a whole picture of products, liking and relevant descriptors in a target market. Many statistical methods and commercial statistical software programs offering preference mapping analyses are available to researchers. Because of the numerous available options, there are two questions addressed in this research that most scientists must answer before choosing a method of analysis: 1) do the different methods provide the same interpretation, co-ordinate values and object orientation; and 2) which method and program should be used with the data provided? This research used data from paint, milk and fragrance studies, representing complexity from lesser to higher. The techniques used are principal component analysis, multidimensional preference map (MDPREF), modified preference map (PREFMAP), canonical variate analysis, generalized procrustes analysis and partial least squares regression, using the statistical software programs SAS, Unscrambler, Senstools and XLSTAT. Moreover, the homogeneity of consumer data was investigated through hierarchical cluster analysis (McQuitty's similarity analysis, median, single linkage, complete linkage, average linkage, and Ward's method), a partitional algorithm (the k-means method) and a nonparametric method, versus four manual clustering groups (strict, strict-liking-only, loose, loose-liking-only segments). The manual clusters were extracted according to the products most frequently rated highest as best liked and least liked on hedonic ratings. Furthermore, the impact of plotting preference maps for individual clusters was explored with and without the use of an overall mean liking vector. Results illustrated that the various statistical software programs did not produce the same orientations and co-ordinate values, even when using the same preference method. Also, if data were not highly homogeneous, interpretation could differ. Most computer cluster analyses did not segment consumers according to their preferences and did not yield clusters as homogeneous as manual clustering. The interpretation of preference maps created from the most homogeneous clusters showed little improvement when applied to complicated data. Researchers should look at key findings from univariate data in descriptive sensory studies to obtain accurate interpretations and suggestions from the maps, especially for external preference mapping. When researchers make recommendations based on an external map alone for complicated data, preference maps may be overused.
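The internal (MDPREF-style) map mentioned above is essentially a principal component analysis of the product-by-consumer hedonic matrix. The following is a minimal, hedged sketch of that step only, assuming NumPy; the function and data layout are illustrative assumptions, not taken from the dissertation.

```python
# Hedged sketch of an internal preference map: PCA (via SVD) on a products-by-consumers
# hedonic matrix; products become points, consumers become preference directions.
import numpy as np

def internal_preference_map(ratings):
    """ratings: array of shape (n_products, n_consumers) of hedonic scores."""
    X = ratings - ratings.mean(axis=0, keepdims=True)   # centre each consumer's ratings
    U, s, Vt = np.linalg.svd(X, full_matrices=False)     # PCA via singular value decomposition
    product_scores = U[:, :2] * s[:2]                    # product coordinates on PC1 and PC2
    consumer_loadings = Vt[:2].T                         # consumer preference directions
    explained = (s**2 / (s**2).sum())[:2]                # share of variance per component
    return product_scores, consumer_loadings, explained
```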
APA, Harvard, Vancouver, ISO, and other styles
35

Ogden, Mitchell S. "Observing Clusters and Point Densities in Johnson City, TN Crime Using Nearest Neighbor Hierarchical Clustering and Kernel Density Estimation." Digital Commons @ East Tennessee State University, 2019. https://dc.etsu.edu/asrf/2019/schedule/138.

Full text
Abstract:
Utilizing statistical methods as a risk assessment tool can lead to potentially effective solutions and policies that address various social issues. One use for such methods is the observation of crime trends within a municipality. Cluster and hotspot analysis is often practiced in criminal statistics to delineate potential areas at risk of recurring criminal activity. Two approaches to this analytical method are Nearest Neighbor Hierarchical Clustering (NNHC) and Kernel Density Estimation (KDE). Kernel Density Estimation estimates a density surface for incident points on a grid, based on a kernel and bandwidth determined by the analyst. Nearest Neighbor Hierarchical Clustering, a less common and less quantitative method, derives clusters based on the distance between observed points and the expected distance for points of a random distribution. Crime data originated from a public web map and database service that acquires data from the Johnson City Police Department, where each incident is organized into one of many broad categories such as assault, theft, etc. Preliminary analysis of raw volume data shows high crime volume in expected locales: highly trafficked areas such as downtown, the Mall and both Walmarts, as well as low-income residential areas of town. The two methods, KDE and NNHC, disagree on the size and location of many clusters. A more in-depth analysis of normalized data with refined parameters may provide further insight into crime in Johnson City.
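As an illustration of the KDE step described above, here is a minimal, hedged sketch that builds a density surface from projected incident coordinates; the variable names, grid size and bandwidth rule are assumptions, not details from the thesis.

```python
# Hedged sketch of kernel density estimation over 2-D incident coordinates (a hotspot surface).
import numpy as np
from scipy.stats import gaussian_kde

def crime_density_surface(xy, grid_size=200):
    """xy: array of shape (n_incidents, 2) with projected x/y coordinates."""
    kde = gaussian_kde(xy.T)                      # Gaussian kernel, default (Scott's rule) bandwidth
    xmin, ymin = xy.min(axis=0)
    xmax, ymax = xy.max(axis=0)
    gx, gy = np.meshgrid(np.linspace(xmin, xmax, grid_size),
                         np.linspace(ymin, ymax, grid_size))
    density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
    return gx, gy, density                        # e.g. plot with plt.pcolormesh for a hotspot map
```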
APA, Harvard, Vancouver, ISO, and other styles
36

Inkaya, Tulin. "A Methodology Of Swarm Intelligence Application In Clustering Based On Neighborhood Construction." Phd thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613232/index.pdf.

Full text
Abstract:
In this dissertation, we consider the clustering problem in data sets with an unknown number of clusters, arbitrary cluster shapes, and intracluster and intercluster density variations. We introduce a clustering methodology composed of three methods that ensure extraction of local density and connectivity properties, data set reduction, and clustering. The first method constructs a unique neighborhood for each data point using the connectivity and density relations among the points, based upon graph-theoretical concepts, mainly Gabriel graphs. Neighborhoods subsequently connected form subclusters (closures) which constitute the skeleton of the clusters. In the second method, the external shape concept in computational geometry is adapted for data set reduction and cluster visualization. This method extracts the external shape of a non-convex n-dimensional data set using Delaunay triangulation. In the third method, we investigate the applicability of Swarm Intelligence to clustering using Ant Colony Optimization (ACO). Ants explore the data set so that the clusters are detected using density break-offs, connectivity and distance information. The proposed ACO-based algorithm uses the outputs of the neighborhood construction (NC) and the external shape formation. In addition, we propose a three-phase clustering algorithm that consists of NC, outlier detection and merging phases. We test the strengths and the weaknesses of the proposed approaches by extensive experimentation with data sets borrowed from the literature and generated in a controlled manner. NC is found to be effective for arbitrarily shaped clusters and for intracluster and intercluster density variations. The external shape formation algorithm achieves significant reductions for convex clusters. The ACO-based and three-phase clustering algorithms show promising results for data sets having well-separated clusters.
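For the neighborhood-construction step, a Gabriel graph connects two points whenever no third point falls inside the ball whose diameter is the segment joining them. Below is a hedged, brute-force sketch of that definition; it is illustrative only and not the dissertation's NC algorithm.

```python
# Hedged sketch of Gabriel-graph construction: an O(n^3) brute-force version for illustration.
import numpy as np

def gabriel_graph_edges(points):
    """points: array of shape (n, d). Edge (i, j) exists iff no third point lies
    inside the ball whose diameter is the segment ij."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # squared pairwise distances
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Gabriel condition: d(i,k)^2 + d(j,k)^2 >= d(i,j)^2 for every other point k
            if all(d2[i, k] + d2[j, k] >= d2[i, j] for k in range(n) if k not in (i, j)):
                edges.append((i, j))
    return edges
```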
APA, Harvard, Vancouver, ISO, and other styles
37

Jones, Jesse Jack. "Effects of Non-homogeneous Population Distribution on Smoothed Maps Produced Using Kernel Density Estimation Methods." Thesis, University of North Texas, 2014. https://digital.library.unt.edu/ark:/67531/metadc699888/.

Full text
Abstract:
Understanding spatial perspectives on the spread and incidence of a disease is invaluable for public health planning and intervention. Choropleth maps are commonly used to provide an abstraction of disease risk across geographic space. These maps are derived from aggregated population counts that are known to be affected by the small numbers problem. Kernel density estimation methods account for this problem by producing risk estimates that are based on aggregations of approximately equal population sizes. However, the process of aggregation often combines data from areas with non-uniform spatial and population characteristics. This thesis presents a new method to aggregate space in ways that are sensitive to their underlying risk factors. Such maps will enable better public health practice and intervention by enhancing our ability to understand the spatial processes that result in disparate health outcomes.
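A minimal, hedged sketch of the aggregation idea described above: the neighbourhood around each estimation point grows until a fixed population is covered, so every rate rests on a comparable denominator. The function, threshold and data layout are assumptions for illustration, not the thesis's actual method.

```python
# Hedged sketch of an adaptive, population-equalising disease-rate estimate on a grid.
import numpy as np

def adaptive_rate(grid_points, unit_xy, unit_pop, unit_cases, pop_threshold=5000):
    """grid_points: (m, 2) estimation locations; unit_xy/unit_pop/unit_cases: source areas."""
    rates = np.empty(len(grid_points))
    for g, p in enumerate(grid_points):
        dist = np.linalg.norm(unit_xy - p, axis=1)
        order = np.argsort(dist)
        cum_pop = np.cumsum(unit_pop[order])
        k = np.searchsorted(cum_pop, pop_threshold) + 1   # smallest set of units reaching threshold
        idx = order[:k]
        rates[g] = unit_cases[idx].sum() / unit_pop[idx].sum()
    return rates
```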
APA, Harvard, Vancouver, ISO, and other styles
38

Oesterling, Patrick. "Visual Analysis of High-Dimensional Point Clouds using Topological Abstraction." Doctoral thesis, Universitätsbibliothek Leipzig, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-203056.

Full text
Abstract:
This thesis is about visualizing a kind of data that is trivial to process by computers but difficult to imagine by humans because nature does not allow for intuition with this type of information: high-dimensional data. Such data often result from representing observations of objects under various aspects or with different properties. In many applications, a typical, laborious task is to find related objects or to group those that are similar to each other. One classic solution for this task is to imagine the data as vectors in a Euclidean space with object variables as dimensions. Utilizing Euclidean distance as a measure of similarity, objects with similar properties and values accumulate into groups, so-called clusters, that are exposed by cluster analysis on the high-dimensional point cloud. Because similar vectors can be thought of as objects that are alike in terms of their attributes, the point cloud's structure and individual cluster properties, like their size or compactness, summarize data categories and their relative importance. The contribution of this thesis is a novel analysis approach for visual exploration of high-dimensional point clouds without suffering from structural occlusion. The work is based on implementing two key concepts: The first idea is to discard those geometric properties that cannot be preserved and, thus, lead to the typical artifacts. Topological concepts are used instead to shift the focus away from a point-centered view on the data to a more structure-centered perspective. The advantage is that topology-driven clustering information can be extracted in the data's original domain and be preserved without loss in low dimensions. The second idea is to split the analysis into a topology-based global overview and a subsequent geometric local refinement. The occlusion-free overview enables the analyst to identify features and to link them to other visualizations that permit analysis of those properties not captured by the topological abstraction, e.g. cluster shape or value distributions in particular dimensions or subspaces. The advantage of separating structure from data point analysis is that restricting local analysis only to data subsets significantly reduces artifacts and the visual complexity of standard techniques. That is, the additional topological layer enables the analyst to identify structure that was hidden before and to focus on particular features by suppressing irrelevant points during local feature analysis. This thesis addresses the topology-based visual analysis of high-dimensional point clouds for both the time-invariant and the time-varying case. Time-invariant means that the points do not change in their number or positions. That is, the analyst explores the clustering of a fixed and constant set of points. The extension to the time-varying case implies the analysis of a varying clustering, where clusters appear, merge, split, or vanish. Especially for high-dimensional data, both tracking, which means relating features over time, and visualizing the changing structure are difficult problems to solve.
APA, Harvard, Vancouver, ISO, and other styles
39

Secchi, Alessandro. "Heterogeneous Effects of Monetary Policy." Doctoral thesis, Universitat Pompeu Fabra, 2005. http://hdl.handle.net/10803/7425.

Full text
Abstract:
The main objective of this thesis is to offer empirical evidence in support of the hypothesis that differences in firms' balance sheet structures may generate heterogeneous responses to monetary policy innovations. To this end in the second, introductory, chapter we start providing some evidence in favor of a large degree of heterogeneity in the asset and liability side of the balance sheet structure of manufacturing firms belonging to different European countries and different size classes. This static comparison is complemented with a quantitative assessment of the sensitivity of asset and liability items to business cycle conditions.
In the third chapter we focus on a specific dimension along which the presence of heterogeneities in the balance sheet structure may induce different responses to a monetary policy action. In particular we address the existence of a channel of transmission of monetary policy, the cost-channel, which operates through the effect of interest expenses on the marginal cost of production. Such a channel is based on an active role of net working capital (inventories, plus trade receivables, less trade payables) in the production process and on the fact that variations in interest rate and credit conditions alter firms' short-run ability to produce final output by investing in net working capital. It has been argued that this mechanism may explain the magnitude of the real effects of monetary policy, give a rationale for the positive short-run response of prices to rate increases (the "price puzzle") and call for a more gradual monetary policy response to shocks. The analysis is based on a unique panel that includes about 2,000 Italian manufacturing firms and 14 years of data on individual prices and interest rates paid on several types of debt. We find robust evidence in favor of the presence of a cost-channel of monetary policy transmission, proportional to the amount of working capital held by each firm and with a size large enough to have non-trivial monetary policy implications.
The empirical analysis of chapter three is based on the hypothesis that the type of heterogeneity that produces different firm-level responses to an interest rate variation is well defined and measurable. On the contrary, most of the empirical literature that tests for the existence of heterogeneous effects of monetary policy on firms' production or investment choices is based on an ad hoc assumption about the specific firm-level characteristic that should distinguish more sensitive from less sensitive firms. A similar degree of arbitrariness is adopted in selecting the number of classes of firms characterized by different responses to monetary policy shocks as well as in the selection of the cutoff points. The objective of chapter four is to apply a recent econometric methodology that, building on the predictive density of the data, provides well-defined criteria to detect both the "optimal" dimension along which to analyze firms' responses to monetary policy innovations and the "optimal" endogenous groups. The empirical analysis is focused on Italian manufacturing firms and, in particular, on the response of inventory investment to monetary policy shocks from 1983 to 1998. The main results are the following. In stark contrast with what is normally assumed in the literature, in most cases it turns out that the optimal number of classes is larger than two. Moreover, orderings based on variables that are normally thought to be equivalent proxies for firm size (i.e. turnover, total assets and level of employment) lead neither to the same number of groups nor to similar splitting points. Finally, even if endogenous clusters are mainly characterized by different degrees of within-group heterogeneity, with groups composed of smaller firms showing the largest dispersion, there are also important differences in the average effect of monetary policy across groups. In particular, the fact that some of the orderings do not show the expected monotonicity between rank and average effect is one of the most remarkable findings.
APA, Harvard, Vancouver, ISO, and other styles
40

Martinico, Bruno. "Applicazione di tecniche di clustering spaziale su dati macrosismici di felt reports per stimare i parametri dei terremoti." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24993/.

Full text
Abstract:
This thesis deals with the development and application of clustering techniques to individual macroseismic felt reports, that is, reports from single users collected on internet platforms, and with the use of intensity data to derive information on the location and magnitude of earthquakes. The procedure consists of two main steps: grouping the individual intensity data points (IDPs) through clustering methods to derive macroseismic intensities (MDPs) comparable to locality-based intensities as defined by the macroseismic scales (MCS, EMS); and processing these intensities (MDPs) with the Boxer code (Gasperini et al., 1999; 2010). A total of 9 earthquakes that occurred in Italy since 2016 were analysed. Five techniques were used to derive MDP intensities from the IDP intensities of EMSC and HSIT, two web-based IDP collection platforms. The techniques, implemented in Matlab, are: spatial grid-based methods with equal-area square and hexagonal cells; circular areas with radius proportional to density; mixed techniques (circles plus equal-area cells); and DBSCAN. The MDPs built from EMSC data were then processed with the Boxer code and the results compared with the parameters derived from the HSIT MDPs and with instrumental data. The thesis is organized into 4 chapters: 1) a general treatment of the physics of the seismic phenomenon and a description of its causes and effects, including the instrumental source parameters and their macroseismic equivalents, and a description of the Boxer algorithm; 2) a discussion of felt reports collected from the Internet as macroseismic data, with their own specific characteristics and issues; 3) the definition of the transformation from IDPs to MDPs and the main types of clustering techniques used to perform it; 4) a description of the earthquake datasets used and of the processing and analysis carried out.
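A hedged sketch of the simplest of the groupings described above, equal-area square cells: individual intensities (IDPs) falling in the same cell are summarised by a robust value to form an MDP. The cell size, minimum number of reports and use of the median are assumptions for illustration, not the thesis's settings.

```python
# Hedged sketch: bin individual felt-report intensities (IDPs) into square cells and
# summarise each sufficiently populated cell by its median intensity (an MDP).
import numpy as np

def idps_to_mdps(x_km, y_km, intensity, cell_km=5.0, min_reports=3):
    """x_km, y_km: projected report coordinates; intensity: individual IDP values."""
    ix = np.floor(x_km / cell_km).astype(int)
    iy = np.floor(y_km / cell_km).astype(int)
    mdps = {}
    for key in set(zip(ix, iy)):
        mask = (ix == key[0]) & (iy == key[1])
        if mask.sum() >= min_reports:                      # discard poorly sampled cells
            cx = (key[0] + 0.5) * cell_km                  # cell centre coordinates
            cy = (key[1] + 0.5) * cell_km
            mdps[(cx, cy)] = np.median(intensity[mask])    # robust cell intensity
    return mdps
```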
APA, Harvard, Vancouver, ISO, and other styles
41

Ruzgas, Tomas. "Daugiamačio pasiskirstymo tankio neparametrinis įvertinimas naudojant stebėjimų klasterizavimą." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2007. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2007~D_20070314_094140-20878.

Full text
Abstract:
The paper is devoted to statistical nonparametric estimation of multivariate distribution density. The influence of data pre-clustering on the estimation accuracy of multimodal density is analysed by means of the Monte-Carlo method.
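A hedged sketch of the pre-clustering idea: the sample is first partitioned (here with k-means as a stand-in), a Gaussian kernel density estimate is fitted within each cluster, and the pieces are recombined with the cluster proportions as weights. The clustering method, number of clusters and bandwidth rule are assumptions, not those studied in the thesis.

```python
# Hedged sketch: cluster-wise kernel density estimation of a multimodal multivariate density.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.cluster.vq import kmeans2

def clusterwise_kde(X, n_clusters=3, seed=0):
    """X: array of shape (n, d). Returns a callable density estimate for (m, d) arrays."""
    _, labels = kmeans2(X, n_clusters, seed=seed, minit='++')
    parts = []
    for k in range(n_clusters):
        Xk = X[labels == k]
        if len(Xk) > X.shape[1] + 1:                 # need enough points for a stable KDE
            parts.append((len(Xk) / len(X), gaussian_kde(Xk.T)))
    def density(Y):
        return sum(w * kde(Y.T) for w, kde in parts)  # mixture of per-cluster KDEs
    return density
```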
APA, Harvard, Vancouver, ISO, and other styles
42

Grierson, Greg Michael Jr. "Analysis of Amur honeysuckle Stem Density as a Function of Spatial Clustering, Horizontal Distance from Streams, Trails, and Elevation in Riparian Forests, Greene County, Ohio." Wright State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=wright1621942350540022.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Marević, Petar. "Towards a unified description of quantum liquid and cluster states in atomic nuclei within the relativistic energy density functional framework." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS358/document.

Full text
Abstract:
In this thesis we develop a symmetry-conserving collective model for nuclear structure studies based on the relativistic energy density functional framework. Axially-symmetric quadrupole- and octupole-deformed reference states are generated by solving the relativistic Hartree-Bogoliubov equations. In the particle-hole channel of the effective interaction we employ the covariant point-coupling DD-PC1 functional, while a non-relativistic pairing force separable in momentum space is used in the particle-particle channel. Collective correlations related to the restoration of broken symmetries are accounted for by simultaneously projecting the reference states on good values of angular momentum, parity, and particle number. In the next step, the symmetry-restored states are mixed within the generator coordinate method formalism. This enables us to obtain detailed spectroscopic predictions, including excitation energies, electromagnetic multipole moments and transition rates, as well as both elastic and inelastic form factors. The described framework is global and can be employed in various nuclear structure studies across the entire nuclide chart. As a first application, we study the formation of clusters in light nuclei. Nuclear clustering is considered to be a transitional phenomenon between the quantum-liquid and solid phases of finite nuclei. In contrast to the conventional homogeneous quantum-liquid picture, spatial localization of alpha particles gives rise to a molecule-like picture of atomic nuclei. In particular, we carry out a comprehensive analysis of quadrupole-octupole collectivity and cluster structures in neon isotopes. Special attention is paid to the self-conjugate ²⁰Ne isotope, where cluster structures are thought to form already in the ground state. Finally, we study the low-lying structure of the ¹²C isotope, focusing on the bands built on 0⁺ states, which are known to manifest a rich variety of shapes, including the triangular configurations of the Hoyle band and 3-alpha linear chains in higher states.
APA, Harvard, Vancouver, ISO, and other styles
44

Schmidt, Eric. "Atomistic modelling of precipitation in Ni-base superalloys." Thesis, University of Cambridge, 2019. https://www.repository.cam.ac.uk/handle/1810/275131.

Full text
Abstract:
The presence of the ordered $\gamma^{\prime}$ phase ($\text{Ni}_{3}\text{Al}$) in Ni-base superalloys is fundamental to the performance of engineering components such as turbine disks and blades which operate at high temperatures and loads. Hence for these alloys it is important to optimize their microstructure and phase composition. This is typically done by varying their chemistry and heat treatment to achieve an appropriate balance between $\gamma^{\prime}$ content and other constituents such as carbides, borides, oxides and topologically close packed phases. In this work we have set out to investigate the onset of $\gamma^{\prime}$ ordering in Ni-Al single crystals and in Ni-Al bicrystals containing coincidence site lattice grain boundaries (GBs) and we do this at high temperatures, which are representative of typical heat treatment schedules including quenching and annealing. For this we use the atomistic simulation methods of molecular dynamics (MD) and density functional theory (DFT). In the first part of this work we develop robust Bayesian classifiers to identify the $\gamma^{\prime}$ phase in large scale simulation boxes at high temperatures around 1500 K. We observe significant $\gamma^{\prime}$ ordering in the simulations in the form of clusters of $\gamma^{\prime}$-like ordered atoms embedded in a $\gamma$ host solid solution and this happens within 100 ns. Single crystals are found to exhibit the expected homogeneous ordering with slight indications of chemical composition change and a positive correlation between the Al concentration and the concentration of $\gamma^{\prime}$ phase. In general, the ordering is found to take place faster in systems with GBs and preferentially adjacent to the GBs. The sole exception to this is the $\Sigma3 \left(111\right)$ tilt GB, which is a coherent twin. An analysis of the ensemble and time lag average displacements of the GBs reveals mostly 'anomalous diffusion' behaviour. Increasing the Al content from pure Ni to Ni 20 at.% Al was found to either consistently increase or decrease the mobility of the GB as seen from the changing slope of the time lag displacement average. The movement of the GB can then be characterized as either 'super' or 'sub-diffusive' and is interpreted in terms of diffusion induced grain boundary migration, which is posited as a possible precursor to the appearance of serrated edge grain boundaries. In the second part of this work we develop a method for the training of empirical interatomic potentials to capture more elements in the alloy system. We focus on the embedded atom method (EAM) and use the Ni-Al system as a test case. Recently, empirical potentials have been developed based on results from DFT which utilize energies and forces, but neglect the electron densities, which are also available. Noting the importance of electron densities, we propose a route to include them into the training of EAM-type potentials via Bayesian linear regression. Electron density models obtained for structures with a range of bonding types are shown to accurately reproduce the electron densities from DFT. Also, the resulting empirical potentials accurately reproduce DFT energies and forces of all the phases considered within the Ni-Al system. Properties not included in the training process, such as stacking fault energies, are sometimes not reproduced with the desired accuracy and the reasons for this are discussed.
General regression issues, known to the machine learning community, are identified as the main difficulty facing further development of empirical potentials using this approach.
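A minimal, hedged sketch of Bayesian linear regression of the kind referred to above, with a Gaussian prior on the coefficients of a linear electron-density model; the basis functions, hyperparameters and names are assumptions for illustration.

```python
# Hedged sketch: Bayesian linear regression (Gaussian prior, Gaussian noise) for fitting a
# linear model of the electron density expressed in some set of basis functions.
import numpy as np

def bayesian_linear_fit(Phi, y, alpha=1e-3, beta=1e2):
    """Phi: (n, m) design matrix of basis functions at sample points; y: (n,) target densities;
    alpha: prior precision on coefficients; beta: noise precision."""
    A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi   # posterior precision matrix
    cov = np.linalg.inv(A)                                  # posterior covariance
    mean = beta * cov @ Phi.T @ y                           # posterior mean of coefficients
    return mean, cov

# usage: predict at new points with Phi_new @ mean; the predictive variance is
# 1/beta + np.sum(Phi_new @ cov * Phi_new, axis=1)
```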
APA, Harvard, Vancouver, ISO, and other styles
45

Svoboda, Tomáš. "Implementace statistické metody KDE+." Master's thesis, Vysoké učení technické v Brně. Ústav soudního inženýrství, 2016. http://www.nusl.cz/ntk/nusl-241303.

Full text
Abstract:
In this master's thesis I present a new statistical method, KDE+ (Kernel Density Estimation plus), that allows detecting clusters of points on linear data. I created a stand-alone application that enables anybody to try the method and apply it to their own data. One possible use of the method and application is the detection of critical road sections with a high concentration of traffic accidents. Development of the application included analysis of the KDE+ statistical method, design of appropriate program structures and the implementation. After the prototype was created, optimizations were carried out to achieve higher performance. Finally, the software was validated by analysing vehicle collision data from the police database of the Czech Republic.
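A hedged, much-simplified stand-in for the idea behind KDE+: a one-dimensional kernel density along a single road section, with stretches above a threshold reported as candidate clusters. KDE+ itself adds a statistical significance test; the threshold, bandwidth and step used here are assumptions for illustration.

```python
# Hedged sketch: 1-D KDE along a road section with a crude threshold in place of KDE+'s test.
import numpy as np

def road_hotspots(positions_m, section_length_m, bandwidth_m=100.0, step_m=10.0, factor=2.0):
    """positions_m: accident positions along the section; returns (start, end) intervals in metres."""
    grid = np.arange(0.0, section_length_m, step_m)
    diffs = (grid[:, None] - positions_m[None, :]) / bandwidth_m
    density = np.exp(-0.5 * diffs**2).sum(axis=1) / (bandwidth_m * np.sqrt(2 * np.pi))
    threshold = factor * density.mean()                 # crude stand-in for a significance test
    above = density > threshold
    hotspots, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = grid[i]
        if not flag and start is not None:
            hotspots.append((start, grid[i]))
            start = None
    if start is not None:
        hotspots.append((start, grid[-1]))
    return hotspots
```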
APA, Harvard, Vancouver, ISO, and other styles
46

Provencher, David. "Imagerie de l'activité cérébrale : structure ou signal?" Thèse, Université de Sherbrooke, 2017. http://hdl.handle.net/11143/10472.

Full text
Abstract:
Imaging neural activity allows studying normal and pathological function of the human brain, while also being a useful tool for diagnosis and neurosurgery planning. Electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) are some of the most commonly used functional imaging modalities, both in research and in the clinic. Many aspects of cerebral structure can however influence the measured signals, so that they do not reflect neural activity alone. Taking them into account is therefore important in order to correctly interpret results, especially when comparing subjects displaying large differences in brain anatomy. In addition, maturation, aging as well as some pathologies are associated with changes in brain structure. This acts as a confounding factor when analysing longitudinal data or comparing target and control groups. Yet, our understanding of structure-signal relationships remains incomplete and very few studies take them into account. My Ph.D. project consisted in studying the impacts of cerebral structure on EEG and fMRI signals as well as exploring potential solutions to mitigate them. In that regard, I first studied the effect of age-related cortical thinning on event-related desynchronization (ERD) in EEG. Results allowed identifying a negative linear relationship between ERD and cortical thickness, enabling signal correction using regression. I then investigated how the presence of veins in a region impacts the blood-oxygen-level dependent (BOLD) response measured in fMRI following visual stimulation. This work showed that local venous density, which strongly varies across regions and subjects, correlates positively with the BOLD response amplitude and delay. Finally, I adapted a data clustering technique to improve the detection of activated cortical regions in fMRI. This method allows eschewing many problematic assumptions used in classical fMRI analyses, reducing the impact of cerebral structure on results and establishing richer brain activity maps. Globally, this work contributes to furthering our understanding of structure-signal interactions in EEG and fMRI as well as to developing analysis methods that reduce their impact on data interpretation in terms of neural activity.
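A minimal, hedged sketch of the regression correction described above: ERD is regressed on cortical thickness across subjects and the thickness-predicted component is removed. The function and variable names are illustrative assumptions, not the thesis's code.

```python
# Hedged sketch: remove the cortical-thickness-predicted component from ERD values.
import numpy as np

def thickness_corrected_erd(erd, thickness):
    """erd, thickness: 1-D arrays with one value per subject (or per region)."""
    slope, intercept = np.polyfit(thickness, erd, deg=1)   # linear ERD-thickness relationship
    predicted = slope * thickness + intercept
    residual = erd - predicted                              # thickness-corrected ERD
    return residual + erd.mean()                            # re-centre at the group mean
```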
APA, Harvard, Vancouver, ISO, and other styles
47

Mohamad, Ranim. "Relaxation de la contrainte dans les hétérostructures Al(Ga)InN/GaN pour applications électroniques : modélisation des propriétés physiques et rôle de l'indium dans la dégradation des couches épitaxiales." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMC229/document.

Full text
Abstract:
For the fabrication of nitride-based power microwave transistors, the InAlN alloy is considered to be a better barrier than AlGaN thanks to its lattice match with GaN for an indium composition around 18%. Thus, at a strain-free InAlN/GaN heterointerface, the two-dimensional electron gas (2DEG) is generated only by the spontaneous polarization, allowing the fabrication of transistors with optimal performance. However, during growth on GaN, the crystalline quality of InAlN deteriorates with thickness and V-defects form at the layer surface. To determine the sources of this behavior, we carried out a theoretical study using molecular dynamics and ab initio techniques to analyze the stability and properties of nitride alloys, focusing particularly on InAlN. The analysis of the phase diagrams showed that this alloy has a wide range of instability in indium composition and a behavior different from InGaN under compression, with the instability amplified under high compressive strain. By determining the energetic stability of the nitrogen vacancy interacting with indium, we showed that this point defect, around which indium atoms tend to recover a bond length close to that in InN, can act as a catalyst for the formation of clusters in this alloy. These InN clusters introduce deep donor levels inside the band gap. With regard to threading dislocations, our results show that they also tend to capture indium atoms in their cores in order to minimize their energy. Thus, we have been able to provide a theoretical basis showing that the nitrogen vacancy participates in the spontaneous degradation of InAlN layers and that threading dislocations contribute by attracting indium atoms, thereby reinforcing phase separation in their vicinity.
APA, Harvard, Vancouver, ISO, and other styles
48

Xu, Sanlin, and SanlinXu@yahoo com. "Mobility Metrics for Routing in MANETs." The Australian National University. Faculty of Engineering and Information Technology, 2007. http://thesis.anu.edu.au./public/adt-ANU20070621.212401.

Full text
Abstract:
A Mobile Ad hoc Network (MANET) is a collection of wireless mobile nodes forming a temporary network without the need for base stations or any other pre-existing network infrastructure. In a peer-to-peer fashion, mobile nodes can communicate with each other by using wireless multihop communication. Due to its low cost, high flexibility, fast network establishment and self-reconfiguration, ad hoc networking has received much interest during the last ten years. However, without a fixed infrastructure, frequent path changes cause significant numbers of routing packets to be sent to discover new paths, leading to increased network congestion and transmission latency over fixed networks. Many on-demand routing protocols have been developed using various routing mobility metrics to choose the most reliable routes, while dealing with the primary obstacle caused by node mobility.

In the first part, we develop an analysis framework for mobility metrics in random mobility models. Unlike previous research, where the mobility metrics were mostly studied by simulations, we derive analytical expressions of mobility metrics, including link persistence, link duration, link availability, link residual time, link change rate and their path equivalents. We also show relationships between the different metrics, where they exist. Such exact expressions constitute precise mathematical relationships between network connectivity and node mobility.

We further validate our analysis framework in the Random Walk Mobility Model (RWMM). For both constant and random node velocities, we construct the transition matrix of a Markov chain model through analysis of the PDF of node separation after one epoch. In addition, we present intuitive and simple expressions for the link residual time and link duration in the RWMM, which relate them directly to the ratio between transmission range and node speed. We also illustrate the relationship between link change rate and link duration. Finally, simulation results for all mentioned mobility metrics are reported, which match the proposed analytical framework well.

In the second part, we investigate the application of mobility metrics to caching strategies and a hierarchical routing algorithm. When on-demand routing is employed, stale route cache information and frequent new-route discovery processes in MANETs generate considerable routing delay and overhead. This thesis proposes a practical route caching strategy to minimize routing delay and/or overhead by setting the route cache timeout to a mobility metric, the expected path residual time. The strategy is independent of network traffic load and adapts to various non-identical link duration distributions, so it is feasible to implement in a real-time route caching scheme. Calculated results show that the routing delay achieved by the route caching scheme is only marginally more than the theoretically determined minimum. Simulation in NS-2 demonstrates that the end-to-end delay from DSR routing can be remarkably reduced by our caching scheme. By using an overhead analysis model, we demonstrate that the minimum routing overhead can be achieved by increasing the timeout to around twice the expected path residual time, without a significant increase in routing delay.

Apart from the route cache, this thesis also addresses a link cache strategy, which has the potential to utilize route information more efficiently than a route cache scheme. Unlike some previous link cache schemes, which delete links at a fixed time after they enter the cache, we propose using either the expected path duration or the link residual time as the link cache timeout. Simulation results in NS-2 show that both of the proposed link caching schemes can improve network performance in DSR by reducing dropped data packets, latency and routing overhead, with the link residual time scheme out-performing the path duration scheme.

To deal with large-scale MANETs, this thesis presents an adaptive k-hop clustering algorithm (AdpKHop), which selects clusterheads (CHs) using our CH selection metrics. The proposed CH selection criteria ensure that the chosen CHs are closer to the cluster centroid and more stable than other cluster members with respect to node mobility. By using a merging threshold based on the CH selection metric, 1-hop clusters can merge into k-hop clusters, where the size of each k-hop cluster adapts to the node mobility of the chosen CH. Moreover, we propose a routing overhead analysis model for the k-hop clustering algorithm, which is determined by a range of network parameters, such as link change rate (related to node mobility), node degree and cluster density. Through the overhead analysis, we show that an optimal k-hop cluster density does exist, which is independent of node mobility. Therefore, the corresponding optimal cluster merging threshold can be employed to efficiently organise k-hop clusters to achieve minimum routing overhead, which is highly desirable in large-scale networks.

The work presented in this thesis provides a sound basis for future research on mobility analysis for mobile ad hoc networks, in aspects such as mobility metrics, caching strategies and k-hop clustering routing protocols.
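A hedged sketch of the caching rule discussed above: a link's expected residual lifetime is taken to scale with the transmission-range-to-speed ratio, the path residual time is the minimum over its links, and the route-cache timeout is set to roughly twice that value. The constants and function names are assumptions, not the thesis's derived formulas.

```python
# Hedged sketch: route-cache timeout from expected link/path residual times.
def expected_link_residual_time(tx_range_m, rel_speed_mps, c=0.5):
    """Crude estimate: residual lifetime scales with transmission range / relative speed."""
    return c * tx_range_m / max(rel_speed_mps, 1e-6)

def route_cache_timeout(link_speeds_mps, tx_range_m=250.0, overhead_factor=2.0):
    """Path residual time = minimum over links; timeout ~ twice that to trade delay vs overhead."""
    path_residual = min(expected_link_residual_time(tx_range_m, v) for v in link_speeds_mps)
    return overhead_factor * path_residual

# usage: timeout_s = route_cache_timeout([3.0, 7.5, 1.2])
```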
APA, Harvard, Vancouver, ISO, and other styles
49

Hlosta, Martin. "Modul pro shlukovou analýzu systému pro dolování z dat." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237158.

Full text
Abstract:
This thesis deals with the design and implementation of a cluster analysis module for DataMiner, a data mining system currently being developed at FIT BUT. Until now, the system lacked a cluster analysis module, so the main objective of the thesis was to extend the system with such a module. Pavel Riedl and I worked on the module together. We created a common part for all the algorithms so that the system can easily be extended with other clustering algorithms. In the second part, I extended the clustering module by adding three density-based clustering algorithms: DBSCAN, OPTICS and DENCLUE. The algorithms were implemented and appropriate sample data were chosen to verify their functionality.
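As a hedged illustration of the simplest of the three algorithms named above, here is a short DBSCAN run on synthetic two-dimensional data using scikit-learn; the eps and min_samples values are assumptions that would need tuning for real data, and this is not the module's own implementation.

```python
# Hedged sketch: DBSCAN on a synthetic two-moons dataset; label -1 marks noise points.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=400, noise=0.06, random_state=0)   # two arbitrary-shaped clusters
labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"found {n_clusters} clusters and {np.sum(labels == -1)} noise points")
```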
APA, Harvard, Vancouver, ISO, and other styles
50

Fansi, Tchango Arsène. "Reconnaissance comportementale et suivi multi-cible dans des environnements partiellement observés." Thesis, Université de Lorraine, 2015. http://www.theses.fr/2015LORR0156/document.

Full text
Abstract:
In this thesis, we are interested in the problem of pedestrian behavioral tracking within a critical environment only partially under sensory coverage. While most of the works found in the literature focus either on the location of a pedestrian or on the activity a pedestrian is undertaking, we take a general view and consider estimating both simultaneously. The contributions presented in this document are organized in two parts. The first part focuses on the representation and the exploitation of the environmental context for the purpose of behavioral estimation. The state of the art contains few studies addressing this issue, and they rely on graphical models with limited expressive capacity, such as dynamic Bayesian networks, to model prior environmental knowledge. We propose, instead, to rely on richer contextual models issued from autonomous agent-based behavioral simulators, and we demonstrate the effectiveness of our approach through extensive experimental evaluations. The second part of the thesis addresses the general problem of pedestrians' mutual influences, commonly known as targets' interactions, on their respective behaviors during the tracking process. Under the assumption that a generic simulator (or a function) modeling the tracked targets' behaviors is available, we develop a scalable approach in which interactions are taken into account at low computational cost. The originality of the proposed approach resides in the introduction of density-based aggregated information, called "representatives", computed in such a way as to guarantee behavioral diversity for each target, and on which the filtering system relies to compute fine-grained behavioral estimates even in cases of occlusion. We present the modeling choices, the resulting algorithms, and a set of challenging scenarios on which the proposed approach is evaluated.
APA, Harvard, Vancouver, ISO, and other styles