
Theses on the topic "Spectral clustering"



See the top 50 theses for your research on the topic "Spectral clustering".




1

Shortreed, Susan. "Learning in spectral clustering". Thesis, University of Washington (access restricted to UW), 2006. http://hdl.handle.net/1773/8977.

2

Larson, Ellis and Nelly Åkerblom. "Spectral clustering for Meteorology". Thesis, KTH, Skolan för teknikvetenskap (SCI), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297760.

Abstract
Climate is a tremendously complex topic that affects many aspects of human activity and is constantly changing. Defining structures and rules for how it works is therefore of the utmost importance, even if they capture only a small part of that complexity. Cluster analysis is a data-analysis tool that categorizes data into groups of similar items. In this paper, data from the Swedish Meteorological and Hydrological Institute (SMHI) are clustered to find a partitioning. The method used is spectral clustering, a family of methods that make use of the spectral properties of graphs. Concrete groupings of the climate across Sweden were found.
3

Gaertler, Marco. "Clustering with spectral methods". [S.l. : s.n.], 2002. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB10101213.

4

Masum, Mohammad. "Vertex Weighted Spectral Clustering". Digital Commons @ East Tennessee State University, 2017. https://dc.etsu.edu/etd/3266.

Abstract
Spectral clustering is often used to partition a data set into a specified number of clusters. Both the unweighted and the vertex-weighted approaches use eigenvectors of the Laplacian matrix of a graph. Our focus is on using vertex-weighted methods to refine the clustering of observations. An eigenvector corresponding to the second smallest eigenvalue of the Laplacian matrix of a graph is called a Fiedler vector. The coefficients of a Fiedler vector are used to partition the vertices of a given graph into two clusters. A vertex is classified as unassociated if its Fiedler coefficient is close to zero compared with the largest Fiedler coefficient of the graph. We propose a vertex-weighted spectral clustering algorithm that incorporates a vector of vertex weights into a given graph to form a vertex-weighted graph. The proposed algorithm predicts the association of data points that are equidistant or nearly equidistant from both clusters, whereas unweighted clustering does not assign them to either cluster. Finally, we implemented both the unweighted and the vertex-weighted spectral clustering algorithms on several data sets to show that the proposed algorithm works in general.
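Here is a minimal Python sketch of the unweighted Fiedler-vector bisection described above, assuming NumPy. It forms L = D - W and splits the vertices by the sign of the Fiedler vector. A vertex is marked as unassociated when its coefficient is small compared with the largest one. The threshold `tol` and the toy graph are hypothetical choices. The thesis's vertex-weighted variant is not reproduced here.

```python
import numpy as np

def fiedler_partition(W, tol=0.05):
    """Bisect a graph by the sign of the Fiedler vector of L = D - W.

    W   : symmetric (n, n) array of nonnegative edge weights.
    tol : hypothetical threshold; a vertex is "unassociated" when
          |coefficient| < tol * max |coefficient|.
    Returns (labels, unassociated): 0/1 labels by sign, and a boolean mask.
    """
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # eigenvector of 2nd smallest eigenvalue
    labels = (fiedler >= 0).astype(int)     # split vertices by sign
    unassociated = np.abs(fiedler) < tol * np.abs(fiedler).max()
    return labels, unassociated

# Hypothetical toy graph: triangles {0,1,2} and {4,5,6} joined through vertex 3.
W = np.zeros((7, 7))
edges = [(0, 1), (0, 2), (1, 2), (4, 5), (4, 6), (5, 6), (2, 3), (3, 4)]
for i, j in edges:
    W[i, j] = W[j, i] = 1.0
labels, unassociated = fiedler_partition(W)
```

In this toy graph, vertex 3 sits exactly between the two triangles. By symmetry its Fiedler coefficient is numerically zero, so it is flagged as unassociated and its sign label is arbitrary.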
5

Larsson, Johan and Isak Ågren. "Numerical Methods for Spectral Clustering". Thesis, KTH, Skolan för teknikvetenskap (SCI), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-275701.

Abstract
The aviation industry is important to the European economy and its development, so a study of the sensitivity of the European flight network is of interest. Clusters within the network could indicate vulnerabilities or bottlenecks, since a cluster would represent a group of airports poorly connected to the rest of the network. In this paper, a cluster analysis using spectral clustering is performed on flight data from 34 European countries. The report also examines how to implement the spectral clustering algorithm for large data sets. The spectral clustering suggests that the European flight network is not clustered and thus does not appear to be sensitive.
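One standard spectral diagnostic checks whether a network contains poorly connected groups. This is not necessarily the analysis used in this thesis. The number of zero eigenvalues of the normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2} equals the number of connected components. A small but nonzero second eigenvalue signals a group joined to the rest by few or weak edges. The sketch assumes NumPy and uses a hypothetical toy graph of two cliques.

```python
import numpy as np

def laplacian_spectrum(W):
    """Ascending eigenvalues of L_sym = I - D^{-1/2} W D^{-1/2}.
    Assumes every vertex has positive degree."""
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L_sym)

def complete_graph(n):
    """Unit-weight complete graph on n vertices (zero diagonal)."""
    return np.ones((n, n)) - np.eye(n)

# Hypothetical example: two 5-vertex cliques, first disconnected,
# then joined by a single weak "bridge" edge of weight 0.1.
W = np.zeros((10, 10))
W[:5, :5] = complete_graph(5)
W[5:, 5:] = complete_graph(5)
disconnected = laplacian_spectrum(W)
W[4, 5] = W[5, 4] = 0.1
bridged = laplacian_spectrum(W)
```

For the two separate cliques, the spectrum starts with two zeros. Adding the bridge leaves one zero and a second eigenvalue below 0.01, the signature of a weakly connected cluster.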
6

Rossi, Alfred Vincent III. "Temporal Clustering of Finite Metric Spaces and Spectral k-Clustering". The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1500033042082458.

7

Darke, Felix and Blomkvist Linus Below. "Categorization of songs using spectral clustering". Thesis, KTH, Skolan för teknikvetenskap (SCI), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297763.

Abstract
A direct consequence of the world becoming more digital is that the amount of available data grows, which presents great opportunities for organizations, researchers and institutions alike. However, this places a huge demand on efficient and understandable algorithms for analyzing vast datasets. This project centers on using one such algorithm to identify groups of songs in a public dataset released by Spotify in 2018. The problem belongs to a larger class in which one wishes to assign data to groups without prior knowledge of what makes the groups distinct or how many groups there are. Such problems are typically solved with unsupervised machine learning. The overall goal of this project was to use spectral clustering (a specific algorithm in the unsupervised machine learning family) to assign 50 704 songs from the dataset to different categories, each made up of similar songs. The algorithm rests on graph theory. A large emphasis was placed on understanding the mathematical foundation and motivation behind the method before the implementation, and this is reflected in the report. Applying spectral clustering produced one large group of 40 718 songs together with 22 smaller groups, all larger than 100 songs, with an average size of 430 songs. The groups were not examined in depth, but the analysis hints that certain groups differed clearly from the dataset as a whole in terms of musical features. For instance, one group was deemed 54% more likely to be acoustic than the dataset as a whole. In conclusion, the largest cluster was deemed an artefact of sampling: a sample of songs listened to on Spotify is likely to consist mainly of popular songs.
This would explain the homogeneity that placed most songs in the same group, which also limited the success of spectral clustering in this project.
8

Marotta, Serena. "Alcuni metodi matriciali per lo Spectral Clustering". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14122/.

Abstract
The aim of this thesis is to analyze in detail a set of data analysis techniques for selecting and grouping homogeneous elements, so that these elements can easily interface with one another and are simpler to use for practitioners in the field. The main clustering methods are introduced: linkage, k-means and, in particular, spectral clustering, the central topic of my thesis.
9

Alshammari, Mashaan. "Graph Filtering and Automatic Parameter Selection for Efficient Spectral Clustering". Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/24091.

Abstract
Spectral clustering is usually used to detect non-convex clusters. Although it is effective at detecting this type of cluster, spectral clustering has two deficiencies that have made it less attractive to the pattern recognition community. First, the graph Laplacian must undergo eigendecomposition to find the embedding space, which is computationally expensive when the number of points is large. Second, spectral clustering uses parameters that strongly influence its outcome, and tuning them manually is tedious when examining different datasets. This thesis introduces solutions to both problems. For computational efficiency, we proposed approximated graphs with a reduced number of vertices. Eigendecomposition is then performed on a smaller matrix, which makes it faster. Unfortunately, reducing the number of vertices can lose local information and affect clustering accuracy. We therefore proposed another graph in which the number of edges is reduced significantly while the number of vertices stays the same, in order to preserve local information. This makes the matrix sparser, which keeps the computation efficient while maintaining good clustering accuracy. For the influential parameters, we proposed cost functions that test a range of values and select the optimum. These cost functions were used to estimate the number of embedding-space dimensions and the number of clusters. We also observed in the literature that the graph reduction step requires manual parameter tuning, so we developed a graph reduction framework that does not require any parameters.
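The thesis estimates the number of clusters with its own cost functions, which are not reproduced here. For comparison, the following sketch implements a different, widely used rule: the eigengap heuristic, which chooses k at the largest gap between consecutive small eigenvalues of the normalized Laplacian. It assumes NumPy, a Gaussian affinity with a hypothetical bandwidth sigma = 1.0, and synthetic data.

```python
import numpy as np

def eigengap_k(W, k_max=10):
    """Eigengap heuristic: return the k in 1..k_max with the largest gap
    lambda_{k+1} - lambda_k between ascending eigenvalues of L_sym."""
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals = np.linalg.eigvalsh(L_sym)[: k_max + 1]
    gaps = np.diff(vals)                 # gaps[k-1] = lambda_{k+1} - lambda_k
    return int(np.argmax(gaps)) + 1

# Synthetic data: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
sigma = 1.0                              # hypothetical bandwidth
W = np.exp(-sq_dists / (2 * sigma ** 2)) # Gaussian (RBF) affinity
np.fill_diagonal(W, 0.0)
k = eigengap_k(W)
```

Like any such rule, the eigengap result depends on the affinity parameters, here `sigma`. That dependence is the kind of parameter sensitivity the thesis aims to automate.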
10

Azam, Nadia Farhanaz. "Spectral clustering: An explorative study of proximity measures". Thesis, University of Ottawa (Canada), 2009. http://hdl.handle.net/10393/28238.

Abstract
In cluster analysis, data are grouped so that objects in the same group are very similar and objects in different groups differ from one another. One such algorithm is spectral clustering, which originated in the area of graph partitioning. Its input is a similarity matrix constructed from the pairwise similarity between data objects. The algorithm uses the eigenvalues and eigenvectors of a normalized similarity matrix to partition the data. The pairwise similarity is calculated from a proximity measure (e.g. a similarity or distance measure). Proximity measures often play a crucial role in any clustering task. Indeed, one of the early and fundamental steps in a clustering process is the selection of a suitable proximity measure. Many measures are available, and the success of a clustering algorithm depends partly on this choice. Most prior research on spectral clustering emphasizes algorithm-specific issues, and little research has evaluated the performance of proximity measures. To this end, we perform a comparative, exploratory analysis of several existing proximity measures, evaluating their performance when the spectral clustering algorithm is applied to a number of diverse data sets. We use ten-fold cross-validation and assess the clustering results with several external cluster evaluation measures. The proximity measures are then compared using the quantitative results from these external measures. We analyze these results further to determine their probable causes. In essence, our experimental evaluation indicates that the proximity measures generally yield comparable results.
That is, no measure is clearly superior, or inferior, to the others in its group. However, among the six similarity measures considered for binary data, one (the Russell and Rao similarity coefficient) frequently performed worse than the others. For numeric data, our study shows that distance measures based on relative distances (i.e. the Pearson correlation coefficient and the angular distance) generally performed better than those based on absolute distances (e.g. the Euclidean or Manhattan distance). For mixed data, our results indicate that the choice of distance measure for the numeric attributes has the greatest impact on the final outcome.
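The sketch below makes the distinction between relative and absolute measures concrete. It assumes NumPy and implements the Russell-Rao similarity for binary vectors and three numeric distances. The angular distance is taken here as the angle divided by pi. This is one common convention and an assumption about the thesis's definition.

```python
import numpy as np

def russell_rao(x, y):
    """Russell-Rao similarity for binary vectors:
    (# positions where both are 1) / (vector length)."""
    x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
    return np.sum(x & y) / x.size

def pearson_distance(x, y):
    """1 - Pearson correlation coefficient (a relative measure)."""
    return 1.0 - np.corrcoef(x, y)[0, 1]

def angular_distance(x, y):
    """Angle between x and y divided by pi, so it lies in [0, 1] (assumed convention)."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

def euclidean_distance(x, y):
    """An absolute measure: sensitive to scale and offset."""
    return np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 10.0 * x        # same "shape" as x, different magnitude
```

Scaling a vector by 10 leaves the Pearson and angular distances at (numerically) zero. The Euclidean distance, by contrast, grows to 9 times the norm of x, which illustrates why the two families can behave differently. For the binary pair [1, 1, 0, 0] and [1, 0, 1, 0], Russell-Rao gives 1/4, since only joint presences count in the numerator.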
11

Kong, Tian Fook. "Multilevel spectral clustering: graph partitions and image segmentation". Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/45275.

Abstract
Thesis (S.M.)--Massachusetts Institute of Technology, Computation for Design and Optimization Program, 2008.
Includes bibliographical references (p. 145-146).
The spectral graph partitioning method gives high-quality segmentation, but segmenting large graphs by the spectral method is computationally expensive. Numerous multilevel graph partitioning algorithms have been proposed to reduce the segmentation time for the spectral partitioning of large graphs. However, the greedy local refinement used in these multilevel schemes tends to trap the partition in poor local minima. In this thesis, I develop a multilevel graph partitioning algorithm that combines the inverse power method with greedy local refinement. This combination ensures that the partition quality of the multilevel method is as good as, if not better than, that of segmenting the large graph directly by the spectral method. In addition, I present a scheme to construct the adjacency matrix W and the degree matrix D for the coarse graphs. The proposed multilevel algorithm bisects a graph (k = 2) in significantly less time than segmenting the original graph without the multilevel implementation, while achieving the same normalized cut (Ncut) value. The starting eigenvector, obtained by solving a generalized eigenvalue problem on the coarsest graph, is close to the Fiedler vector of the original graph. Inverse iteration therefore needs only a few iterations to converge from that starting vector. In the k-way multilevel graph partition, the larger the graph, the greater the reduction in segmentation time. For image segmentation, the multilevel scheme gives better segmentation than segmenting the original image and is more successful at preserving the salient parts of an object.
In this work, I also show that the Ncut value is not the ultimate yardstick for the segmentation quality of an image: a partition with a lower Ncut value does not necessarily have better segmentation quality. Segmenting large images by the multilevel method offers both speed and quality.
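The normalized cut referred to above can be computed directly from its definition, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). The sketch below assumes NumPy and computes this quantity. It also obtains a bisection from the second generalized eigenvector of (D - W) y = λ D y, solved through the equivalent symmetric matrix D^{-1/2} (D - W) D^{-1/2}. It uses a dense eigensolver rather than the multilevel scheme or inverse power method of the thesis. The two-clique graph is a hypothetical example.

```python
import numpy as np

def ncut(W, in_A):
    """Normalized cut of the bipartition (A, B):
    cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    W = np.asarray(W, dtype=float)
    in_A = np.asarray(in_A, dtype=bool)
    cut = W[in_A][:, ~in_A].sum()           # total weight crossing the cut
    d = W.sum(axis=1)                       # degrees; assoc(A, V) = sum of d over A
    return cut / d[in_A].sum() + cut / d[~in_A].sum()

def spectral_bisection(W):
    """Split by the sign of the second generalized eigenvector of
    (D - W) y = lambda D y, via L_sym = D^{-1/2} (D - W) D^{-1/2}, y = D^{-1/2} z."""
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, Z = np.linalg.eigh(L_sym)            # ascending eigenvalues
    y = d_inv_sqrt * Z[:, 1]                # back-transform to the generalized problem
    return y >= 0

# Hypothetical graph: two 4-cliques joined by one edge (3, 4).
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 1.0
part = spectral_bisection(W)
value = ncut(W, part)
```

Each 4-clique has volume 13 and the cut weight is 1. The sign split therefore recovers the two cliques with Ncut = 2/13.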
12

Aven, Matthew. "Daily Traffic Flow Pattern Recognition by Spectral Clustering". Scholarship @ Claremont, 2017. http://scholarship.claremont.edu/cmc_theses/1597.

Abstract
This paper explores potential applications of existing spectral clustering algorithms to real-life problems through experiments on existing road traffic data. The analysis begins with an overview of previous unsupervised machine learning techniques. It then constructs an effective spectral clustering algorithm that demonstrates the analytical power of the method. The paper focuses on the spectral embedding method's ability to project non-linearly separable, high-dimensional data into a more manageable space that allows accurate clustering. The key step involves solving a normalized eigenvector problem to construct an optimal representation of the original data. This step greatly enhances our ability to analyze the relationships between data points and to identify the natural clusters in the original dataset. However, the eigenvector representation of the data is difficult to interpret in terms of the original input variables. Later sections explore how carefully framing questions around the available data can help researchers extract tangible, decision-driving results from real-world data through spectral clustering analysis.
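The spectral embedding step described in this abstract can be sketched in the style of Ng, Jordan and Weiss. Take the eigenvectors for the k largest eigenvalues of the normalized affinity D^{-1/2} W D^{-1/2}, normalize each row, then run k-means on the rows. This is an assumption about which variant the thesis uses. The code assumes NumPy and a Gaussian affinity with a hypothetical sigma = 0.5. It uses a plain Lloyd's k-means with farthest-point initialization.

```python
import numpy as np

def spectral_embedding(W, k):
    """Rows of the top-k eigenvectors of D^{-1/2} W D^{-1/2}, rescaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)                  # ascending eigenvalues
    U = vecs[:, -k:]                             # eigenvectors of the k largest eigenvalues
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(Y, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization.
    (No handling of empty clusters; adequate for well-separated embeddings.)"""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])       # farthest point from current centers
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)           # assign each row to nearest center
        centers = np.array([Y[labels == j].mean(axis=0) for j in range(k)])
    return labels

# Two concentric rings (radius 1 and 4): not linearly separable in the plane.
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
ring = np.c_[np.cos(t), np.sin(t)]
X = np.vstack([ring, 4 * ring])
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
sigma = 0.5                                      # hypothetical bandwidth
W = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)
labels = kmeans(spectral_embedding(W, 2), 2)
```

In the embedding, each ring collapses to (almost) a single point on the unit circle. The two points are orthogonal to each other, so k-means separates the rings even though no straight line does in the original coordinates.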
13

Gan, Sonny. "The application of spectral clustering in drug discovery". Thesis, University of Sheffield, 2013. http://etheses.whiterose.ac.uk/4839/.

Abstract
The application of clustering algorithms to chemical datasets is well established and has been reviewed extensively. Recently, a number of ‘modern’ clustering algorithms have been reported in other fields. One example is spectral clustering, which has yielded promising results in areas such as protein library analysis. The term spectral clustering describes any clustering algorithm that uses the eigenpairs of a matrix as the basis for partitioning a dataset. This thesis describes the development and optimisation of a non-overlapping spectral clustering method based on a study by Brewer. The initial version of the algorithm was closely related to Brewer's method and used a full matrix diagonalisation procedure to identify the eigenpairs of an input matrix. This method was compared with the k-means and Ward's algorithms and produced encouraging results. For example, when coupled with extended connectivity fingerprints, it outperformed the other clustering algorithms according to the QCI measure. Despite these promising results, the algorithm's operational costs restricted it to small datasets, so the method was optimised in successive studies. First, the effect of matrix sparsity on spectral clustering was examined, showing that sparse input matrices can improve the results. Even so, the costs remained prohibitive, so the full matrix diagonalisation was replaced with the Lanczos algorithm, which has lower associated costs, as suggested by Brewer. This led to a significant decrease in computational cost when identifying a small number of clusters. However, a number of issues remained, leading to the adoption of an SVD-based eigendecomposition method.
The SVD-based algorithm was shown to be highly efficient, accurate and scalable through a number of studies.
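SciPy can illustrate the switch from full diagonalisation to the Lanczos algorithm. `scipy.sparse.linalg.eigsh` wraps ARPACK's implicitly restarted Lanczos method and computes only a few eigenpairs of a sparse symmetric matrix. This sketch assumes NumPy and SciPy and uses a hypothetical weighted ring-plus-chords graph. It is neither the thesis's implementation nor its SVD-based method.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def top_eigenpairs_sparse(W, k):
    """Return the k largest eigenpairs of the normalized affinity
    D^{-1/2} W D^{-1/2}, computed with eigsh (ARPACK's implicitly restarted
    Lanczos method) without forming a dense matrix."""
    W = sp.csr_matrix(W)
    d = np.asarray(W.sum(axis=1)).ravel()        # vertex degrees
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
    M = (D_inv_sqrt @ W @ D_inv_sqrt).tocsr()
    vals, vecs = eigsh(M, k=k, which="LA")       # "LA": largest algebraic
    order = np.argsort(vals)[::-1]               # sort in descending order
    return vals[order], vecs[:, order]

# Hypothetical sparse test graph: a ring on n vertices plus chords (i, i+7),
# each edge with a random positive weight.
n = 200
rng = np.random.default_rng(1)
i = np.arange(n)
rows = np.concatenate([i, i])
cols = np.concatenate([(i + 1) % n, (i + 7) % n])
weights = rng.uniform(0.5, 1.5, size=2 * n)
A = sp.coo_matrix((weights, (rows, cols)), shape=(n, n))
W = (A + A.T).tocsr()                            # symmetric sparse affinity
vals, vecs = top_eigenpairs_sparse(W, k=3)
```

Only k eigenpairs are computed, and the matrix is never densified. The largest eigenvalue is 1, as it always is for the normalized affinity of a graph with positive degrees.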
14

Ghafoory, Jones. "p-Laplacian Spectral Clustering Applied in Software Testing". Thesis, KTH, Numerisk analys, NA, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260255.

Abstract
Software testing plays a vital role in the software development life cycle, and the industry still demands a more accurate and cost-efficient testing process. Test optimization has therefore become an important topic in both the state of the art and the state of the practice. Software testing today can be performed manually, automatically or semi-automatically. Manual testing is still popular, for instance in safety-critical systems. Testing a software product manually requires a set of manual test case specifications. The number of test cases required depends on the product's size and complexity, company policies, etc. Moreover, generating and executing test cases manually is time- and resource-consuming. Ranking the test cases for execution can therefore reduce testing costs and help release the product to market faster. To rank test cases, we need to distinguish them from each other; in other words, the properties of each test case should be detected in advance. Requirement coverage has been identified as a critical criterion for test case optimization. In this thesis we propose an approach based on $p$-Laplacian spectral clustering for detecting the traceability matrix between manual test cases and requirements, in order to find the requirement coverage of the test cases. The feasibility of the proposed approach is studied through an empirical evaluation performed on a railway use case at Bombardier Transportation in Sweden. In the experiments, the proposed method achieved an $F_1$-score of up to $4.4\%$. The approach under-performed on this specific problem compared with previous studies. Even so, it gave some insight into the limitations of $p$-Laplacian spectral clustering and how it could be modified for similar problems.
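The p-Laplacian generalizes the quadratic form of the ordinary graph Laplacian. The minimal sketch below assumes NumPy and the ordered-pair convention Q_p(f) = Σ_{i,j} w_ij |f_i - f_j|^p. Conventions for the constant factor vary, and the thesis's exact formulation is not given here. For p = 2 the energy equals 2 fᵀ L f with L = D - W. p-spectral clustering works with the nonlinear Rayleigh quotient Q_p(f) / ||f||_p^p.

```python
import numpy as np

def p_dirichlet_energy(W, f, p):
    """Q_p(f) = sum over all ordered pairs (i, j) of w_ij * |f_i - f_j|^p."""
    W = np.asarray(W, dtype=float)
    f = np.asarray(f, dtype=float)
    return np.sum(W * np.abs(f[:, None] - f[None, :]) ** p)

def p_rayleigh_quotient(W, f, p):
    """Nonlinear Rayleigh quotient Q_p(f) / ||f||_p^p."""
    return p_dirichlet_energy(W, f, p) / np.sum(np.abs(f) ** p)

# Hypothetical random weighted graph on 6 vertices and a random test vector.
rng = np.random.default_rng(0)
A = rng.uniform(size=(6, 6))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
f = rng.standard_normal(6)

# For p = 2 the energy is the usual Laplacian quadratic form, times 2
# because each unordered edge appears twice in the ordered-pair sum.
q2 = p_dirichlet_energy(W, f, 2)
```

Constant vectors have zero energy for every p, just as the constant vector lies in the null space of the ordinary Laplacian.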
15

Casaca, Wallace Correa de Oliveira. "Graph Laplacian for spectral clustering and seeded image segmentation". Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-24062015-112215/.

Abstract
Image segmentation is an essential tool for enhancing the ability of computer systems to perform elementary cognitive tasks such as detection, recognition and tracking efficiently. In this thesis we concentrate on two fundamental topics in image segmentation: spectral clustering and seeded image segmentation. We introduce two new algorithms for these topics that rely on Laplacian-based operators, spectral graph theory, and the minimization of energy functionals. The effectiveness of both algorithms is verified by visually comparing the resulting partitions against state-of-the-art methods. It is also verified through a variety of quantitative measures typically used as benchmarks by the image segmentation community. Our spectral segmentation algorithm combines image decomposition, similarity metrics, and spectral graph theory into a concise and powerful framework. First, an image decomposition splits the input image into texture and cartoon components. Then an affinity graph is generated, and weights are assigned to its edges according to a gradient-based inner-product function. The image is then partitioned through the spectral cut of the affinity graph, based on its eigenstructure. Moreover, the partitioning can be improved by interactively sketching to change the graph weights. Visual and numerical evaluations were conducted against representative spectral segmentation techniques, using boundary and partition quality measures on the well-known BSDS dataset. Most existing seed-based methods rely on complex mathematical formulations that typically do not guarantee a unique solution and remain prone to being trapped in local minima. Our seeded segmentation approach, by contrast, is mathematically simple to formulate, easy to implement, and guaranteed to produce a unique solution.
Moreover, the formulation behaves anisotropically: pixels sharing similar attributes are kept close to each other, while large discontinuities are naturally imposed on the boundaries between image regions, ensuring a better fit to object boundaries. We show that the proposed approach significantly outperforms competing techniques both quantitatively and qualitatively, using Microsoft's classical GrabCut dataset as a benchmark. While most of this research concentrates on the particular problem of image segmentation, we also develop two new techniques for image inpainting and photo colorization. Both methods couple the developed segmentation tools with other computer vision approaches in order to work properly.
16

Barreira, Daniel and Netterström Nazar Maksymchuk. "Recommend Songs With Data From Spotify Using Spectral Clustering". Thesis, KTH, Skolan för teknikvetenskap (SCI), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297683.

Abstract
Spotify, one of the world's biggest music services, published a data set and an open-ended challenge for music recommendation research. This study's goal is to recommend songs for playlists in the Spotify data set using spectral clustering. The data set contains 1 000 000 playlists, but spectral clustering was performed on a subset of 16 000 playlists because of limited computational resources. The study used four different weighting methods to describe the connections between playlists. It found reasonable clusters in which playlists of similar categories were grouped together, although most results also contained a very large cluster in which many different kinds of playlists were grouped together. The conclusion was that the data was overly connected as an effect of our weighting methods. The results show that songs can be recommended for a limited number of playlists. Hierarchical clustering might help recommend songs for a larger number of playlists, but this is left to future research.
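The four weighting methods are not named in the abstract. As a hypothetical example of one way to weight playlist-to-playlist connections, the sketch below uses the Jaccard similarity of the playlists' track sets. It assumes NumPy and that each playlist is given as a set of track IDs. With a measure like this, any shared track yields a nonzero edge, which is one way a playlist graph can become densely connected.

```python
import numpy as np

def jaccard_affinity(playlists):
    """Hypothetical weighting: W[a, b] = |tracks(a) & tracks(b)| / |tracks(a) | tracks(b)|.
    `playlists` is a list of sets of track IDs (assumed input format); the diagonal is zero."""
    n = len(playlists)
    W = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            union = len(playlists[a] | playlists[b])
            if union:  # skip pairs of empty playlists
                W[a, b] = W[b, a] = len(playlists[a] & playlists[b]) / union
    return W

# Toy playlists with made-up track IDs.
playlists = [{"t1", "t2", "t3"}, {"t2", "t3", "t4"}, {"t9"}]
W = jaccard_affinity(playlists)
```

The first two playlists share two of four distinct tracks and get weight 0.5. The third shares nothing and stays disconnected.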
17

Pihlström, Ralf. "On some Spectral Properties of Stochastic Similarity Matrices for Data Clustering". Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-396646.

18

Carozza, Marina. "Matrici Laplaciane sui grafi, proprietà di interlacing ed applicazione allo spectral clustering". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/18789/.

Abstract
The aim of this work is to present some of the main theorems on the spectral properties of Laplacian matrices, a particular class of matrices built from graphs and used to describe data, together with their application to spectral clustering. In particular, a property called the "interlacing property" is analyzed. The last chapter is devoted to numerical experiments that illustrate the theoretical results computationally.
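Here is one standard interlacing result for graph Laplacians, possibly not the exact statement studied in the thesis. Increasing the weight of a single edge (i, j) by w > 0 adds the rank-one positive semidefinite term w (e_i - e_j)(e_i - e_j)ᵀ to L. The sorted eigenvalues λ_k of the old Laplacian and μ_k of the new one therefore satisfy λ_k ≤ μ_k ≤ λ_{k+1}. The sketch below checks this numerically on a hypothetical random weighted graph, assuming NumPy.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def add_edge_weight(W, i, j, w):
    """Return a copy of W with weight w added to the undirected edge (i, j)."""
    W2 = W.copy()
    W2[i, j] += w
    W2[j, i] += w
    return W2

# Hypothetical random weighted graph on 8 vertices (zero diagonal).
rng = np.random.default_rng(0)
U = np.triu(rng.uniform(size=(8, 8)), 1)
W = U + U.T

lam = np.linalg.eigvalsh(laplacian(W))                             # ascending
mu = np.linalg.eigvalsh(laplacian(add_edge_weight(W, 0, 3, 2.0)))  # ascending

# Interlacing: lam[k] <= mu[k] <= lam[k+1], with a small tolerance for rounding.
tol = 1e-10
interlaced = bool(np.all(lam <= mu + tol) and np.all(mu[:-1] <= lam[1:] + tol))
```

The trace of L, which is the sum of the degrees, also increases by exactly 2w.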
19

Karatzoglou, Alexandros and Ingo Feinerer. "Text Clustering with String Kernels in R". Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 2006. http://epub.wu.ac.at/1002/1/document.pdf.

Abstract
We present a package which provides a general framework, including tools and algorithms, for text mining in R using the S4 class system. Using this package and the kernlab R package we explore the use of kernel methods for clustering (e.g., kernel k-means and spectral clustering) on a set of text documents, using string kernels. We compare these methods to a more traditional clustering technique like k-means on a bag of word representation of the text and evaluate the viability of kernel-based methods as a text clustering technique. (author's abstract)
Series: Research Report Series / Department of Statistics and Mathematics
20

Mayer-Jochimsen, Morgan. "Clustering Methods and Their Applications to Adolescent Healthcare Data". Scholarship @ Claremont, 2013. http://scholarship.claremont.edu/scripps_theses/297.

Abstract
Clustering is a mathematical method of data analysis which identifies trends in data by efficiently separating data into a specified number of clusters, and so it is incredibly useful and widely applicable to questions about the interrelatedness of data. Two methods of clustering are considered here. K-means clustering defines clusters in relation to the centroid, or center, of a cluster. Spectral clustering establishes connections between all of the data points to be clustered, then eliminates those connections that link dissimilar points. This is represented as an eigenvector problem where the solution is given by the eigenvectors of the Normalized Graph Laplacian. Spectral clustering establishes groups so that the similarity between points of the same cluster is stronger than the similarity between points of different clusters. K-means and spectral clustering are used to analyze adolescent data from the 2009 California Health Interview Survey. Differences were observed between the results of the two clustering methods on 3294 individuals and 22 health-related attributes. K-means clustered the adolescents by exercise, poverty, and variables related to psychological health, while spectral clustering groups were informed by smoking, alcohol use, low exercise, psychological distress, low parental involvement, and poverty. We offer some conjectures about this difference, observe characteristics of the clustering methods, and comment on the viability of spectral clustering on healthcare data.
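The eigenvector formulation described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the thesis's code: it assumes a fully connected Gaussian similarity graph with a hypothetical bandwidth `sigma`, the symmetric normalized Laplacian $L_{sym} = I - D^{-1/2} W D^{-1/2}$, the row-normalization step of Ng, Jordan and Weiss, and a small hand-written k-means with farthest-point initialisation. A sparse k-nearest-neighbour graph would more literally "eliminate" the connections between dissimilar points.

```python
import numpy as np


def kmeans(Y, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the means.
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma=1.0):
    """Normalized spectral clustering sketch (assumed Gaussian similarity graph)."""
    X = np.asarray(X, dtype=float)
    # Fully connected Gaussian similarity graph; sigma is an assumed bandwidth.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the first k columns
    # are the eigenvectors of the k smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :k]
    # Normalize each row of the spectral embedding to unit length.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)
```

On two well-separated groups of points, the two smallest eigenvectors span the group indicators, so after row normalization every point of a group maps to the same location and k-means recovers the groups.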
21

Cresswell, Kellen Garrison. "Spectral methods for the detection and characterization of Topologically Associated Domains". VCU Scholars Compass, 2019. https://scholarscompass.vcu.edu/etd/6100.

Abstract
The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops which is relatively stable across cell-lines and even across species. These TADs dynamically reorganize during development of disease, and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for identification of hierarchies. Additionally, there are no publicly available tools for comparison of TADs across datasets. These tools are necessary to conduct large-scale genome-wide analysis and comparison of 3D structure. To address the challenge of TAD identification, we developed a novel sliding window-based spectral clustering framework that uses gaps between consecutive eigenvectors for TAD boundary identification. Our method, implemented in an R package, SpectralTAD, has automatic parameter selection, is robust to sequencing depth, resolution and sparsity of Hi-C data, and detects hierarchical, biologically relevant TADs. SpectralTAD outperforms four state-of-the-art TAD callers in simulated and experimental settings. We demonstrate that TAD boundaries shared among multiple levels of the TAD hierarchy were more enriched in classical boundary marks and more conserved across cell lines and tissues. SpectralTAD is available at http://bioconductor.org/packages/SpectralTAD/. To address the problem of TAD comparison, we developed TADCompare. TADCompare is based on a spectral clustering-derived measure called the eigenvector gap, which enables a locus-by-locus comparison of TAD boundary differences between datasets. 
Using this measure, we introduce methods for identifying differential and consensus TAD boundaries and tracking TAD boundary changes over time. We further propose a novel framework for the systematic classification of TAD boundary changes. Colocalization- and gene enrichment analysis of different types of TAD boundary changes revealed distinct biological functionality associated with them. TADCompare is available on https://github.com/dozmorovlab/TADCompare.
22

Blakely, Logan. "Spectral Clustering for Electrical Phase Identification Using Advanced Metering Infrastructure Voltage Time Series". Thesis, Portland State University, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10980011.

Abstract

The increasing demand for and prevalence of distributed energy resources (DER) such as solar power, electric vehicles, and energy storage present a unique set of challenges for integration into a legacy power grid, and accurate models of the low-voltage distribution systems are critical for accurate simulations of DER. Accurate labeling of the phase connections for each customer in a utility model is one area of grid topology that is known to have errors and has implications for the safety, efficiency, and hosting capacity of a distribution system. This research presents a methodology for the phase identification of customers solely using the advanced metering infrastructure (AMI) voltage time series. This thesis proposes to use Spectral Clustering, combined with a sliding window ensemble method for utilizing a long-term, time-series dataset that includes missing data, to group customers within a lateral by phase. These clustering phase predictions validate over 90% of the existing phase labels in the model and identify customers whose current phase labels in this model are incorrect. Within this dataset, this methodology produces consistent, high-quality results, verified by validating the clustering phase predictions against the underlying topology of the system, as well as selected examples verified using satellite and street view images publicly available in Google Earth. Further analysis of the Spectral Clustering predictions shows that they not only validate and improve the phase labels in the utility model, but also show potential for detecting other types of errors in the topology of the model, such as errors in the labeling of connections between customers and transformers, unlabeled residential solar power, unlabeled transformers, and customers with incomplete information in the model. 
These results indicate excellent potential for further development of this methodology as a tool for validating and improving existing utility models of the low-voltage side of the distribution system.

23

Reizer, Gabriella v. "Stability Selection of the Number of Clusters". Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/math_theses/98.

Abstract
Selecting the number of clusters is one of the greatest challenges in clustering analysis. In this thesis, we propose a variety of stability selection criteria based on cross validation for determining the number of clusters. Clustering stability measures the agreement of clusterings obtained by applying the same clustering algorithm on multiple independent and identically distributed samples. We propose to measure the clustering stability by the correlation between two clustering functions. These criteria are motivated by the concept of clustering instability proposed by Wang (2010), which is based on a form of clustering distance. In addition, the effectiveness and robustness of the proposed methods are numerically demonstrated on a variety of simulated and real world samples.
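One way to read "correlation between two clustering functions" is to compare the pairwise co-membership indicators of two clusterings of the same points. The sketch below assumes that reading; it is not taken from the thesis. It returns the Pearson correlation between the two indicator vectors, which equals 1 for identical partitions up to relabelling. The helper names `comembership` and `clustering_correlation` are hypothetical.

```python
import numpy as np


def comembership(labels):
    """Matrix whose (i, j) entry is 1.0 when points i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def clustering_correlation(labels_a, labels_b):
    """Pearson correlation between the pairwise co-membership indicators of
    two clusterings of the same points (an assumed stability measure)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    # Use each unordered pair of distinct points once.
    iu = np.triu_indices(len(labels_a), k=1)
    a = comembership(labels_a)[iu]
    b = comembership(labels_b)[iu]
    if a.std() == 0 or b.std() == 0:
        raise ValueError("constant co-membership vector; correlation undefined")
    return float(np.corrcoef(a, b)[0, 1])
```

In a stability-selection loop, one would, for each candidate number of clusters, fit the clustering algorithm on two independent samples, use both fitted clusterings to label a common set of points, and average this score; the candidate with the most stable (highest) score is selected.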
24

Passmoor, Sean Stuart. "Clustering studies of radio-selected galaxies". Thesis, University of the Western Cape, 2011. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_7521_1332410859.

Abstract

We investigate the clustering of HI-selected galaxies in the ALFALFA survey and compare the results with those obtained for HIPASS. Measurements of the angular correlation function and the inferred 3D clustering are compared with results from direct spatial-correlation measurements. We are able to measure clustering on smaller angular scales and for galaxies with lower HI masses than was previously possible. We calculate the expected clustering of dark matter using the redshift distributions of HIPASS and ALFALFA and show that the ALFALFA sample is somewhat more anti-biased with respect to dark matter than the HIPASS sample. We are able to confirm the validity of the dark matter correlation predictions by performing simulations of non-linear structure formation. Further, we examine how the bias evolves with redshift for radio galaxies detected in the FIRST survey.

25

Furuhashi, Takeshi, Tomohiro Yoshikawa and Kazuto Inagaki. "A Study on Extraction of Minority Groups in Questionnaire Data based on Spectral Clustering". IEEE, 2014. http://hdl.handle.net/2237/20713.

26

Bellam, Venkata Pavan Kumar. "Efficient Community Detection for Large Scale Networks via Sub-sampling". Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/81862.

Abstract
Many real-world systems can be represented as network graphs. Some networks have an inherent community structure based on interactions. Identifying this grouping structure in a given graph is termed the community detection problem, for which several algorithms exist. This thesis contributes specific improvements to various community detection algorithms, such as spectral clustering and the extreme point algorithm. One of the main contributions is a new sub-sampling method that makes the existing spectral clustering method scalable by reducing its computational complexity. We have also implemented the extreme points algorithm for the general case of detecting multiple communities, along with a sub-sampling-based version that reduces the computational complexity. Finally, we have developed a spectral clustering algorithm for graphs based on the popularity-adjusted block model (PABM) that makes the algorithm exact, thus improving its accuracy.
Master of Science
27

Felizardo, Rui Miguel Meireles. "A study on parallel versus sequential relational fuzzy clustering methods". Master's thesis, Faculdade de Ciências e Tecnologia, 2011. http://hdl.handle.net/10362/5663.

Abstract
Dissertation submitted for the Master's degree in Informatics Engineering
Relational fuzzy clustering is a recent and growing area of study. New algorithms have been developed, such as FastMap Fuzzy c-Means (FMFCM) and the Fuzzy Additive Spectral Clustering Method (FADDIS), which produced interesting experimental results in the works that introduced them. Since these algorithms are new to the fuzzy relational clustering community, few experimental studies are available. This thesis responds to the need for further investigation of these algorithms through a comparative experimental study of two families of algorithms: the parallel and the sequential versions. These two families differ in the way they cluster data. Parallel versions extract clusters from the data simultaneously and need the number of clusters as an input parameter, while sequential versions extract clusters one by one until a stopping condition is met, so the number of clusters is a natural output of the algorithm. The algorithms are studied for their effectiveness in retrieving good cluster structures, by analysing the quality of the partitions as well as the determination of the number of clusters with several validation measures. An extensive simulation study has been conducted using two data generators specifically constructed for the algorithms under study, in particular to assess their robustness to noisy data. Results on benchmark real data are also discussed. Particular attention is paid to the most adequate pre-processing of relational data, in particular the pseudo-inverse Laplacian transformation.
28

Stephani, Henrike [Author], Gabriele [Academic supervisor] Steidl and Erich Peter [Academic supervisor] Klement. "Automatic Segmentation and Clustering of Spectral Terahertz Data / Henrike Stephani. Supervisors: Gabriele Steidl; Erich Peter Klement". Kaiserslautern : Technische Universität Kaiserslautern, 2012. http://d-nb.info/1027625681/34.

29

Stephani, Henrike [Author], Gabriele [Academic supervisor] Steidl and Erich Peter [Academic supervisor] Klement. "Automatic Segmentation and Clustering of Spectral Terahertz Data / Henrike Stephani. Supervisors: Gabriele Steidl; Erich Peter Klement". Kaiserslautern : Technische Universität Kaiserslautern, 2012. http://nbn-resolving.de/urn:nbn:de:hbz:386-kluedo-31630.

30

Amaduzzi, Alberto. "Enzymes' characterization via spectral analysis of the Laplacian associated to their relative contact maps". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23899/.

Abstract
The main motivation for my thesis is the belief that global properties of enzymes are essential for a complete understanding of their behavior. In particular, I investigate qualitative properties of enzymes via spectral techniques associated with the graph Laplacian. I apply visualization techniques to understand similarities and dissimilarities among different enzyme structures, encoded in adjacency matrices retrieved from coordinate data in publicly available online datasets. The purpose is to explore features and see whether these techniques, which are used extensively in the literature for visual discrimination tasks, are also useful for these biological entities. I have tried to design a size-independent analysis that can differentiate among different taxonomies, catalytic properties and environments associated with enzymes. This attempt provided useful hints for the analysis of enzyme properties, even though, as a final remark, a dependence on enzyme size is still found in the Laplacian eigenvalue spectrum.
31

Bezek, Perit. "A Clustering Method For The Problem Of Protein Subcellular Localization". Master's thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/12607981/index.pdf.

Abstract
In this study, the focus is on predicting the subcellular localization of a protein, since subcellular localization helps in understanding a protein's functions. The function of a protein may be estimated from its sequence. Motifs or conserved subsequences are strong indicators of function. In a given sample set of protein sequences known to perform the same function, a certain subsequence or group of subsequences should be common; that is, the occurrence (frequency) of common subsequences should be high. Our idea is to find the common subsequences through clustering and use these common groups (implicit motifs) to classify proteins. To calculate the distance between two subsequences, the traditional string edit distance is modified so that only replacement is allowed and the cost of a replacement is related to an amino acid substitution matrix. Based on the modified string edit distance, spectral clustering embeds the subsequences into a transformed space in which the clustering problem is expected to become easier to solve. For a given protein sequence, the distribution of its subsequences over the clusters is the feature vector that is subsequently fed to a classifier. The most important aspect of this approach is the use of spectral clustering based on the modified string edit distance.
32

Curado, Manuel. "Structural Similarity: Applications to Object Recognition and Clustering". Doctoral thesis, Universidad de Alicante, 2018. http://hdl.handle.net/10045/98110.

Abstract
In this thesis, we propose several developments in the context of Structural Similarity. We address both node (local) similarity and graph (global) similarity. Concerning node similarity, we focus on improving the diffusive process used to compute this similarity (e.g. Commute Times) by modifying or rewiring the structure of the graph (Graph Densification), although some advances in Laplacian-based ranking are also included in this document. Graph Densification is a particular case of what we call graph rewiring, i.e. a novel field (similar to image processing) where input graphs are rewired to be better conditioned for subsequent pattern recognition tasks (e.g. clustering). In the thesis, we contribute a scalable and effective method driven by Dirichlet processes. We propose both a completely unsupervised and a semi-supervised approach for Dirichlet densification. We also contribute new random walkers (Return Random Walks) that are useful structural filters as well as asymmetry detectors in directed brain networks used to make early predictions of Alzheimer's disease (AD). Graph similarity is addressed by designing structural information channels as a means of measuring the Mutual Information between graphs. To this end, we first embed the graphs by means of Commute Times. Commute-time embeddings have good properties for Delaunay triangulations (the typical representation for Graph Matching in computer vision). This means that these embeddings can act as encoders in the channel as well as decoders (since they are invertible). Consequently, structural noise can be modelled by the deformation introduced in one of the manifolds to fit the other one. This methodology leads to a highly discriminative similarity measure, since the Mutual Information is measured on the manifolds (vectorial domain) through copulas, bypassing entropy estimators. 
This is consistent with the methodology of decoupling the measurement of graph similarity into two steps: a) linearizing the Quadratic Assignment Problem (QAP) by means of the embedding trick, and b) measuring similarity in vector spaces. The QAP is also investigated in this thesis. More precisely, we analyze the behaviour of $m$-best Graph Matching methods. These methods usually start from a couple of best solutions and then locally expand the search space by excluding previously clamped variables. The next variable to clamp is usually selected randomly, but we show that this reduces performance when structural noise (outliers) arises. Alternatively, we propose several heuristics for spanning the search space and evaluate all of them, showing that they usually outperform random selection. These heuristics are particularly interesting because they exploit the structure of the affinity matrix. Efficiency is improved as well. The application domains explored in this thesis are object recognition (graph similarity), clustering (rewiring), compression/decompression of graphs (links with Extremal Graph Theory), 3D shape simplification (sparsification) and early prediction of AD.
Ministerio de Economía, Industria y Competitividad (Spanish Ministry of Economy, Industry and Competitiveness), reference TIN2012-32839 BES-2013-064482
33

Brocklebank, Sean. "Inquiry into the nature and causes of individual differences in economics". Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/6281.

Abstract
The thesis contains four chapters on the structure and predictability of individual differences. Chapter 1. Re-analyses data from Holt and Laury's (2002) risk aversion experiments. Shows that big-stakes hypothetical payoffs are better than small-stakes real-money payoffs for predicting choices in big-stakes real-money gambles (in spite of the presence of hypothetical bias). Argues that hypothetical bias is a problem for calibration of mean preferences but not for prediction of the rank order of subjects' preferences. Chapter 2. Describes an experiment: participants were given personality tests and played a series of dictator and response games over a two-week period. It was found that social preferences are one-dimensional, stable across a two-week interval and significantly related to the Big Five personality traits. Suggestions are given about ways to modify existing theories of social preference to accommodate these findings. Chapter 3. Applies a novel statistical technique (spectral clustering) to a personality data set for the first time. Finds the HEXACO six-factor structure in an English-language five-factor questionnaire for the first time. Argues that the emphasis placed on weak relationships is critical to settling the dimensionality debate within personality theory, and that spectral clustering provides a more useful perspective on personality data than does traditional factor analysis. Chapter 4. Outlines the relevance of extraversion for economics, and sets up a model to argue that personality differences in extraversion may have evolved through something akin to a war of attrition. This model implies a positive relationship between extraversion and risk aversion, and a U-shaped relationship between extraversion and loss aversion.
34

Fairbanks, James Paul. "Graph analysis combining numerical, statistical, and streaming techniques". Diss., Georgia Institute of Technology, 2016. http://hdl.handle.net/1853/54972.

Abstract
Graph analysis uses graph data collected on a physical, biological, or social phenomenon to shed light on the underlying dynamics and behavior of the agents in that system. Many fields contribute to this topic, including graph theory, algorithms, statistics, machine learning, and linear algebra. This dissertation advances a novel framework for dynamic graph analysis that combines numerical, statistical, and streaming algorithms to provide deep understanding of evolving networks. For example, one may be interested in how the influence structure changes over time. These disparate techniques each contribute a fragment to understanding the graph; however, their combination allows us to understand dynamic behavior and graph structure. Spectral partitioning methods rely on eigenvectors for solving data analysis problems such as clustering. Eigenvectors of large sparse systems must be approximated with iterative methods. This dissertation analyzes how data analysis accuracy depends on the numerical accuracy of the eigensolver. This leads to new bounds on the residual tolerance necessary to guarantee correct partitioning. We present a novel stopping criterion for spectral partitioning guaranteed to satisfy the Cheeger inequality, along with an empirical study of its performance on real-world networks such as web, social, and e-commerce networks. This work bridges the gap between numerical analysis and computational data analysis.
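The dependence of a spectral partition on eigensolver accuracy can be illustrated with spectral bisection driven by an iterative method. This sketch assumes power iteration on $cI - L$ (with $c = 2\,d_{\max}$, a Gershgorin bound on the spectrum of the graph Laplacian $L$), deflation of the constant eigenvector, and a plain residual test $\lVert Lx - \lambda x \rVert < \text{tol}$. It does not implement the Cheeger-inequality-based stopping criterion proposed in the dissertation; `tol` and `seed` are hypothetical parameters, and the graph is assumed to be connected.

```python
import numpy as np


def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def fiedler_vector(L, tol=1e-8, max_iter=10_000, seed=0):
    """Approximate the Fiedler vector by power iteration on c*I - L,
    stopping once the eigen-residual ||L x - lambda x|| drops below tol."""
    n = L.shape[0]
    c = 2.0 * L.diagonal().max()  # every eigenvalue of L lies in [0, c]
    ones = np.ones(n) / np.sqrt(n)  # trivial eigenvector (eigenvalue 0)
    x = np.random.default_rng(seed).standard_normal(n)
    for _ in range(max_iter):
        x -= ones * (ones @ x)      # deflate: stay orthogonal to the constant vector
        x /= np.linalg.norm(x)
        lam = x @ L @ x             # Rayleigh quotient estimate of lambda_2
        residual = np.linalg.norm(L @ x - lam * x)
        if residual < tol:
            return x, lam, residual
        x = c * x - L @ x           # power step on the shifted matrix
    raise RuntimeError("power iteration did not reach the residual tolerance")


def spectral_bisection(adj, tol=1e-8):
    """Split nodes by the sign of the approximate Fiedler vector."""
    x, lam, residual = fiedler_vector(laplacian(adj), tol=tol)
    return x >= 0, lam, residual
```

On two triangles joined by a single edge, the sign split separates the triangles, and the Rayleigh quotient approaches the exact algebraic connectivity $(5 - \sqrt{17})/2 \approx 0.438$. Loosening `tol` shows how a less accurate eigenvector can eventually produce a different split, which is the trade-off the dissertation quantifies.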
35

He, Guanlin. "Parallel algorithms for clustering large datasets on CPU-GPU heterogeneous architectures". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG062.

Abstract
Clustering, which aims at achieving natural groupings of data, is a fundamental and challenging task in machine learning and data mining. Numerous clustering methods have been proposed in the past, among which k-means is one of the most famous and commonly used methods due to its simplicity and efficiency. Spectral clustering is a more recent approach that usually achieves higher clustering quality than k-means. However, classical algorithms of spectral clustering suffer from a lack of scalability due to their high complexities in terms of number of operations and memory space requirements. This scalability challenge can be addressed by applying approximation methods or by employing parallel and distributed computing. The objective of this thesis is to accelerate spectral clustering and make it scalable to large datasets by combining representatives-based approximation with parallel computing on CPU-GPU platforms. Considering different scenarios, we propose several parallel processing chains for large-scale spectral clustering. We design optimized parallel algorithms and implementations for each module of the proposed chains: parallel k-means on CPU and GPU, parallel spectral clustering on GPU using sparse storage format, parallel filtering of data noise on GPU, etc. Our various experiments reach high performance and validate the scalability of each module and the complete chains.
36

Lee, Zed Heeje. "A graph representation of event intervals for efficient clustering and classification". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281947.

Texto completo
Resumen
Sequences of event intervals occur in several application domains, while their inherent complexity hinders scalable solutions to tasks such as clustering and classification. In this thesis, we propose a novel spectral embedding representation of event interval sequences that relies on bipartite graphs. More concretely, each event interval sequence is represented by a bipartite graph by following three main steps: (1) creating a hash table that can quickly convert a collection of event interval sequences into a bipartite graph representation, (2) creating and regularizing a bi-adjacency matrix corresponding to the bipartite graph, (3) defining a spectral embedding mapping on the bi-adjacency matrix. In addition, we show that substantial improvements can be achieved with regard to classification performance through pruning parameters that capture the nature of the relations formed by the event intervals. We demonstrate through extensive experimental evaluation on five real-world datasets that our approach can obtain runtime speedups of up to two orders of magnitude compared to other state-of-the-art methods and similar or better clustering and classification performance.
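Step (3) of the pipeline in the abstract above, a spectral embedding defined on a bi-adjacency matrix, can be sketched with a singular value decomposition. The normalization $D_r^{-1/2} B D_c^{-1/2}$ used here comes from spectral co-clustering and is an assumption; the thesis's hash-table construction and regularization step are not reproduced, and `dim` and `eps` are hypothetical parameters.

```python
import numpy as np


def bipartite_spectral_embedding(B, dim=2, eps=1e-12):
    """Embed the row nodes of a bipartite graph with bi-adjacency matrix B
    (rows x columns) using the top singular vectors of D_r^{-1/2} B D_c^{-1/2}."""
    B = np.asarray(B, dtype=float)
    # Row and column degrees; eps guards against isolated nodes.
    d_r = B.sum(axis=1) + eps
    d_c = B.sum(axis=0) + eps
    B_norm = B / np.sqrt(d_r)[:, None] / np.sqrt(d_c)[None, :]
    # Thin SVD; singular values are returned in descending order.
    U, s, _ = np.linalg.svd(B_norm, full_matrices=False)
    # Scale the leading left singular vectors by their singular values.
    return U[:, :dim] * s[:dim]
```

Rows of `B` with the same connection pattern receive identical coordinates, so the embedding can be passed directly to k-means for clustering or to a classifier.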
37

Gustavsson, Hanna. "Clustering Based Outlier Detection for Improved Situation Awareness within Air Traffic Control". Thesis, KTH, Optimeringslära och systemteori, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264215.

Abstract
The aim of this thesis is to examine clustering-based outlier detection algorithms on their ability to detect abnormal events in flight traffic. A nominal model is trained on a data set containing only flights labeled as normal. A detection scoring function based on the nominal model is used to decide whether a new, previously unseen data point behaves like the nominal model or not. Because the structure of the data set is unknown, three different clustering algorithms are examined for training the nominal model: K-means, Gaussian Mixture Model and Spectral Clustering. Depending on the nominal model, different methods are used to obtain a detection score, such as metric distance, probability and One-Class Support Vector Machine. This thesis concludes that a clustering-based outlier detection algorithm is feasible for detecting abnormal events in flight traffic. The best performance was obtained by using Spectral Clustering combined with a One-Class Support Vector Machine. The accuracy on the test data set was 95.8%. The algorithm correctly classified 89.4% of the data points labeled as abnormal and 96.2% of the data points labeled as normal.
38

Miti, Filippo. "Mathematical models for cellular aggregation: the chemotactic instability and clustering formation". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12020/.

Abstract
In this thesis we present a mathematical formulation of the interaction between microorganisms, such as bacteria or amoebae, and chemicals that are often produced by the organisms themselves. This interaction is called chemotaxis and leads to cellular aggregation. We derive several models to describe chemotaxis. The first is the pioneering Keller-Segel parabolic-parabolic model, which we derive from two different perspectives: a macroscopic one and a microscopic one, in which we start from a stochastic differential equation and perform a mean-field approximation. This parabolic model may be generalized by introducing a degenerate diffusion coefficient that depends on the density itself via a power law. We then derive a hyperbolic model for chemotaxis based on Cattaneo's law of heat propagation with finite speed. The last model proposed here is a hydrodynamic model, which takes into account the inertia of the system through a friction force. In the limit of strong friction the model reduces to the parabolic model, whereas in the limit of weak friction we recover a hyperbolic model. Finally, we analyze the instability condition, i.e. the condition that leads to aggregation, and describe the different kinds of aggregates that may form: the parabolic models lead to clusters or peaks, whereas the hyperbolic models lead to the formation of network patterns or filaments. Moreover, we discuss the analogy between bacterial colonies and self-gravitating systems by comparing chemotactic collapse with gravitational collapse (Jeans instability).
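For reference, the classical parabolic-parabolic Keller-Segel system is commonly written as below. The notation is our own assumption and may not match the thesis: $\rho$ is cell density, $c$ chemical concentration, $D_\rho, D_c$ diffusivities, $\chi$ chemotactic sensitivity, and $a, b$ production and degradation rates. Linearizing about the homogeneous state $(\rho_0, c_0 = a\rho_0/b)$ with Fourier modes gives the standard aggregation (instability) condition.

```latex
\begin{aligned}
&\partial_t \rho = \nabla\cdot\bigl(D_\rho \nabla\rho - \chi\,\rho\,\nabla c\bigr),
\qquad \partial_t c = D_c\,\Delta c + a\,\rho - b\,c,\\[4pt]
&\text{perturbations} \propto e^{i k\cdot x + \lambda t}:\qquad
(\lambda + D_\rho k^2)\,(\lambda + D_c k^2 + b) = \chi\, a\, \rho_0\, k^2,\\[4pt]
&\exists\,\lambda > 0 \iff D_\rho\,(D_c k^2 + b) < \chi\, a\, \rho_0
\quad\Longrightarrow\quad \text{as } k \to 0:\ \ \chi\, a\, \rho_0 > D_\rho\, b .
\end{aligned}
```

A growing mode exists exactly when the constant term of the quadratic in $\lambda$ is negative, since the sum of the two roots is always negative. On an unbounded domain, arbitrarily small wavenumbers are available, which gives the last inequality.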
39

Henley, Lisa. "The quantification and visualisation of human flourishing". Thesis, University of Canterbury. School of Mathematics and Statistics, 2015. http://hdl.handle.net/10092/10441.

Abstract
Economic indicators such as GDP have been a main indicator of human progress since the first half of the last century. There is concern that continuing to measure our progress and/or wellbeing with measures that encourage consumption on a planet with limited resources may not be ideal. Alternative measures of human progress take a top-down approach, in which the creators decide what the measure will contain. This work defines a 'bottom-up' methodology, an example of measuring human progress that does not require manual data reduction. The technique allows a visual overlay of other 'factors' that users may consider particularly important. I designed and wrote a genetic algorithm which, in conjunction with regression analysis, was used to select the 'most important' variables from a large range of variables loosely associated with the topic. This approach could be applied in many areas where an analyst must choose from a large amount of data. Next, I designed and wrote a genetic algorithm to explore the evolution of a spectral clustering solution over time. Additionally, I designed and wrote a genetic algorithm with a multi-faceted fitness function, which I used to select the most appropriate clustering procedure from a range of hierarchical agglomerative methods. Evolving the algorithm over time was not successful in this instance, but the approach holds considerable promise as an alternative to 'scoring' new data based on an original solution, and as a way of using procedural options other than those an analyst might normally select. The final solution allowed the number of clusters to evolve over time, with a fixed clustering method and variable selection. Profiling with various external data sources gave consistent and interesting interpretations of the clusters.
40

Storer, Jeremy J. "Computational Intelligence and Data Mining Techniques Using the Fire Data Set". Bowling Green State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1460129796.

41

Schuetter, Jared Michael. "Cairn Detection in Southern Arabia Using a Supervised Automatic Detection Algorithm and Multiple Sample Data Spectroscopic Clustering". The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1269567071.

42

Benigni, Matthew Curran. "Detection and Analysis of Online Extremist Communities". Research Showcase @ CMU, 2017. http://repository.cmu.edu/dissertations/949.

Abstract
Online social networks have become a powerful venue for political activism. In many cases, large, insular online communities form that have been shown to be powerful diffusion mechanisms for both misinformation and propaganda. In some cases, the users of these groups advocate actions or policies that could be construed as extreme along nearly any distribution of opinion; such groups are thus called Online Extremist Communities (OECs). Although these communities appear increasingly common, little is known about how they form or the methods used to influence them. The work in this thesis provides researchers with a methodological framework to study these groups by answering three critical research questions: How can we detect large, dynamic online activist or extremist communities? What automated tools are used to build, isolate, and influence these communities? What methods can be used to gain novel insight into large online activist or extremist communities? The social ties of group members can be inferred from the various affordances that online social networks (OSNs) offer for group curation. By developing heterogeneous, annotated graph representations of user behavior, I can efficiently extract online activist discussion cores using an ensemble of unsupervised machine learning methods. I call this technique Ensemble Agreement Clustering. Through manual inspection, these discussion cores can then often be used as training data to detect the larger community. I present a novel supervised learning algorithm called Multiplex Vertex Classification for network bipartition on heterogeneous, annotated graphs. This methodological pipeline has also proven useful for social botnet detection, and a study of large, complex social botnets used for propaganda dissemination is provided as well. Throughout this thesis I provide Twitter case studies, including communities focused on the Islamic State of Iraq and al-Sham (ISIS), the ongoing Syrian Revolution, the Euromaidan Movement in Ukraine, and the alt-Right.
43

Fender, Alexandre. "Solutions parallèles pour les grands problèmes de valeurs propres issus de l'analyse de graphe". Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLV069/document.

Abstract
Graphs, or networks, are mathematical structures that represent relations between elements. These systems can be analyzed to extract information about the overall structure or the nature of individual components. The analysis of networks often leads to problems of high complexity. At large scale, the exact solution is prohibitively expensive to compute. Fortunately, this is an area where iterative approximation methods can be employed to find accurate estimates. Historical methods suitable for a small number of variables do not scale to the large, sparse matrices arising in graph applications. Therefore, the design of reliable, scalable and efficient solvers remains an essential problem. At the same time, the emergence of parallel architectures such as GPUs has brought remarkable improvements in both performance and power efficiency. In this dissertation, we focus on solving large eigenvalue problems arising in network analytics, with the goal of efficiently utilizing parallel architectures. We revisit the theory of spectral graph analysis and propose novel parallel algorithms and implementations. Experimental results indicate improvements on large, real-world applications in the context of ranking and clustering problems.
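Popularity ranking is a classic example of the large, sparse eigenvalue problems described here. PageRank scores are the dominant eigenvector of a damped transition matrix, and the simplest iterative solver for them is the power method. The sequential, pure-Python sketch below stores the graph as adjacency lists, so each iteration costs O(nodes + edges). It is a textbook method shown for illustration only, not one of the parallel GPU solvers developed in the thesis. The damping factor of 0.85 and the tolerance are conventional assumptions.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank.

    `out_links` maps each node 0..n-1 to a list of its successors. Nodes with
    no out-links (dangling nodes) spread their score uniformly over all nodes.
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Teleportation term: every node receives (1 - damping) / n.
        new = [(1.0 - damping) / n] * n
        dangling = 0.0
        for u in range(n):
            succ = out_links[u]
            if succ:
                # One sparse matrix-vector product, edge by edge.
                share = damping * rank[u] / len(succ)
                for v in succ:
                    new[v] += share
            else:
                dangling += rank[u]
        # Redistribute the mass held by dangling nodes uniformly.
        for v in range(n):
            new[v] += damping * dangling / n
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank
```

The scores always sum to 1, because each iteration preserves total mass. On large graphs, the inner loop over edges is the part that GPU implementations parallelize as a sparse matrix-vector product.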
44

Mandrell, Christopher. "IMPROVING SPECTRAL ANALYSIS WITH THE APPLICATION OF MACHINE LEARNING: STUDY OF LASER-INDUCED BREAKDOWN SPECTROSCOPY (LIBS) AND RAMAN SPECTROSCOPY WITH CLASSIFICATION AND CLUSTERING TECHNIQUES". OpenSIUC, 2020. https://opensiuc.lib.siu.edu/theses/2665.

Abstract
AN ABSTRACT OF THE THESIS OF Christopher T. Mandrell, for the Master of Science degree in Physics, presented on April 8, 2020, at Southern Illinois University Carbondale.
TITLE: IMPROVING SPECTRAL ANALYSIS WITH THE APPLICATION OF MACHINE LEARNING: STUDY OF LASER-INDUCED BREAKDOWN SPECTROSCOPY (LIBS) AND RAMAN SPECTROSCOPY WITH CLASSIFICATION AND CLUSTERING TECHNIQUES
MAJOR PROFESSOR: Dr. Poopalasingam Sivakumar
Atomic and molecular spectroscopy, in the form of LIBS emissions and Raman scattering, respectively, are tools that provide a vast amount of information with little to no sample preparation. For this reason, these techniques are finding their way into a wide range of fields. However, each spectrum is notoriously complicated to analyze, with many complex interactions at play. Machine learning is the result of work on artificial intelligence. It provides tools to train a computer to look for connections in complex data sets that would likely be missed, or not even looked for, by other analytical methods. The combination of highly informative yet complex data with an analysis that is specifically designed to probe highly complex data for meaningful information is a logical step in the analysis of these spectra. Here we apply statistical analysis and classification algorithms to Raman spectra of pancreatic cancer cells and clustering algorithms to LIBS spectra of Mars Curiosity Rover simulants and Raman spectra of Mars Perseverance Rover simulants. We report here high accuracy in the classification of different types of pancreatic cancer cells, and informative clustering of the two Mars rovers’ simulant data.
45

Mouysset, Sandrine. "Contributions à l'étude de la classification spectrale et applications". PhD thesis, Institut National Polytechnique de Toulouse - INPT, 2010. http://tel.archives-ouvertes.fr/tel-00573433.

Abstract
Spectral clustering consists of building, from the spectral elements of a Gaussian affinity matrix, a reduced-dimension space in which the data are grouped into clusters. This unsupervised method is mainly based on the Gaussian affinity measure, its parameter and its spectral elements. However, questions about the separability of the clusters in the spectral projection space and about the choice of the parameter remain open. First, the role of the Gaussian affinity parameter is studied through quality measures, and two heuristics for choosing this parameter are proposed and tested. Next, the inner workings of the method are studied through the spectral elements of the Gaussian affinity matrix. By interpreting this matrix as the discretization of the heat kernel defined on the whole space and by using finite elements, the eigenvectors of the affinity matrix are shown to be the asymptotic representation of functions whose support is contained in a single connected component. These results make it possible to define clustering properties and conditions on the Gaussian parameter. Building on these theoretical elements, two parallelization strategies based on subdomain decomposition are formulated and tested on geometric and image-processing examples. Finally, in the unsupervised setting, spectral clustering is applied on the one hand in genomics, to determine different gene expression profiles of a legume, and on the other hand in functional PET imaging, to segment brain regions exhibiting the same time-activity curves.
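The pipeline described here can be sketched in a few lines of NumPy. It builds a Gaussian affinity matrix, takes its leading eigenvectors, then groups points in the reduced space. The sketch follows the widely used Ng-Jordan-Weiss normalization. The exact normalization, the farthest-first k-means step and the value of `sigma` are assumptions for illustration, not the thesis's own implementation. `sigma` plays the role of the Gaussian affinity parameter whose choice the abstract describes as an open question.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with a zero diagonal."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A


def spectral_clustering(X, k, sigma, iters=100):
    A = gaussian_affinity(X, sigma)
    d = A.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("sigma too small: some points have zero affinity to all others")
    # Normalized affinity L = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, -k:]
    # Project each row onto the unit sphere (reduced-dimension space).
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # k-means on the rows of Y, with deterministic farthest-first initialization.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

When clusters are well separated relative to `sigma`, the affinity matrix is nearly block-diagonal. The rows of `Y` then collapse onto roughly orthogonal points, one per cluster, which is why a simple k-means step suffices. If `sigma` is too small, points lose all affinity to each other; if it is too large, the blocks merge. This is the trade-off that parameter heuristics try to address.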
46

Westerlund, Annie M. "Computational Study of Calmodulin’s Ca2+-dependent Conformational Ensembles". Licentiate thesis, KTH, Biofysik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-234888.

Abstract
Ca2+ and calmodulin play important roles in many physiologically crucial pathways. The conformational landscape of calmodulin is intriguing. Conformational changes allow calmodulin to bind target-proteins, while binding Ca2+ yields population shifts within the landscape. Thus, target-proteins become Ca2+-sensitive upon calmodulin binding. Calmodulin regulates more than 300 target-proteins, and mutations are linked to lethal disorders. The mechanisms underlying Ca2+ and target-protein binding are complex and pose interesting questions. Such questions are typically addressed with experiments, which fail to provide simultaneous molecular and dynamic insights. In this thesis, questions on binding mechanisms are probed with molecular dynamics simulations together with tailored unsupervised learning and data analysis. In Paper 1, a free energy landscape estimator based on Gaussian mixture models with cross-validation was developed and used to evaluate the efficiency of regular molecular dynamics compared to temperature-enhanced molecular dynamics. This comparison revealed interesting properties of the free energy landscapes, highlighting different behaviors of the Ca2+-bound and unbound calmodulin conformational ensembles. In Paper 2, spectral clustering was used to shed light on Ca2+ and target-protein binding. With these tools, it was possible to characterize differences in target-protein binding depending on Ca2+-state as well as on N-terminal or C-terminal lobe binding. This work invites data-driven analysis into the field of biomolecular molecular dynamics, provides further insight into calmodulin’s Ca2+ and target-protein binding, and serves as a stepping-stone towards a complete understanding of calmodulin’s Ca2+-dependent conformational ensembles.
47

Nardoni, Chiara. "Diffusion maps in the Subriemannian motion group and perceptual grouping". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/6971/.

Abstract
This work is motivated by the problem of how perceptual units are formed at the level of the primary visual cortex V1. The Citti-Sarti geometric model is studied in detail, with particular attention to the modeling of visual association phenomena. A connectivity model is studied in detail. The original contribution lies in adapting the diffusion maps method, recently introduced by Coifman and Lafon, to the sub-Riemannian geometry of the visual cortex. Tools from potential theory, spectral theory and harmonic analysis on Lie groups are used to approximate the eigenfunctions of the heat operator on the group of rigid motions of the plane. The eigenfunctions are used to extract perceptual units from the visual stimulus. Original experimental evidence of the method's performance is presented.
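For readers unfamiliar with diffusion maps, the Euclidean version of the Coifman-Lafon construction can be sketched as follows. A Gaussian kernel is row-normalized into a Markov matrix P = D^{-1} K. Its leading non-trivial right eigenvectors, scaled by powers of their eigenvalues, give the embedding. This sketch uses plain Euclidean distances. The thesis instead adapts the method to the sub-Riemannian geometry of the rigid-motion group, which is not reproduced here. The bandwidth `eps` and diffusion time `t` are illustrative parameters.

```python
import numpy as np


def diffusion_map(X, eps, n_components=2, t=1):
    """Return (eigenvalues, right eigenvectors of P, diffusion embedding)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps)                      # Gaussian kernel (keeps self-loops)
    d = K.sum(axis=1)
    # S = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues as
    # the Markov matrix P = D^{-1} K, so we can use the stable solver eigh.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]             # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    # If S v = lam v, then psi = D^{-1/2} v satisfies P psi = lam psi.
    psi = d_inv_sqrt[:, None] * vecs
    # Skip the trivial constant eigenvector (eigenvalue 1) and scale by lam^t.
    embedding = psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
    return vals, psi, embedding
```

Increasing `t` damps the coordinates with smaller eigenvalues, so the embedding keeps only the coarser structure of the data. This multiscale behavior makes the eigenfunctions useful for grouping.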
48

Witt, Walter G. "Quantifying the Structure of Misfolded Proteins Using Graph Theory". Digital Commons @ East Tennessee State University, 2017. https://dc.etsu.edu/etd/3244.

Abstract
The structure of a protein molecule is highly correlated with its function. Some diseases, such as cystic fibrosis, result from a change in the structure of a protein that interferes with or inhibits its function. Often these structural changes are caused by misfolding of the protein molecule. To assist computational biologists, there is a database of proteins together with their misfolded versions, called decoys, that can be used to test the accuracy of protein structure prediction algorithms. In our work we use a nested graph model to quantify a selected set of proteins that each have two single-misfold decoys. The graph-theoretic model used is a three-tiered nested graph. Measures based on the vertex weights are calculated, and we compare the quantification of the proteins with that of their decoys. Our method is able to separate the misfolded proteins from the correctly folded proteins.
49

Saade, Alaa. "Spectral inference methods on sparse graphs : theory and applications". Thesis, Paris Sciences et Lettres (ComUE), 2016. http://www.theses.fr/2016PSLEE024/document.

Abstract
In an era of unprecedented deluge of (mostly unstructured) data, graphs are proving more and more useful across the sciences as a flexible abstraction to capture complex relationships between complex objects. One of the main challenges arising in the study of such networks is the inference of macroscopic, large-scale properties affecting a large number of objects, based solely on the microscopic interactions between their elementary constituents. Statistical physics, created precisely to recover the macroscopic laws of thermodynamics from an idealized model of interacting particles, provides significant insight for tackling such complex networks. In this dissertation, we use methods derived from the statistical physics of disordered systems to design and study new algorithms for inference on graphs. Our focus is on spectral methods, based on certain eigenvectors of carefully chosen matrices, and on sparse graphs, which contain only a small amount of information. We develop an original theory of spectral inference based on a relaxation of various mean-field free energy optimizations. Our approach is therefore fully probabilistic, and contrasts with more traditional motivations based on the optimization of a cost function. We illustrate the efficiency of our approach on various problems, including community detection, randomized similarity-based clustering, and matrix completion.
50

Voiron, Nicolas. "Structuration de bases multimédia pour une exploration visuelle". Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAA036/document.

Abstract
The large increase in multimedia data volume requires the development of effective solutions for the visual exploration of multimedia databases. After reviewing the visualization processes involved, we emphasize the need for data structuring. The main objective of this thesis is to propose and study clustering and classification methods for multimedia databases with a view to their visual exploration. We begin with a state of the art detailing the data and the metrics we can produce according to the nature of the variables describing each document. This is followed by a review of projection and classification techniques. We also present the Spectral Clustering method in detail. Our first contribution is an original method that produces a fusion of metrics using rank correlations. We validate this method on an animation movie database from an international festival. Then we propose a supervised classification method based on rank correlation. This contribution is evaluated on a multimedia challenge dataset. Then we focus on Spectral Clustering methods. We test a supervised Spectral Clustering technique and compare it to state-of-the-art methods. Finally, we examine active semi-supervised Spectral Clustering methods. In this context, we propose and validate constraint propagation techniques and strategies to improve the convergence of these active methods.