Dissertations / Theses on the topic 'Cluster analysis – Data processing'

Consult the top 50 dissertations / theses for your research on the topic 'Cluster analysis – Data processing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Zhang, Yiqun. "Advances in categorical data clustering." HKBU Institutional Repository, 2019. https://repository.hkbu.edu.hk/etd_oa/658.

Full text
Abstract:
Categorical data are common in various research areas, and clustering is a prevalent technique used to analyse them. However, two challenging problems are encountered in categorical data clustering analysis. The first is that most categorical data distance metrics were actually proposed for nominal data (i.e., a categorical data set that comprises only nominal attributes), ignoring the fact that ordinal attributes are also common in various categorical data sets. As a result, these nominal data distance metrics cannot account for the order information of ordinal attributes and may thus inappropriately measure the distances for ordinal data (i.e., a categorical data set that comprises only ordinal attributes) and mixed categorical data (i.e., a categorical data set that comprises both ordinal and nominal attributes). The second problem is that most hierarchical clustering approaches were actually designed for numerical data and have very high computation costs; that is, time complexity O(N^2) for a data set with N data objects. These issues have presented huge obstacles to the clustering analysis of categorical data. To address the ordinal data distance measurement problem, we study the characteristics of the ordered possible values (also called 'categories' interchangeably in this thesis) of ordinal attributes and propose a novel ordinal data distance metric, which we call the Entropy-Based Distance Metric (EBDM), to quantify the distances between ordinal categories. The EBDM adopts cumulative entropy as a measure of the amount of information in the ordinal categories and simulates the thinking process of changing one's mind between two ordered choices to quantify the distances according to the amount of information in the ordinal categories. The EBDM considers both the order relationship and the statistical information of the ordinal categories for more appropriate distance measurement. Experimental results illustrate the superiority of the proposed EBDM in ordinal data clustering. In addition to designing an ordinal data distance metric, we further propose a unified categorical data distance metric that is suitable for distance measurement of all three types of categorical data (i.e., ordinal data, nominal data, and mixed categorical data). The extended version uniformly defines distances and attribute weights for both ordinal and nominal attributes, by which the distances measured for the two types of attributes of mixed categorical data can be directly combined to obtain the overall distances between data objects with no information loss. Extensive experiments on all three types of categorical data sets demonstrate the effectiveness of the unified distance metric in clustering analysis of categorical data. To address the hierarchical clustering problem of large-scale categorical data, we propose a fast hierarchical clustering framework called Growing Multi-layer Topology Training (GMTT). The most significant merit of this framework is its ability to reduce the time complexity of most existing hierarchical clustering frameworks (i.e., O(N^2)) to O(N^1.5) without sacrificing the quality (i.e., clustering accuracy and hierarchical details) of the constructed hierarchy. According to our design, the GMTT framework is applicable to categorical data clustering simply by adopting a categorical data distance metric.
To make the GMTT framework suitable for processing streaming categorical data, we also provide an incremental version of GMTT that can dynamically incorporate new inputs into the hierarchy via local updating. Theoretical analysis proves that the GMTT frameworks have time complexity O(N^1.5). Extensive experiments show the efficacy of the GMTT frameworks and demonstrate that they achieve more competitive categorical data clustering performance by adopting the proposed unified distance metric.
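The abstract gives enough detail to sketch the flavour of such a metric. The Python fragment below is an illustrative reading of an entropy-based ordinal distance, not the thesis's exact EBDM: it scores the distance between two ordered categories by the information accumulated over the categories a decision-maker would pass between them, estimated from empirical frequencies.

```python
import numpy as np

def ordinal_distance(values, cat_order):
    """Entropy-flavoured distance matrix for one ordinal attribute.

    values    : observed categories, e.g. ['low', 'mid', 'high', ...]
    cat_order : the categories in their natural order.

    A sketch of the general idea behind entropy-based ordinal metrics,
    not the EBDM as defined in the thesis.
    """
    n = len(values)
    probs = np.array([values.count(c) / n for c in cat_order])
    # Information carried by each category; rarer categories carry more.
    info = -probs * np.log2(np.clip(probs, 1e-12, None))
    k = len(cat_order)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # Accumulate the information of every category one would pass
            # when changing one's mind from choice i to choice j.
            dist[i, j] = dist[j, i] = info[i:j + 1].sum()
    return dist

d = ordinal_distance(['low', 'low', 'mid', 'high', 'mid', 'low'],
                     ['low', 'mid', 'high'])
print(d)  # distances respect the order: d(low, high) >= d(low, mid)
```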
2

Jia, Hong. "Clustering of categorical and numerical data without knowing cluster number." HKBU Institutional Repository, 2013. http://repository.hkbu.edu.hk/etd_ra/1495.

Full text
3

Yang, Bin, and 杨彬. "A novel framework for binning environmental genomic fragments." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2010. http://hub.hku.hk/bib/B45789344.

Full text
4

Li, Junjie. "Some algorithmic studies in high-dimensional categorical data clustering and selection number of clusters." HKBU Institutional Repository, 2008. http://repository.hkbu.edu.hk/etd_ra/1011.

Full text
5

Lee, King-for Foris, and 李敬科. "Clustering uncertain data using Voronoi diagram." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B43224131.

Full text
6

Ptitsyn, Andrey. "New algorithms for EST clustering." Thesis, University of the Western Cape, 2000. http://etd.uwc.ac.za/index.php?module=etd.

Full text
Abstract:
The expressed sequence tag (EST) database is a rich and fast-growing source of data for gene expression analysis and drug discovery. Clustering of raw EST data is a necessary step for further analysis and one of the most challenging problems of modern computational biology.
7

Van, Der Linde Byron-Mahieu. "A comparative analysis of the singer’s formant cluster." Thesis, Stellenbosch : Stellenbosch University, 2013. http://hdl.handle.net/10019.1/85563.

Full text
Abstract:
Thesis (MMus)-- Stellenbosch University, 2013.
ENGLISH ABSTRACT: It is widely accepted that the singer’s formant cluster (Fs) – its perceptual correlates being twang and ring, and pedagogically referred to as head resonance – is the defining trait of a classically trained voice. Research has shown that the spectral energy a singer harnesses in the Fs region can be measured quantitatively using the spectral indicators Short-Term Energy Ratio (STER) and Singing Power Ratio (SPR). STER is a modified version of the standard measurement tool Energy Ratio (ER) that removes the dependency on the Long-Term Average Spectrum (LTAS). Previous studies have shown that professional singers produce more Fs spectral energy when singing in ensemble mode than in solo mode; however, for amateur singers, the opposite trend was observed. Little empirical evidence in this regard is available concerning undergraduate vocal performance majors. This study was aimed at investigating the resonance tendencies of individuals from the latter target group, as evidenced when singing in two performance modes: ensemble and solo. Eight voice students (two per SATB voice part) were selected to participate. Subjects were recorded singing their parts individually, as well as in full ensemble. By mixing the solo recordings together, comparisons of the spectral content could be drawn between the solo and ensemble performance modes. Samples (n=4) were extracted from each piece for spectral analyses. STER and SPR means were highly proportional for both pieces. Results indicate that the singers produce significantly higher levels of spectral energy in the Fs region in ensemble mode than in solo mode for one piece (p<0.05), whereas findings for the other piece were not statistically significant. The findings of this study could inform the pedagogical approach to voice training and provide an empirical basis for discussions about voice students’ participation in ensemble ventures.
8

Ramirez, Jon. "Analysis of compute cluster nodes with varying memory hierarchy distributions." To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2009. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.

Full text
9

Cole, Rowena Marie. "Clustering with genetic algorithms." University of Western Australia. Dept. of Computer Science, 1998. http://theses.library.uwa.edu.au/adt-WU2003.0008.

Full text
Abstract:
Clustering is the search for those partitions that reflect the structure of an object set. Traditional clustering algorithms search only a small subset of all possible clusterings (the solution space) and consequently, there is no guarantee that the solution found will be optimal. We report here on the application of Genetic Algorithms (GAs), stochastic search algorithms touted as effective search methods for large and complex spaces, to the problem of clustering. GAs which have been made applicable to the problem of clustering (by adapting the representation and the fitness function, and by developing suitable evolutionary operators) are known as Genetic Clustering Algorithms (GCAs). There are two parts to our investigation of GCAs: first we look at clustering into a given number of clusters. The performance of GCAs on three generated data sets, analysed using 4320 differing combinations of adaptations, establishes their efficacy. The choice of adaptations and parameter settings is data set dependent, but comparison between results using generated and real data sets indicates that performance is consistent for similar data sets with the same number of objects, clusters, and attributes, and a similar distribution of objects. Generally, group-number representations are better suited to the clustering problem, as are dynamic scaling, elite selection and high mutation rates. Independent generalised models fitted to the correctness and timing results for each of the generated data sets produced accurate predictions of the performance of GCAs on similar real data sets. While GCAs can be successfully adapted to clustering, and the method produces results as accurate and correct as traditional methods, our findings indicate that, given a criterion based on simple distance metrics, GCAs provide no advantages over traditional methods. Second, we investigate the potential of genetic algorithms for the more general clustering problem, where the number of clusters is unknown. We show that only simple modifications to the adapted GCAs are needed. We have developed a merging operator, which, with elite selection, is employed to evolve an initial population with a large number of clusters toward better clusterings. With regard to accuracy and correctness, these GCAs are more successful than optimisation methods such as simulated annealing. However, such GCAs can become trapped in local minima in the same manner as traditional hierarchical methods. Such trapping is characterised by the situation where good (k-1)-clusterings do not result from our merge operator acting on good k-clusterings. A marked improvement in the algorithm is observed with the addition of a local heuristic.
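To make the ingredients concrete, here is a minimal genetic clustering sketch using the group-number representation, elite selection, and a high mutation rate the abstract singles out; the fitness function and parameter values are illustrative choices, and crossover is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(labels, X, k):
    """Negative within-cluster sum of squares (higher is better)."""
    sse = 0.0
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            sse += ((pts - pts.mean(axis=0)) ** 2).sum()
    return -sse

def genetic_clustering(X, k, pop_size=30, gens=100, mut_rate=0.1):
    n = len(X)
    # Group-number representation: one cluster label per object.
    pop = rng.integers(0, k, size=(pop_size, n))
    for _ in range(gens):
        scores = np.array([fitness(ind, X, k) for ind in pop])
        order = np.argsort(scores)[::-1]
        elite = pop[order[: pop_size // 2]]          # elite selection
        children = elite[rng.integers(0, len(elite), pop_size - len(elite))]
        # High mutation rate: randomly reassign some objects.
        mask = rng.random(children.shape) < mut_rate
        children[mask] = rng.integers(0, k, size=mask.sum())
        pop = np.vstack([elite, children])
    return max(pop, key=lambda ind: fitness(ind, X, k))

X = rng.normal(size=(60, 2))
X[:30] += 4                      # two separated blobs
print(genetic_clustering(X, k=2)[:10])
```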
10

Cui, Yingjie, and 崔英杰. "A study on privacy-preserving clustering." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B4357225X.

Full text
11

Ndebele, Nothando Elizabeth. "Clustering algorithms and their effect on edge preservation in image compression." Thesis, Rhodes University, 2009. http://hdl.handle.net/10962/d1008210.

Full text
Abstract:
Image compression aims to reduce the amount of data that is stored or transmitted for images. One technique that may be used to this end is vector quantization. Vectors may be used to represent images. Vector quantization reduces the number of vectors required for an image by representing a cluster of similar vectors by one typical vector that is part of a set of vectors referred to as the codebook. For compression, for each image vector, only the closest codebook vector is stored or transmitted. For reconstruction, the image vectors are again replaced by the closest codebook vectors. Hence vector quantization is a lossy compression technique, and the quality of the reconstructed image depends strongly on the quality of the codebook. The design of the codebook is therefore an important part of the process. In this thesis we examine three clustering algorithms which can be used for codebook design in image compression: c-means (CM), fuzzy c-means (FCM) and learning vector quantization (LVQ). We give a description of these algorithms and their application to codebook design. Edges are an important part of the visual information contained in an image. It is essential therefore to use codebooks which allow an accurate representation of the edges. One of the shortcomings of using vector quantization is poor edge representation. We therefore carry out experiments using these algorithms to compare their edge-preserving qualities. We also investigate the combination of these algorithms with classified vector quantization (CVQ) and the replication method (RM). Both these methods have been suggested as methods for improving edge representation. We use a cross-validation approach to estimate the mean squared error to measure the performance of each of the algorithms and the edge-preserving methods. The results reflect that the edges are less accurately represented than the non-edge areas when using CM, FCM and LVQ. The advantage of using CVQ is that the time taken for codebook design is reduced, particularly for CM and FCM. RM is found to be effective where the codebook is trained using a set that has larger proportions of edges than the test set.
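Codebook design with c-means is the most standard of the three algorithms; a compact sketch using scikit-learn's k-means (block size and codebook size are arbitrary example values) looks like this:

```python
import numpy as np
from sklearn.cluster import KMeans

def design_codebook(image, block=4, codebook_size=64, seed=0):
    """Train a VQ codebook on non-overlapping image blocks (a c-means sketch)."""
    h, w = image.shape
    h, w = h - h % block, w - w % block
    vecs = (image[:h, :w]
            .reshape(h // block, block, w // block, block)
            .swapaxes(1, 2)
            .reshape(-1, block * block))
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=seed).fit(vecs)
    return km.cluster_centers_, km.predict(vecs)  # codebook + indices to transmit

img = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(float)
codebook, idx = design_codebook(img)
reconstructed = codebook[idx]   # decoder replaces each index by its codevector
print(codebook.shape, idx.shape)  # (64, 16) (256,)
```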
12

Ordoñez, Carlos. "Mining complex databases using the EM algorithm." Diss., Georgia Institute of Technology, 2000. http://hdl.handle.net/1853/8232.

Full text
13

Saranyan, N. "Prediction based load balancing heuristic for a heterogeneous cluster." Thesis, Indian Institute of Science, 2003. http://hdl.handle.net/2005/95.

Full text
Abstract:
Load balancing has been a topic of interest in both academia and industry, mainly because of the scope for performance enhancement that is available to be exploited in many parallel and distributed processing environments. Among the many approaches that have been used to solve the load balancing problem, we find that only very few use prediction of code execution times. Our reasoning for this is that the field of code prediction is in its infancy. As of this writing, we are not aware of any prediction-based load balancing approach that uses prediction of code-execution times and uses neither information provided by the user nor an off-line step that does the prediction, the results of which are then used at run-time. In this context, it is important to note that prior studies have indicated the feasibility of predicting the CPU requirements of general application programs. Our motivation in using prediction-based load balancing is to determine the feasibility of the approach. The reasoning behind that is the following: if prediction-based load balancing does yield good performance, then it may be worthwhile to develop a predictor that can give a rough estimate of the length of the next CPU burst of each process. While high accuracy of the predictor is not essential, the computation overhead of the predictor must be sufficiently small, so as not to offset the gain of load balancing. As for the system, we assume a set of autonomous computers that are connected by a fast, shared medium. The individual nodes can vary in the additional hardware and software that may be available in them. Further, we assume that the processes in the workload are sequential. The first step is to fix the parameters for our assumed predictor. Then, an algorithm that takes into account the characteristics of the predictor is proposed. There are many trade-off decisions in the design of the algorithm, including certain steps in which we have relied on a trial-and-error method to find suitable values. The next logical step is to verify the efficiency of the algorithm. To assess its performance, we carry out event-driven simulation. We also evaluate the robustness of the algorithm with respect to the characteristics of the predictor. The contribution of the thesis is as follows: it proposes a load-balancing algorithm for a heterogeneous cluster of workstations connected by a fast network. The simulation assumes that the heterogeneity is limited to variability in processor clock rates, but the algorithm can be applied when the nodes have other types of heterogeneity as well. The algorithm uses prediction of CPU burst lengths as its basic input unit. The performance of the algorithm is evaluated through event-driven simulation using assumed workload distributions. The results of the simulation show that the algorithm yields a good improvement in response times over the scenario in which no load redistribution is done.
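As a rough illustration of the idea (placing each process using a predicted CPU burst and the nodes' relative clock rates), a greedy earliest-finish heuristic can stand in for the thesis's more involved algorithm:

```python
import heapq

def assign(processes, nodes):
    """Greedy prediction-based placement sketch.

    processes : list of (pid, predicted_cpu_burst_seconds)
    nodes     : dict node_name -> relative clock rate

    Each process goes to the node that would finish it earliest given the
    work already queued there; a stand-in for the thesis's algorithm,
    whose actual decision rules are more involved.
    """
    heap = [(0.0, name) for name in nodes]   # (queued work in seconds, node)
    heapq.heapify(heap)
    placement = {}
    for pid, burst in sorted(processes, key=lambda p: -p[1]):
        finish, name = heapq.heappop(heap)
        placement[pid] = name
        heapq.heappush(heap, (finish + burst / nodes[name], name))
    return placement

nodes = {"fast": 2.0, "slow": 1.0}
procs = list(enumerate([5.0, 3.0, 8.0, 1.0, 2.0]))  # hypothetical predictions
print(assign(procs, nodes))
```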
14

Yip, Yuk-Lap Kevin, and 葉旭立. "HARP: a practical projected clustering algorithm for mining gene expression data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B29634568.

Full text
15

Luque, N. E. "Cluster dynamics in the Basque region of Spain." Thesis, Coventry University, 2011. http://curve.coventry.ac.uk/open/items/4f4161ca-11db-4d70-9954-aea64f4fbaa4/1.

Full text
Abstract:
Developing and retaining competitive advantage was a major concern for all companies; it fundamentally relied on being aware of the external environment and of customer satisfaction. Changes in environmental conditions and unexpected economic events could cause a loss of organisational adjustment and a subsequent loss in competitiveness; only those organisations able to adjust rapidly to these dynamics would remain. In some instances, companies decided to co-locate geographically, seeking economies of scale and benefiting from complementarities. The literature review revealed the strong support that clusters had from government and local authorities, but it also highlighted the limited practical research in the field. The aim of this research was to measure the dynamism of the cluster formed by the geographical concentration of diverse manufacturers within the Mondragon Cooperative Group in the Basque region of Spain, and to compare it to the individual dynamism of these organisations, in order to better understand the actual complementarities and synergies of this industrial co-location. The literature review identified dynamic capabilities as the core enablers of organisations competing in dynamic environments; based on these capabilities, a model was formulated. This model, combined with the primary data collected via questionnaire and interviews, helped measure the dynamism of the individual cluster members and of the cluster as a whole, and provided insight into the complementarities and synergies of this type of alliance. The findings of the research concluded that the cluster as a whole was more dynamic than its individual members; nevertheless, the model suggested that there were considerable differences in speed among the cluster members. These differences in speed were determined by the size of the company and its performance in dimensions such as marketing, culture and management. The research also suggested that, despite the clear differences in the level of dynamism among cluster members, all companies benefited in some way from being part of the cluster; these benefits differed in nature depending on each specific member.
16

Dannenberg, Matthew. "Pattern Recognition in High-Dimensional Data." Scholarship @ Claremont, 2016. https://scholarship.claremont.edu/hmc_theses/76.

Full text
Abstract:
Vast amounts of data are produced all the time. Yet this data does not easily equate to useful information: extracting information from large amounts of high dimensional data is nontrivial. People are simply drowning in data. A recent and growing source of high-dimensional data is hyperspectral imaging. Hyperspectral images allow for massive amounts of spectral information to be contained in a single image. In this thesis, a robust supervised machine learning algorithm is developed to efficiently perform binary object classification on hyperspectral image data by making use of the geometry of Grassmann manifolds. This algorithm can consistently distinguish between a large range of even very similar materials, returning very accurate classification results with very little training data. When distinguishing between dissimilar locations like crop fields and forests, this algorithm consistently classifies more than 95 percent of points correctly. On more similar materials, more than 80 percent of points are classified correctly. This algorithm will allow for very accurate information to be extracted from these large and complicated hyperspectral images.
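The geometric core, measuring how far apart the subspaces spanned by two sets of spectra are, can be sketched with principal angles; the class construction and data below are synthetic stand-ins, not the thesis's pipeline.

```python
import numpy as np
from scipy.linalg import subspace_angles

def subspace_basis(samples, dim=3):
    """Orthonormal basis spanning a class's spectra (columns = spectra)."""
    u, _, _ = np.linalg.svd(samples, full_matrices=False)
    return u[:, :dim]

def grassmann_distance(A, B):
    """Geodesic-style distance on the Grassmannian: norm of principal angles."""
    return np.linalg.norm(subspace_angles(A, B))

rng = np.random.default_rng(0)
bands = 50                                   # spectral bands per pixel

def synth(active, n=20):
    """Synthetic spectra: random mixtures of a few 'material' components."""
    comps = np.zeros((bands, len(active)))
    for k, i in enumerate(active):
        comps[i, k] = 1.0
    return comps @ rng.random((len(active), n)) + 0.05 * rng.normal(size=(bands, n))

crop = subspace_basis(synth([0, 1, 2]))
forest = subspace_basis(synth([10, 11, 12]))
test = subspace_basis(synth([0, 1, 2]))      # unlabelled patch to classify
print(min([("crop", crop), ("forest", forest)],
          key=lambda c: grassmann_distance(test, c[1]))[0])   # -> crop
```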
17

Cheng, Lu. "Concentric layout, a new scientific data layout for matrix data set in Hadoop file system." Master's thesis, University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4545.

Full text
Abstract:
The data generated by scientific simulations, sensors, monitors, and optical telescopes has increased at a dramatic speed. In order to analyze the raw data efficiently in both time and space, a data pre-processing step is needed to achieve better performance in the data analysis phase. Current research shows an increasing trend of adopting the MapReduce framework for large-scale data processing. However, the data access patterns generally applied to scientific data sets are not directly supported by the current MapReduce framework. The gap between the requirements of analytics applications and the properties of the MapReduce framework motivates us to provide support for these data access patterns within MapReduce. In our work, we studied the data access patterns in matrix files and propose a new concentric data layout to facilitate matrix data access and analysis in the MapReduce framework. The concentric data layout maintains the dimensional property at the chunk level. Contrary to the continuous data layout adopted in the current Hadoop framework by default, the concentric data layout stores the data from the same sub-matrix in one chunk, which matches well with common matrix operations. The concentric data layout preprocesses the data beforehand and optimizes the subsequent runs of MapReduce applications. The experiments indicate that the concentric data layout improves overall performance, reducing execution time by 38% when the file size is 16 GB; it also relieves the data-overhead phenomenon and increases the effective data retrieval rate by 32% on average.
ID: 029051151; System requirements: World Wide Web browser and PDF reader; Mode of access: World Wide Web; Thesis (M.S.)--University of Central Florida, 2010; Includes bibliographical references (p. 56-58).
M.S.
Masters
Department of Electrical Engineering and Computer Science
Engineering
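The key idea, keeping each sub-matrix contiguous on disk instead of scattering it row by row, can be shown in a few lines; chunk sizes and the byte-level format here are illustrative only.

```python
import numpy as np

def to_concentric_chunks(matrix, chunk_rows, chunk_cols):
    """Serialize a matrix so each sub-matrix block is one contiguous chunk.

    A sketch of the layout idea: the default row-major layout scatters a
    sub-matrix across the file, while this keeps each block together, so a
    MapReduce task reading one chunk gets one whole sub-matrix.
    """
    r, c = matrix.shape
    chunks = {}
    for i in range(0, r, chunk_rows):
        for j in range(0, c, chunk_cols):
            block = matrix[i:i + chunk_rows, j:j + chunk_cols]
            chunks[(i, j)] = block.tobytes()   # contiguous bytes per block
    return chunks

m = np.arange(16, dtype=np.int32).reshape(4, 4)
chunks = to_concentric_chunks(m, 2, 2)
top_left = np.frombuffer(chunks[(0, 0)], dtype=np.int32).reshape(2, 2)
print(top_left)   # [[0 1] [4 5]] -- the whole sub-matrix from one chunk
```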
18

Yee, Adam J. "Sharing the love : a generic socket API for Hadoop Mapreduce." Scholarly Commons, 2011. https://scholarlycommons.pacific.edu/uop_etds/772.

Full text
Abstract:
Hadoop is a popular software framework written in Java that performs data-intensive distributed computations on a cluster. It includes Hadoop MapReduce and the Hadoop Distributed File System (HDFS). HDFS has known scalability limitations due to its single NameNode, which holds the entire file system namespace in RAM on one computer. Therefore, the NameNode can only store a limited number of file names, depending on the RAM capacity. The solution to furthering scalability is distributing the namespace, similar to how file data is divided into chunks and stored across cluster nodes. Hadoop has an abstract file system API which is extended to integrate HDFS, but which has also been extended to integrate the file systems S3, CloudStore, Ceph and PVFS. The file systems Ceph and PVFS already distribute the namespace, while others, such as Lustre, are making the conversion. Google announced in 2009 that it had been implementing a distributed namespace for the Google File System to achieve greater scalability. The Generic Hadoop API is created from Hadoop's abstract file system API. It speaks a simple communication protocol that can integrate any file system which supports TCP sockets. By providing a file-system-agnostic API, future work with other file systems might provide ways of surpassing Hadoop's current scalability limitations. Furthermore, the new API eliminates the need to customize Hadoop's Java implementation, and instead moves the implementation to the file system itself. Thus, developers wishing to integrate their new file system with Hadoop are not responsible for understanding the details of Hadoop's internal operation. The API is tested on a homogeneous, four-node cluster with OrangeFS. Initial OrangeFS I/O throughputs are 67% of HDFS' write throughput and 74% of HDFS' read throughput. But, compared with an alternative method of integrating with OrangeFS (a POSIX kernel interface), write and read throughput are increased by 23% and 7%, respectively.
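The abstract's "simple communication protocol" over TCP sockets might look something like the toy client below; the opcode and wire format are invented for illustration and are not the actual API's protocol.

```python
import socket
import struct

def read_chunk(host, port, path, offset, length):
    """Toy client for a hypothetical file-system-agnostic read request.

    Wire format (invented for this sketch): a 'READ' opcode, a
    length-prefixed UTF-8 path, then offset and length as big-endian u64s.
    """
    with socket.create_connection((host, port)) as sock:
        p = path.encode()
        sock.sendall(b"READ" + struct.pack(">I", len(p)) + p
                     + struct.pack(">QQ", offset, length))
        buf = b""
        while len(buf) < length:
            data = sock.recv(length - len(buf))
            if not data:
                break
            buf += data
        return buf

# Hypothetical usage against a file-system server speaking this format:
# data = read_chunk("fs-server.local", 9000, "/user/x/part-00000", 0, 4096)
```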
19

Choudhury, Salimur Rashid, and University of Lethbridge Faculty of Arts and Science. "Approximation algorithms for a graph-cut problem with applications to a clustering problem in bioinformatics." Thesis, Lethbridge, Alta. : University of Lethbridge, Deptartment of Mathematics and Computer Science, 2008, 2008. http://hdl.handle.net/10133/774.

Full text
Abstract:
Clusters in protein interaction networks can potentially help identify functional relationships among proteins. We study the clustering problem by modeling it as graph-cut problems. Given an edge-weighted graph, the goal is to partition the graph into a prescribed number of subsets obeying some capacity constraints, so as to maximize the total weight of the edges that are within a subset. Identification of a dense subset might shed some light on the biological function of all the proteins in the subset. We study integer programming formulations and exhibit large integrality gaps for various formulations. This is indicative of the difficulty of obtaining constant-factor approximation algorithms using the primal-dual schema. We propose three approximation algorithms for the problem. We evaluate the algorithms on the Database of Interacting Proteins and on randomly generated graphs. Our experiments show that the algorithms are fast and have good performance ratios in practice.
xiii, 71 leaves : ill. ; 29 cm.
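A simple member of this family of heuristics is a randomized local search that moves vertices between capacity-bounded subsets whenever the intra-subset weight improves; the sketch below conveys the problem setup, not any of the thesis's three algorithms.

```python
import random

def greedy_partition(weights, nodes, k, capacity, iters=10000, seed=0):
    """Maximize total intra-subset edge weight under a capacity constraint.

    weights : dict {(u, v): w} of undirected edge weights
    nodes   : list of vertices; k subsets, each holding <= capacity vertices
    """
    rng = random.Random(seed)
    part = {v: i % k for i, v in enumerate(nodes)}    # balanced start
    size = [list(part.values()).count(i) for i in range(k)]

    def gain(v, dest):
        """Change in intra-subset weight if v moves to subset dest."""
        delta = 0.0
        for (a, b), w in weights.items():
            if v in (a, b):
                other = b if a == v else a
                if part[other] == dest:
                    delta += w          # edge becomes internal
                if part[other] == part[v]:
                    delta -= w          # edge becomes cut
        return delta

    for _ in range(iters):
        v = rng.choice(nodes)
        dest = rng.randrange(k)
        if dest != part[v] and size[dest] < capacity and gain(v, dest) > 0:
            size[part[v]] -= 1
            size[dest] += 1
            part[v] = dest
    return part

# Two weight-3 triangles joined by a weight-1 edge; the search is expected
# to separate the triangles into the two capacity-3 subsets.
w = {(0, 1): 3, (1, 2): 3, (2, 0): 3, (3, 4): 3, (4, 5): 3, (0, 3): 1}
print(greedy_partition(w, nodes=list(range(6)), k=2, capacity=3))
```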
20

Chen, Chong. "Acceleration of Computer Based Simulation, Image Processing, and Data Analysis Using Computer Clusters with Heterogeneous Accelerators." University of Dayton / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=dayton148036732102682.

Full text
21

Zhou, Ke. "Extending low-rank matrix factorizations for emerging applications." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50230.

Full text
Abstract:
Low-rank matrix factorizations have become increasingly popular for projecting high-dimensional data into latent spaces of small dimension in order to obtain a better understanding of the data and thus more accurate predictions. In particular, they have been widely applied to important applications such as collaborative filtering and social network analysis. In this thesis, I investigate applications and extensions of the ideas of low-rank matrix factorization to solve several practically important problems arising from collaborative filtering and social network analysis. A key challenge in recommendation system research is how to effectively profile new users, a problem generally known as cold-start recommendation. In the first part of this work, we extend the low-rank matrix factorization by allowing the latent factors to have more complex structures, namely decision trees, to solve the problem of cold-start recommendation. In particular, we present functional matrix factorization (fMF), a novel cold-start recommendation method that solves the problem of adaptive interview construction based on low-rank matrix factorizations. The second part of this work considers the efficiency problem of making recommendations in the context of large user and item spaces. Specifically, we address the problem through learning binary codes for collaborative filtering, which can be viewed as restricting the latent factors in low-rank matrix factorizations to be binary vectors that represent the binary codes for both users and items. In the third part of this work, we investigate the applications of low-rank matrix factorizations in the context of social network analysis. Specifically, we propose a convex optimization approach to discover the hidden network of social influence with low-rank and sparse structure by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes, emphasizing the mutual-excitation nature of the dynamics of event occurrences. The proposed framework combines the estimation of the mutually exciting process and the low-rank matrix factorization in a principled manner. In the fourth part of this work, we estimate the triggering kernels for the Hawkes process. In particular, we focus on estimating the triggering kernels from an infinite-dimensional functional space with the Euler-Lagrange equation, which can be viewed as applying the idea of low-rank factorizations in the functional space.
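The base technique all four parts build on, factoring a partially observed ratings matrix into low-rank user and item factors, fits in a few lines; this plain SGD version is the common starting point rather than any of the extensions above.

```python
import numpy as np

def factorize(ratings, rank=8, lr=0.02, reg=0.05, epochs=500, seed=0):
    """Plain low-rank matrix factorization: R ~= U @ V.T, fit by SGD.

    ratings : list of (user, item, value) triples (the observed entries)
    """
    rng = np.random.default_rng(seed)
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    U = 0.1 * rng.normal(size=(n_users, rank))
    V = 0.1 * rng.normal(size=(n_items, rank))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]
            u_row = U[u].copy()                       # use pre-update values
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_row - reg * V[i])
    return U, V

data = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (2, 1, 1), (2, 2, 5)]
U, V = factorize(data)
print(round(float(U[0] @ V[0]), 2))   # should approach the observed 5
```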
22

Luo, Jia. "Computer molecular dynamics simulation study of isomerization and melting of small alkali-halide clusters." Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/27674.

Full text
23

Wong, Cheok Meng. "A distributed particle swarm optimization for fuzzy c-means algorithm based on an apache spark platform." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3950604.

Full text
24

Shen, Jingdi. "Regional Lexical Variation in Modern Written Chinese: Analysis and Characterization Using Geo-Tagged Social Media Data." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1531845935585073.

Full text
25

Kejkula, Martin. "Zpracování asociačních pravidel metodou vícekriteriálního shlukování." Doctoral thesis, Vysoká škola ekonomická v Praze, 2002. http://www.nusl.cz/ntk/nusl-77103.

Full text
Abstract:
Association rules mining is one of several ways of knowledge discovery in databases. Paradoxically, data mining itself can produce such great amounts of association rules that there is a new knowledge management problem: there can easily be thousands or even more association rules holding in a data set. The goal of this work is to design a new method for association rules post-processing. The method should be software and domain independent. The output of the new method should be a structured description of the whole set of discovered association rules that helps the user work with the discovered rules. The path I take to reach this goal is to split the association rules into clusters, where each cluster contains rules that are more similar to each other than to rules from other clusters. The output of the method is such a cluster definition and description. The main contribution of this Ph.D. thesis is the new multicriterial clustering method for association rules described here. A secondary contribution is the discussion of previously published association rules post-processing methods. The output of the new method is clusters of rules that cannot be obtained by any of the former post-processing methods. Measured against user expectations, the clusters are more relevant and more effective than any former association rules clustering results. The method is based on two orthogonal clusterings of the same set of association rules. One clustering is based on interestingness measures (confidence, support, interest, etc.). The second clustering is inspired by document clustering in information retrieval; the representation of rules as vectors, analogous to documents, is central to this thesis. The thesis is organized as follows. Chapter 2 identifies the role of association rules in the KDD (knowledge discovery in databases) process, using KDD methodologies (CRISP-DM, SEMMA, GUHA, RAMSYS). Chapter 3 defines association rules and introduces characteristics of association rules (including interestingness measures). Chapter 4 introduces current association rules post-processing methods. Chapter 5 is an introduction to cluster analysis. Chapter 6 describes the new multicriterial clustering method for association rules. Chapter 7 consists of several experiments. Chapter 8 discusses possibilities for the usage and development of the new method.
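The first of the two orthogonal clusterings, grouping rules by their interestingness measures, is easy to illustrate; the measures, rules, and number of clusters below are made-up examples.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each discovered rule described by (support, confidence, lift) -- one of
# the two orthogonal views used by a multicriterial clustering of rules.
rules = {
    "bread => butter": (0.20, 0.80, 1.6),
    "beer => chips":   (0.05, 0.70, 2.1),
    "milk => bread":   (0.18, 0.75, 1.4),
    "wine => cheese":  (0.04, 0.65, 2.4),
}
X = StandardScaler().fit_transform(np.array(list(rules.values())))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for rule, lab in zip(rules, labels):
    print(lab, rule)   # niche high-lift rules separate from frequent ones
```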
26

Lilliehöök, Hampus. "Extraction of word senses from bilingual resources using graph-based semantic mirroring." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-91880.

Full text
Abstract:
In this thesis we retrieve semantic information that exists implicitly in bilingual data. We gather input data by repeatedly applying the semantic mirroring procedure. The data is then represented by vectors in a large vector space. A resource of synonym clusters is then constructed by performing K-means centroid-based clustering on the vectors. We evaluate the result manually, using dictionaries, and against WordNet, and discuss prospects and applications of this method.
27

Ngai, Wang-kay, and 倪宏基. "Cluster analysis on uncertain data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2008. http://hub.hku.hk/bib/B4218261X.

Full text
28

Ngai, Wang-kay. "Cluster analysis on uncertain data." Click to view the E-thesis via HKUTO, 2008. http://sunzi.lib.hku.hk/hkuto/record/B4218261X.

Full text
29

Giordano, Manfredi. "Autonomic Big Data Processing." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14837/.

Full text
Abstract:
Apache Spark is an open-source framework for large-scale distributed computation, characterized by an in-memory engine that outperforms competing solutions when processing data at rest (batch) or in motion (streaming). In this work we present some techniques designed and implemented to improve the elasticity and adaptability of the framework with respect to dynamic changes in the execution environment or in the workload. The primary purpose of these techniques is to allow concurrent applications to share the physical resources of the underlying cluster infrastructure efficiently. The context in which distributed applications run can hardly be considered static: hardware components can fail, processes can crash, and users may unpredictably allocate additional resources in an attempt to speed up computation or lighten the workload. Finally, not only the physical resources but also the input data can vary in size and complexity during execution, so neither data nor resources can be considered static. An immutable cluster configuration will not achieve the best possible efficiency for all the different workloads. It follows that a distributed computing framework that is aware of environmental and workload changes, and that can adapt to them, can outperform a framework that only allows static configurations. Our experiments with highly parallelizable Big Data applications show that the cost of the proposed solution is minimal and that our more dynamic and adaptive version of Spark can bring benefits in terms of flexibility, scalability, and efficiency.
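For context, stock Spark already exposes some elasticity through dynamic executor allocation, which is the baseline such work extends; a minimal PySpark configuration (standard Spark property names, example values) looks like this:

```python
from pyspark.sql import SparkSession

# Stock Spark elasticity knobs: executors are added or released at runtime
# as the workload changes. Values below are examples, not recommendations.
spark = (SparkSession.builder
         .appName("adaptive-demo")
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "1")
         .config("spark.dynamicAllocation.maxExecutors", "16")
         .config("spark.shuffle.service.enabled", "true")  # needed to scale down
         .getOrCreate())

df = spark.range(10**8)
print(df.selectExpr("sum(id)").collect())
```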
30

Busse, Ludwig M., Peter Orbanz, and Joachim M. Buhmann. "Cluster analysis of heterogeneous rank data." Zurich : ETH Department of Computer Science, Institute of Computational Sciences, 2007. http://e-collection.ethbib.ethz.ch/show?type=dipl&nr=350.

Full text
31

Molin, Felix. "Cluster analysis of European banking data." Thesis, KTH, Matematisk statistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219597.

Full text
Abstract:
Credit institutions constitute a central part of life as it is today and have done so for a long time. A fault within the banking system can cause a tremendous amount of damage to individuals as well as to countries. A recent and memorable fault is the global financial crisis of 2007-2009, which has affected millions of people in different ways ever since it struck. What caused it is a complex issue which cannot be answered easily. But what has been done to prevent something similar from occurring again? How have the business models of the credit institutions changed since the crisis? Cluster analysis is used in this thesis to address these questions. Banking data were processed with the Calinski-Harabasz criterion and Ward's method, and this resulted in two clusters being found. A cluster is a collection of observations that have similar characteristics, or in this case a similar business model. The business models that the clusters represent are universal banking with a retail focus and universal banking with a wholesale focus. These business models have been analyzed over time (2007-2016), which revealed that the credit institutions have developed in a healthy direction. Thus, credit institutions were more financially reliable in 2016 than in 2007. According to trends in the data, this development is likely to continue.
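The methodological core, Ward's hierarchical clustering scored with the Calinski-Harabasz criterion to choose the number of clusters, is directly reproducible with scikit-learn; the synthetic data below stands in for the banking indicators.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
# Stand-in for per-bank indicators (e.g. funding ratios); two profiles.
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(3, 1, (40, 5))])

best = max(
    range(2, 8),
    key=lambda k: calinski_harabasz_score(
        X, AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)),
)
print(best)   # -> 2, matching the two business models found in the thesis
```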
32

Yeung, Ka Yee. "Cluster analysis of gene expression data /." Thesis, Connect to this title online; UW restricted, 2001. http://hdl.handle.net/1773/6986.

Full text
33

Irick, Nancy. "Post Processing Data Analysis." International Foundation for Telemetering, 2009. http://hdl.handle.net/10150/606091.

Full text
Abstract:
ITC/USA 2009 Conference Proceedings / The Forty-Fifth Annual International Telemetering Conference and Technical Exhibition / October 26-29, 2009 / Riviera Hotel & Convention Center, Las Vegas, Nevada
Once the test is complete, the job of the Data Analyst has begun. Files from the various acquisition systems are collected. It is the job of the analyst to put these files together in a readable format so that the success or failure of the test can be ascertained. This paper will discuss the process of breaking down these files, comparing data from different systems, and methods of presenting the data.
34

Єфіменко, Тетяна Михайлівна, Татьяна Михайловна Ефименко, Tetiana Mykhailivna Yefimenko, Олена Владиславівна Коробченко, Елена Владиславовна Коробченко, and Olena Vladyslavivna Korobchenko. "Informational Extreme Cluster Analysis of Input Data." Thesis, Sumy State University, 2016. http://essuir.sumdu.edu.ua/handle/123456789/47076.

Full text
Abstract:
The article considers a categorical model and a learning algorithm for a decision support system. The proposed algorithm makes it possible to create a decision support system that functions in a cluster-analysis mode. Synthesis of the decision support system is based on maximizing the informational capability of the system by introducing additional information restrictions into the learning process.
35

Hill, Evelyn June. "Applying statistical and syntactic pattern recognition techniques to the detection of fish in digital images." University of Western Australia. School of Mathematics and Statistics, 2004. http://theses.library.uwa.edu.au/adt-WU2004.0070.

Full text
Abstract:
This study is an attempt to simulate aspects of human visual perception by automating the detection of specific types of objects in digital images. The success of the methods attempted here was measured by how well the results of experiments corresponded to what a typical human's assessment of the data might be. The subject of the study was images of live fish taken underwater by digital video or digital still cameras. It is desirable to be able to automate the processing of such data for efficient stock assessment for fisheries management. In this study some well known statistical pattern classification techniques were tested and new syntactical/structural pattern recognition techniques were developed. For testing of statistical pattern classification, the pixels belonging to fish were separated from the background pixels and the EM algorithm for Gaussian mixture models was used to locate clusters of pixels. The means and the covariance matrices for the components of the model were used to indicate the location, size and shape of the clusters. Because the number of components in the mixture is unknown, the EM algorithm has to be run a number of times with different numbers of components, and the best model is then chosen using a model selection criterion. The AIC (Akaike Information Criterion) and the MDL (Minimum Description Length) were tested. The MDL was found to estimate the numbers of clusters of pixels more accurately than the AIC, which tended to overestimate cluster numbers. In order to reduce problems caused by initialisation of the EM algorithm (i.e. starting positions of mixtures and number of mixtures), the Dynamic Cluster Finding algorithm (DCF) was developed (based on the Dog-Rabbit strategy). This algorithm can produce an estimate of the locations and numbers of clusters of pixels. The Dog-Rabbit strategy is based on early studies of learning behaviour in neurons. The main difference between Dog-Rabbit and DCF is that DCF is based on a toroidal topology which removes the tendency of cluster locators to migrate to the centre of mass of the data set and miss clusters near the edges of the image. In the second approach to the problem, data was extracted from the image using an edge detector. The edges from a reference object were compared with the edges from a new image to determine if the object occurred in the new image. In order to compare edges, the edge pixels were first assembled into curves using an UpWrite procedure; then the curves were smoothed by fitting parametric cubic polynomials. Finally the curves were converted to arrays of numbers which represented the signed curvature of the curves at regular intervals. Sets of curves from different images can be compared by comparing the arrays of signed curvature values, as well as the relative orientations and locations of the curves. Discrepancy values were calculated to indicate how well curves and sets of curves matched the reference object. The total length of all matched curves was used to indicate what fraction of the reference object was found in the new image. The curve matching procedure gave results which corresponded well with what a human being might observe.
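The model-selection step the abstract describes, fitting Gaussian mixtures with varying component counts and comparing AIC against an MDL-style criterion, is easy to reproduce; scikit-learn's BIC is the usual stand-in for MDL, and the synthetic pixel clusters below replace the fish images.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for fish-pixel coordinates: three clusters of pixels.
X = np.vstack([rng.normal(m, 0.5, (100, 2)) for m in ((0, 0), (4, 0), (2, 3))])

for n in range(1, 7):
    gm = GaussianMixture(n_components=n, random_state=0).fit(X)
    print(n, round(gm.aic(X)), round(gm.bic(X)))
# BIC (the MDL analogue) should bottom out at 3 components, while AIC can
# keep creeping down -- the overestimation tendency the thesis reports.
```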
36

Jacob, Aju. "Distributed configuration management for reconfigurable cluster computing." [Gainesville, Fla.] : University of Florida, 2004. http://purl.fcla.edu/fcla/etd/UFE0007181.

Full text
37

Takahashi, Atsushi. "Hierarchical Cluster Analysis of Dense GNSS Data and Interpretation of Cluster Characteristics." Kyoto University, 2019. http://hdl.handle.net/2433/244510.

Full text
38

Springuel, R. Padraic. "Applying Cluster Analysis to Physics Education Research Data." Fogler Library, University of Maine, 2010. http://www.library.umaine.edu/theses/pdf/SpringuelRP2010.pdf.

Full text
39

Hegazy, Yasser Ali. "Delineating geostratigraphy by cluster analysis of piezocone data." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/20506.

Full text
40

Li, Hao. "Feature cluster selection for high-dimensional data analysis." Diss., Online access via UMI, 2007.

Find full text
41

Gog, Ionel Corneliu. "Flexible and efficient computation in large data centres." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/271804.

Full text
Abstract:
Increasingly, online computer applications rely on large-scale data analyses to offer personalised and improved products. These large-scale analyses are performed on distributed data processing execution engines that run on thousands of networked machines housed within an individual data centre. These execution engines provide, to the programmer, the illusion of running data analysis workflows on a single machine, and offer programming interfaces that shield developers from the intricacies of implementing parallel, fault-tolerant computations. Many such execution engines exist, but they embed assumptions about the computations they execute, or only target certain types of computations. Understanding these assumptions involves substantial study and experimentation. Thus, developers find it difficult to determine which execution engine is best, and even if they did, they become “locked in” because engineering effort is required to port workflows. In this dissertation, I first argue that in order to execute data analysis computations efficiently, and to flexibly choose the best engines, the way we specify data analysis computations should be decoupled from the execution engines that run the computations. I propose an architecture for decoupling data processing, together with Musketeer, my proof-of-concept implementation of this architecture. In Musketeer, developers express data analysis computations using their preferred programming interface. These are translated into a common intermediate representation from which code is generated and executed on the most appropriate execution engine. I show that Musketeer can be used to write data analysis computations directly, and these can execute on many execution engines because Musketeer automatically generates code that is competitive with optimised hand-written implementations. The diverse execution engines cause different workflow types to coexist within a data centre, opening up both opportunities for sharing and potential pitfalls for co-location interference. However, in practice, workflows are either placed by high-quality schedulers that avoid co-location interference, but choose placements slowly, or schedulers that choose placements quickly, but with unpredictable workflow run time due to co-location interference. In this dissertation, I show that schedulers can choose high-quality placements with low latency. I develop several techniques to improve Firmament, a high-quality min-cost flow-based scheduler, to choose placements quickly in large data centres. Finally, I demonstrate that Firmament chooses placements at least as good as other sophisticated schedulers, but at the speeds associated with simple schedulers. These contributions enable more efficient and effective use of data centres for large-scale computation than current solutions.
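Firmament's framing of placement as a min-cost flow problem can be miniaturised with networkx; the toy graph below (tasks to machines to a sink, with invented costs and slot counts) shows the shape of the encoding, not Firmament's actual cost model.

```python
import networkx as nx

G = nx.DiGraph()
tasks = ["t1", "t2", "t3"]
machines = {"m1": 2, "m2": 1}          # machine -> free task slots
cost = {("t1", "m1"): 1, ("t1", "m2"): 4,
        ("t2", "m1"): 2, ("t2", "m2"): 1,
        ("t3", "m1"): 3, ("t3", "m2"): 2}

for t in tasks:
    G.add_node(t, demand=-1)           # each task must send one unit of flow
G.add_node("sink", demand=len(tasks))
for (t, m), c in cost.items():
    G.add_edge(t, m, capacity=1, weight=c)
for m, slots in machines.items():
    G.add_edge(m, "sink", capacity=slots, weight=0)

flow_cost, flow = nx.network_simplex(G)
placement = {t: m for t in tasks for m in machines if flow[t].get(m, 0)}
print(flow_cost, placement)   # optimal placements from one flow computation
```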
42

Ku, Yuk-chiu, and 古玉翠. "Partitioning HOPD program for fast execution on the HKU UNIX workstation cluster." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31221026.

Full text
43

Soon, Shih Chung. "On detection of extreme data points in cluster analysis." Connect to resource, 1987. http://rave.ohiolink.edu/etdc/view.cgi?acc%5Fnum=osu1262886219.

Full text
44

Adams, Daniel Alan. "Optimal Load Balancing in a Beowulf Cluster." Link to electronic thesis, 2005. http://www.wpi.edu/Pubs/ETD/Available/etd-050205-135758/.

Full text
45

Galicia, Auyón Jorge Armando. "Revisiting Data Partitioning for Scalable RDF Graph Processing. Combining Graph Exploration and Fragmentation for RDF Processing. Query Optimization for Large Scale Clustered RDF Data. RDFPart-Suite: Bridging Physical and Logical RDF Partitioning. Reverse Partitioning for SPARQL Queries: Principles and Performance Analysis. Should We Be Afraid of Querying Billions of Triples in a Graph-Based Centralized System? EXGRAF: Exploration et Fragmentation de Graphes au Service du Traitement Scalable de Requêtes RDF." Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2021. http://www.theses.fr/2021ESMA0001.

Full text
Abstract:
The Resource Description Framework (RDF) and SPARQL are very popular graph-based standards initially designed to represent and query information on the Web. The flexibility offered by RDF motivated its use in other domains, and today RDF datasets are great information sources. They gather billions of triples in Knowledge Graphs that must be stored and efficiently exploited. The first generation of RDF systems was built on top of traditional relational databases. Unfortunately, the performance of these systems degrades rapidly as the relational model is not suitable for handling RDF data inherently represented as a graph. Native and distributed RDF systems seek to overcome this limitation. The former mainly use indexing as an optimization strategy to speed up queries. Distributed and parallel RDF systems resort to data partitioning. The logical representation of the database is crucial to design data partitions in the relational model. The logical layer defining the explicit schema of the database provides a degree of comfort to database designers. It lets them choose manually or automatically (through advisors) the tables and attributes to be partitioned. Besides, it allows the core partitioning concepts to remain constant regardless of the database management system. This design scheme is no longer valid for RDF databases, essentially because the RDF model does not explicitly enforce a schema: RDF data is mostly implicitly structured. Thus, the logical layer is inexistent and data partitioning depends strongly on the physical implementation of the triples on disk. This situation contributes to having different partitioning logics depending on the target system, which is quite different from the relational model's perspective. In this thesis, we promote the novel idea of performing data partitioning at the logical level in RDF databases. Thereby, we first process the RDF data graph to support logical entity-based partitioning. After this preparation, we present a partitioning framework built upon these logical structures. This framework is accompanied by data fragmentation, allocation, and distribution procedures. The framework was incorporated into a centralized (RDF_QDAG) and a distributed (gStoreD) triple store. We conducted several experiments that confirmed the feasibility of integrating our framework into existing systems, improving their performance for certain queries. Finally, we design a set of RDF data partitioning management tools, including a data definition language (DDL) and an automatic partitioning wizard.
46

Windridge, David. "A fluctuation analysis for optical cluster galaxies." Thesis, University of Bristol, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.302173.

Full text
47

Bolton, Richard John. "Multivariate analysis of multiproduct market research data." Thesis, University of Exeter, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.302542.

Full text
48

Wang, Dali. "Adaptive Double Self-Organizing Map for Clustering Gene Expression Data." Fogler Library, University of Maine, 2003. http://www.library.umaine.edu/theses/pdf/WangD2003.pdf.

Full text
49

McClelland, Robyn L. "Regression based variable clustering for data reduction /." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/9611.

Full text
50

Parker, Brandon S. "CLUE: A Cluster Evaluation Tool." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5444/.

Full text
Abstract:
Modern high performance computing is dependent on parallel processing systems. Most current benchmarks reveal only the high level computational throughput metrics, which may be sufficient for single processor systems, but can lead to a misrepresentation of true system capability for parallel systems. A new benchmark is therefore proposed. CLUE (Cluster Evaluator) uses a cellular automata algorithm to evaluate the scalability of parallel processing machines. The benchmark also uses algorithmic variations to evaluate individual system components' impact on the overall serial fraction and efficiency. CLUE is not a replacement for other performance-centric benchmarks, but rather shows the scalability of a system and provides metrics to reveal where one can improve overall performance. CLUE is a new benchmark which demonstrates a better comparison among different parallel systems than existing benchmarks and can diagnose where a particular parallel system can be optimized.
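A serial-fraction metric of the kind CLUE reports can be computed from measured speedups with the Karp-Flatt formula; the helper below assumes wall-clock timings of one workload at several processor counts (the numbers are hypothetical).

```python
def karp_flatt(t1, tp, p):
    """Experimentally determined serial fraction (Karp-Flatt metric).

    t1 : run time on one processor, tp : run time on p processors.
    """
    speedup = t1 / tp
    return (1 / speedup - 1 / p) / (1 - 1 / p)

# Hypothetical timings from scaling a benchmark across a cluster:
t1 = 100.0
for p, tp in [(2, 52.0), (4, 28.0), (8, 17.0)]:
    print(p, round(karp_flatt(t1, tp, p), 3))
# A serial fraction that grows with p signals a scalability bottleneck
# (e.g. communication overhead), which is what CLUE is built to expose.
```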