Dissertations / Theses: 'Tree data'

1

MacKinnon, Richard Kyle. "Seeing the forest for the trees: tree-based uncertain frequent pattern mining." Springer International Publishing, 2014. http://hdl.handle.net/1993/31059.

Abstract:

Many frequent pattern mining algorithms operate on precise data, where each data point is an exact accounting of a phenomenon (e.g., I have exactly two sisters). Alas, reasoning this way is a simplification for many real-world observations. Measurements, predictions, environmental factors, human error, etc. all introduce a degree of uncertainty into the mix. Tree-based frequent pattern mining algorithms such as FP-growth are particularly efficient due to their compact in-memory representations of the input database, but their uncertain extensions can require many more tree nodes. I propose two new algorithms with tightened upper bounds to expected support, Tube-S and Tube-P, which mine frequent patterns from uncertain data. Extensive experiments and analysis on datasets with different probability distributions show the tightness of my bounds in different situations.
February 2016
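
As a rough illustration of the quantity that Tube-S and Tube-P bound, the Python sketch below computes the expected support of an itemset in an uncertain database, assuming (as is common in this literature) that items occur independently within a transaction. The `min_prob_upper_bound` function is a hypothetical, deliberately loose bound added for contrast; it is not the Tube-S or Tube-P bound, whose exact form the abstract does not state.

```python
from math import prod

def expected_support(itemset, db):
    """Expected support of `itemset` in an uncertain transaction database.

    Each transaction maps item -> existence probability. Assuming items occur
    independently, P(transaction contains itemset) is the product of the item
    probabilities (0 if any item is absent), and expected support sums that
    over all transactions.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in db)

def min_prob_upper_bound(itemset, db):
    """Hypothetical loose bound: replace each product by its smallest factor.

    Probabilities are at most 1, so a product never exceeds its minimum
    factor; summing the minima therefore bounds expected support from above.
    """
    return sum(min(t.get(item, 0.0) for item in itemset) for t in db)

db = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.6, "b": 0.8, "c": 0.3},
    {"b": 0.4, "c": 0.7},
]
print(expected_support({"a", "b"}, db))      # 0.45 + 0.48 + 0.0
print(min_prob_upper_bound({"a", "b"}, db))  # 0.5 + 0.6 + 0.0
```

Tighter bounds matter because a tree-based miner that only knows the bound must keep every candidate whose bound reaches the support threshold, even if its true expected support does not.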

2

Ahmad, Amir. "Data Transformation for Decision Tree Ensembles." Thesis, University of Manchester, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.508528.

3

Da San Martino, Giovanni <1979>. "Kernel Methods for Tree Structured Data." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2009. http://amsdottorato.unibo.it/1400/.

Abstract:

Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized as having a strong theoretical background and as being effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of kernels in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions.
Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
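
For intuition about the sparsity issue, here is a minimal sketch of a plain subtree kernel on ordered, labelled trees: it counts pairs of identical complete subtrees. The `(label, children)` tuple encoding and the parenthesised signature strings are assumptions of this sketch (labels must not contain parentheses or commas), and it is not one of the kernels proposed in the thesis.

```python
from collections import Counter

def _signatures(tree, counts):
    """Record a canonical string for every complete subtree of `tree`.

    A tree is a (label, [children]) tuple; children keep their order, so
    trees are treated as ordered. Returns the signature of `tree` itself.
    """
    label, children = tree
    sig = label + "(" + ",".join(_signatures(c, counts) for c in children) + ")"
    counts[sig] += 1
    return sig

def subtree_kernel(t1, t2):
    """Number of node pairs (u in t1, v in t2) whose complete subtrees are identical."""
    c1, c2 = Counter(), Counter()
    _signatures(t1, c1)
    _signatures(t2, c2)
    return sum(n * c2[sig] for sig, n in c1.items())

t1 = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", [])])])
t2 = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", []), ("NP", [("N", [])])])])
print(subtree_kernel(t1, t2))
```

With a large label domain few signatures coincide, so most kernel values collapse towards zero and the Gram matrix becomes nearly diagonal, which is the sparsity problem described above.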

4

Liu, Dan. "Tree-based Models for Longitudinal Data." Bowling Green State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1399972118.

5

Csank, Adam Z. "Research Communication: An International Tree-Ring Isotope Data Bank - A Proposed Repository For Tree-Ring Isotopic Data." Tree-Ring Society, 2009. http://hdl.handle.net/10150/622606.

Abstract:

The International Tree-Ring Data Bank (ITRDB) is an invaluable resource, providing access to a massive and growing cache of tree-ring data. Oxygen, carbon, nitrogen and hydrogen isotope tree-ring studies, which have provided valuable climatic and ecological information, have proliferated for decades, so an ITRDB expansion to include isotopic data would likewise benefit the scientific community. An international tree-ring isotope databank (ITRIDB) would: (1) allow development of transfer functions from extended isotopic data sets, (2) provide abundant tree-ring isotopic data for meta-analysis, and (3) encourage isotopic network studies. A European network already exists, but the international data bank proposed here would constitute a de facto global network. Associated information to be incorporated into the database includes not only the customary ITRDB entries, but also elements peculiar to isotope chronologies. As with the current ITRDB, submission of data would be voluntary, so the support of the tree-ring isotope community in contributing existing and forthcoming isotope series will be crucial. The plan is to institute this isotope database in 2010, administered by the National Climatic Data Center.

6

King, Stuart. "Optimizations and applications of Trie-Tree based frequent pattern mining." Diss., Michigan State University, 2006.

Abstract:

Thesis (M. S.)--Michigan State University. Dept. of Computer Science and Engineering, 2006.
Title from PDF t.p. (viewed on June 19, 2009) Includes bibliographical references (p. 79-80). Also issued in print.

7

Rizo, David. "Symbolic music comparison with tree data structures." Doctoral thesis, Universidad de Alicante, 2010. http://hdl.handle.net/10045/18331.

8

Evans, Margaret E. K., Donald A. Falk, Alexis Arizpe, Tyson L. Swetnam, Flurin Babst, and Kent E. Holsinger. "Fusing tree-ring and forest inventory data to infer influences on tree growth." WILEY, 2017. http://hdl.handle.net/10150/625361.

Abstract:

Better understanding and prediction of tree growth is important because of the many ecosystem services provided by forests and the uncertainty surrounding how forests will respond to anthropogenic climate change. With the ultimate goal of improving models of forest dynamics, here we construct a statistical model that combines complementary data sources, tree-ring and forest inventory data. A Bayesian hierarchical model was used to gain inference on the effects of many factors on tree growth (individual tree size, climate, biophysical conditions, stand-level competitive environment, tree-level canopy status, and forest management treatments) using both diameter at breast height (dbh) and tree-ring data. The model consists of two multiple regression models, one for each data source, linked via a constant of proportionality between coefficients that appear in parallel in the two regressions. This model was applied to a data set of ~130 increment cores and ~500 repeat measurements of dbh at a single site in the Jemez Mountains of north-central New Mexico, USA. The tree-ring data serve as the only source of information on how annual growth responds to climate variation, whereas both data types inform non-climatic effects on growth. Inferences from the model included positive effects on growth of seasonal precipitation, wetness index, and height ratio, and negative effects of dbh, seasonal temperature, southerly aspect and radiation, and plot basal area. Climatic effects inferred by the model were confirmed by a dendroclimatic analysis. Combining the two data sources substantially reduced uncertainty about non-climate fixed effects on radial increments. This demonstrates that forest inventory data measured on many trees, combined with tree-ring data developed for a small number of trees, can be used to quantify and parse multiple influences on absolute tree growth.
We highlight the kinds of research questions that can be addressed by combining the high-resolution information on climate effects contained in tree rings with the rich tree- and stand-level information found in forest inventories, including projection of tree growth under future climate scenarios, carbon accounting, and investigation of management actions aimed at increasing forest resilience.
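
The two linked regressions described in this abstract can be written, in illustrative notation, roughly as follows: $x$ are non-climatic covariates shared by both models, $c$ are climate covariates that enter only the tree-ring model (since only tree rings resolve annual climate responses), and $\lambda$ is the constant of proportionality linking the parallel coefficients. Response scales, priors and error structures are not given in the abstract, so treat this as a hedged reading rather than the paper's specification.

```latex
\begin{aligned}
\text{ring increment:}\quad r_{i,t} &= \alpha_r + \sum_{k=1}^{K} \beta_k\, x_{i,t,k}
  + \sum_{m=1}^{M} \theta_m\, c_{t,m} + \varepsilon^{(r)}_{i,t}, \\
\text{dbh increment:}\quad \Delta d_{i,s} &= \alpha_d + \sum_{k=1}^{K} \lambda\, \beta_k\, x_{i,s,k}
  + \varepsilon^{(d)}_{i,s}.
\end{aligned}
```

Because each $\beta_k$ appears in both equations, the many dbh remeasurements also inform the non-climatic coefficients, which is consistent with the reported reduction in uncertainty about non-climate fixed effects.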

9

Faustino, Bruno Filipe Fernandes Simões Salgueiro. "Implementation for spatial data of the shared nearest neighbour with metric data structures." Master's thesis, Faculdade de Ciências e Tecnologia, 2012. http://hdl.handle.net/10362/8489.

10

Alizadeh Khameneh, Mohammad Amin. "Tree Detection and Species Identification using LiDAR Data." Thesis, KTH, Geodesi och geoinformatik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-119269.

Abstract:

The importance of single-tree-based information for forest management and related industries in countries like Sweden, approximately 65% of which is covered by forest, is the motivation for developing algorithms for tree detection and species identification in this study. Most previous studies in this field are based on aerial and spectral images, and less attention has been paid to detecting trees and identifying their species using laser points and clustering methods. In the first part of this study, two main clustering approaches (hierarchical and K-means) are compared qualitatively in detecting the 3-D ALS points that pertain to individual tree clusters. Further tests are performed on test sites using the supervised K-means algorithm, in which the initial clustering points are defined as seed points. These points, which represent the top point of each tree, are detected from a cross-section analysis of the test area. Comparing the three methods (hierarchical, ordinary K-means and supervised K-means), the supervised K-means approach shows the best result for clustering single-tree points. An average accuracy of 90% is achieved in detecting trees. Comparing the results of the thesis algorithms with results from the DPM software, developed by the Visimind company for analysing LiDAR data, shows more than 85% agreement in detecting trees. Species identification is the second part of this thesis work. For this analysis, 118 reference trees of three species (spruce, pine and birch, the dominant species in Swedish forests) are extracted. In total, six methods are developed for identifying those species: best-fitted 3-D shapes (cone, sphere and cylinder) based on the least squares method, point density, hull ratio, and slope changes of the tree's outer surface. The methods are applied to all extracted reference trees individually.
To aggregate the results of all those methods, a fuzzy logic system is used because of its good reputation in combining fuzzy sets with no distinct boundaries. The best model obtained from the fuzzy system provides 73%, 87% and 71% accuracy in identifying birch, spruce and pine trees, respectively. The overall accuracy in species categorization is 77%, and this percentage increases when only coniferous versus deciduous classification is required: classifying spruce and pine as coniferous versus birch as deciduous yielded 84% accuracy.
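
The seeded ("supervised") K-means step can be sketched as ordinary K-means whose initial centroids are the detected treetop points. This minimal Python version assumes the ALS points and treetop seeds are already available as (x, y, z) tuples; the cross-section analysis that finds the seeds is not shown, and the function and variable names are hypothetical.

```python
import math

def seeded_kmeans(points, seeds, max_iter=20):
    """K-means on 3-D points whose initial centroids are given seed points."""
    centroids = [tuple(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical ALS points around two crowns, seeded with their treetops.
points = [(0, 0, 10), (0.5, 0, 9), (0, 0.5, 8), (5, 5, 12), (5.5, 5, 11), (5, 4.5, 10)]
treetops = [(0, 0, 10), (5, 5, 12)]
labels, centroids = seeded_kmeans(points, treetops)
print(labels)
```

Seeding fixes the number of clusters to the number of detected treetops and starts each cluster inside a crown, which avoids the arbitrary initialisation of ordinary K-means.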

11

Tsang, Pui-kwan Smith, and 曾沛坤. "Efficient decision tree building algorithms for uncertain data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2008. http://hub.hku.hk/bib/B41290719.

12

Tsang, Pui-kwan Smith. "Efficient decision tree building algorithms for uncertain data." The University of Hong Kong, 2008. http://sunzi.lib.hku.hk/hkuto/record/B41290719.

13

Gupta, Suraj. "Metagenomic Data Analysis Using Extremely Randomized Tree Algorithm." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/96025.

Abstract:

Many antibiotic resistance genes (ARGs) conferring resistance to a broad range of antibiotics have often been detected in aquatic environments such as untreated and treated wastewater, river and surface water. ARG proliferation in the aquatic environment could depend upon various factors such as geospatial variations, the type of aquatic body, and the type of wastewater (untreated or treated) discharged into these aquatic environments. Likewise, the strong interconnectivity of aquatic systems may accelerate the spread of ARGs through them. Hence a comparative and holistic study of different aquatic environments is required to properly comprehend the problem of antibiotic resistance. Many studies approach this issue using molecular techniques such as metagenomic sequencing and metagenomic data analysis. Such analyses compare the broad spectrum of ARGs in water and wastewater samples, but their comparisons are limited to similarity/dissimilarity analyses. In such analyses, the discriminatory ARGs (the ARGs driving the similarity/dissimilarity measures) may not be identified. Consequently, the factors driving the dissimilarities among the samples would not be identified, and the reasons for antibiotic resistance proliferation may not be clearly understood. In this study, an effective methodology using the Extremely Randomized Trees (ET) algorithm was formulated and demonstrated to capture such ARG variations and identify discriminatory ARGs among environmentally derived metagenomes. Data were grouped by geographic location (to understand the spread of ARGs globally), untreated vs. treated wastewater (to assess the effectiveness of WWTPs in removing ARGs), and aquatic habitat (to understand the impact and spread within aquatic habitats).
Certain ARGs were specific to wastewater samples from certain locations, suggesting that site-specific factors can shape ARG profiles. Comparing untreated and treated wastewater samples from different WWTPs revealed that biological treatments have a definite impact on shaping the ARG profile. While several ARGs were removed by the treatment, some ARGs showed an increase in relative abundance irrespective of location and treatment-plant-specific variables. On comparing different aquatic environments, the algorithm identified ARGs which were specific to certain environments, including ARGs specific to hospital discharges. The proposed method was found to be efficient in identifying the discriminatory ARGs that could classify the samples according to their groups. It was also effective in capturing low-level variations that generally get overshadowed in the analysis by highly abundant genes. The results of this study suggest that the proposed method is effective for comprehensive analyses and can provide valuable information to better understand antibiotic resistance.
MS
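
For readers unfamiliar with Extremely Randomized Trees, the sketch below shows the split rule that distinguishes them from ordinary decision trees: cut points are drawn at random rather than optimised. Rows stand for metagenome samples and columns for ARG abundances (hypothetical data). Scoring candidates by Gini impurity decrease is a simplification chosen here; the original Extra-Trees formulation uses a normalised score, and the function names are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels (0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def extra_trees_split(X, y, k, rng):
    """Choose a split the Extra-Trees way.

    Draw k candidate features at random; for each, draw ONE cut point
    uniformly between the feature's min and max in this node, and keep the
    candidate with the largest Gini impurity decrease.
    Returns (feature_index, cut_point, impurity_decrease) or None.
    """
    parent = gini(y)
    best = None
    for f in rng.sample(range(len(X[0])), k):
        values = [row[f] for row in X]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant feature in this node: nothing to cut
        cut = rng.uniform(lo, hi)
        left = [lab for row, lab in zip(X, y) if row[f] < cut]
        right = [lab for row, lab in zip(X, y) if row[f] >= cut]
        decrease = parent - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or decrease > best[2]:
            best = (f, cut, decrease)
    return best

# Hypothetical data: rows are samples, columns are relative abundances of two ARGs.
X = [[0.0, 1.0], [0.1, 1.0], [0.9, 1.0], [1.0, 1.0]]
y = ["untreated", "untreated", "treated", "treated"]
print(extra_trees_split(X, y, k=2, rng=random.Random(0)))
```

In a full forest, summing each feature's impurity decreases over all nodes and trees gives an importance ranking, which is one way discriminatory ARGs could be surfaced.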

14

Igboamalu, Frank Nonso. "Decision tree classifiers for incident call data sets." Master's thesis, University of Cape Town, 2017. http://hdl.handle.net/11427/27076.

Abstract:

Information technology (IT) has become one of the key technologies for economic and social development in any organization. The management of IT incidents, particularly resolving problems quickly, is therefore a concern for IT managers. Delays can result when incorrect subjects are assigned to IT incident calls, because the person sent to remedy the problem has the wrong expertise or has not brought the software or hardware needed to help the user. In the case study used for this work, there are no management checks in place to verify the subjects assigned to incident descriptions. This research aims to develop a method that tackles the problem of wrongly assigned subjects for incident descriptions. In particular, this study uses the IT incident calls database of an oil and gas company as a case study. The approach was to explore the IT incident descriptions and their assigned subjects; the correctly assigned records were then used to train decision tree classification algorithms using the Waikato Environment for Knowledge Analysis (WEKA) software. Finally, the records assigned an incorrect subject by human operators were used for testing. The J48 algorithm gave the best performance and accuracy, and correctly assigned subjects to 81% of the records wrongly classified by human operators.
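
J48 is WEKA's implementation of C4.5, which chooses split attributes by gain ratio. The sketch below computes gain ratio for a nominal attribute; the keyword flag and ticket-ID attributes are hypothetical stand-ins for features derived from incident descriptions, chosen to show how gain ratio penalises attributes with many distinct values.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """C4.5-style gain ratio of splitting `labels` on a nominal attribute."""
    n = len(labels)
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(v, []).append(lab)
    # Information gain: parent entropy minus size-weighted child entropy.
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0

subjects = ["hardware", "hardware", "software", "software"]
keyword = ["yes", "yes", "no", "no"]       # hypothetical: description mentions a device
ticket_id = ["t1", "t2", "t3", "t4"]       # hypothetical: unique per record
print(gain_ratio(keyword, subjects), gain_ratio(ticket_id, subjects))
```

Both attributes separate the two subjects perfectly (information gain of 1 bit), but the ticket ID's split information is 2 bits, so its gain ratio halves to 0.5.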

15

Chu, Chung Cheung. "Tree encoding of speech signals at low bit rates." Thesis, McGill University, 1986. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=65459.

16

Flöter, André. "Analyzing biological expression data based on decision tree induction." PhD thesis, Universität Potsdam, 2005. http://opus.kobv.de/ubp/volltexte/2006/641/.

Abstract:

Modern biological analysis techniques supply scientists with various forms of data. One category of such data is the so-called "expression data". These data indicate the quantities of biochemical compounds present in tissue samples.

Expression data can now be generated at high speed. This in turn leads to amounts of data that can no longer be analysed by classical statistical techniques. Systems biology is the new field that focuses on the modelling of this information.

At present, various methods are used for this purpose. One superordinate class of these methods is machine learning. Methods of this kind had, until recently, predominantly been used for classification and prediction tasks. This neglected a powerful secondary benefit: the ability to induce interpretable models.

Obtaining such models from data has become a key issue within Systems biology. Numerous approaches have been proposed and intensively discussed. This thesis focuses on the examination and exploitation of one basic technique: decision trees.

The concept of comparing sets of decision trees is developed. This method offers the possibility of identifying significant thresholds in continuous or discrete valued attributes through their corresponding set of decision trees. Finding significant thresholds in attributes is a means of identifying states in living organisms. Knowing about states is an invaluable clue to the understanding of dynamic processes in organisms. Applied to metabolite concentration data, the proposed method was able to identify states which were not found with conventional techniques for threshold extraction.

A second approach exploits the structure of sets of decision trees for the discovery of combinatorial dependencies between attributes. Previous work on this issue has focused either on expensive computational methods or on the interpretation of single decision trees, which exploits the data only to a very limited extent. This has led to incomplete or unstable results. A new method is therefore developed that uses sets of decision trees to overcome these limitations.

Both the introduced methods are available as software tools. They can be applied consecutively or separately. That way they make up a package of analytical tools that usefully supplement existing methods.

By means of these tools, the newly introduced methods were able to confirm existing knowledge and to suggest interesting new relationships between plant metabolites.

17

Koneri, Kiran Kumar. "Implementation of Collection Tree Protocol over WirelessHART Data-Link." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH, Data- och elektroteknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-15665.

Abstract:

Wireless Sensor Networks (WSNs) are ad-hoc wireless networks of small form-factor embedded nodes with limited memory, processing and energy resources. Certain applications, like industrial automation and real-time process monitoring, require a time-synchronized, reliable network protocol. Current WSN protocols provide either time synchronization with low reliability (WirelessHART) or reliability without time synchronization (Collection Tree Protocol). The Collection Tree Protocol (CTP) provides reliability of 94.7% to 99.9% over a CSMA-CA-based MAC layer. This thesis addresses channel hopping, a class of frequency-diverse communication protocols in which subsequent packets are sent over different frequency channels. Channel hopping combats external interference and persistent multipath fading, two of the main causes of failure along a communication link. Channel hopping yields a highly reliable and efficient protocol, specified by the HART Communication Foundation as WirelessHART. The WirelessHART Data-Link layer is based on TDMA and CSMA-CA mechanisms. By implementing CTP over the WirelessHART Data-Link layer, the reliability of the network protocol can be improved compared to the standard CTP implementation. This thesis describes the design and implementation of the Collection Tree Protocol over the WirelessHART Data-Link layer. The implementation is done in TinyOS and the nesC programming language on Crossbow TelosB nodes with the CC2420 radio chip. Experimental results evaluate the system prototype.
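
Channel hopping of the kind described above is usually driven by a slot counter: each link has a channel offset, and the channel used in a given time slot is (absolute slot number + offset) modulo the number of usable channels. The sketch assumes this TSCH-style formula and the 2.4 GHz IEEE 802.15.4 channels 11 to 25; `active_channel` and its `blacklist` parameter are hypothetical names, not TinyOS or WirelessHART APIs.

```python
# Assumption: 2.4 GHz IEEE 802.15.4 channels 11-25 are available for hopping.
CHANNELS = list(range(11, 26))

def active_channel(asn, channel_offset, blacklist=frozenset()):
    """Radio channel for a link in absolute slot `asn` with a given offset.

    Blacklisted channels (e.g. ones with persistent interference) are
    removed first; the hop index then advances by one every slot, so
    successive transmissions on the same link use different frequencies.
    """
    usable = [c for c in CHANNELS if c not in blacklist]
    return usable[(asn + channel_offset) % len(usable)]

print([active_channel(asn, channel_offset=3) for asn in range(5)])  # [14, 15, 16, 17, 18]
```

Because every node derives the channel from the shared slot counter, sender and receiver hop in lockstep without exchanging the sequence, which is why time synchronization is a prerequisite for this technique.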

18

De La Fuente, Jesus Miguel. "Visualization in Genealogical Data: Genealogical tree application for Facebook." Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-13991.

19

Negassa, Abdissa. "Validation of tree-structured prediction for censored survival data." Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=40407.

Abstract:

Objectives. (i) to develop a computationally efficient algorithm of tree-growing for censored survival data, (ii) to assess the performance of two validation schemes, and (iii) to evaluate the performance of computationally inexpensive model selection criteria in relation to cross-validation.
Background. In the tree-growing literature, a number of computationally inexpensive model selection criteria were suggested; however, none of them were systematically investigated for their performance. RECursive Partition and AMalgamation (RECPAM) is one of the existing tree-growing algorithms that provide such built-in model selection criteria. Application of RECPAM's different model selection criteria leads to a wide range of models (40). Since RECPAM is an exploratory data analysis tool, it is desirable to reduce its computational cost and establish the general properties of its model selection criteria so that clear guidelines can be suggested.
Methods. A computationally efficient tree-growing algorithm for prognostic classification and subgroup analysis is developed by employing the Cox score statistic and the Mantel-Haenszel estimator of the relative hazard. Two validation schemes, one restricting validation to pruning and parameter estimation and one validating the whole process of tree growing, are implemented and evaluated in simulation. Three model selection criteria (the elbow approach, minimum Akaike Information Criterion (AIC), and the one standard error (1SE) rule) were compared to cross-validation under a broad range of scenarios using simulation. Examples of medical data analyses are presented.
Conclusions. A gain in computational efficiency is achieved while obtaining the same result as the original RECPAM approach. The restricted validation scheme is computationally less expensive; however, it is biased. In the case of subgroup analysis, to adjust properly for influential prognostic factors, we suggest constructing a prognostic classification on such factors and using the resulting classification as strata in conducting the subgroup analysis. None of the model selection criteria studied exhibits a consistently superior performance over the range of scenarios considered here. Therefore, we propose a two-stage model selection strategy in which cross-validation is employed in the first step and, if this step provides evidence of structure in the data set, the elbow rule is applied in the second step.
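
The thesis grows trees using the Cox score statistic. For a single binary split indicator and no tied event times, the Cox score test coincides with the log-rank test, so the sketch below scores a candidate split with the log-rank statistic (using the usual tie correction in the variance). It is a simplified stand-in, not RECPAM's implementation, and the function name is hypothetical.

```python
def logrank_statistic(times, events, in_left):
    """Log-rank chi-square statistic comparing the two children of a split.

    times:   follow-up time per subject
    events:  1 if the event was observed, 0 if censored
    in_left: True if the split sends the subject to the left child
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_left[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_left[i])
        # Observed minus expected events in the left child at time t.
        o_minus_e += d1 - d * n1 / n
        # Hypergeometric variance, with the (n - d) / (n - 1) tie correction.
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0

# Hypothetical survival data with censoring (events == 0).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
print(logrank_statistic(times, events, [True, True, True, False, False, False]))
```

A tree-growing loop would evaluate this statistic for every candidate split of a node and keep the largest; pruning and the model selection criteria discussed above then decide how much of the grown tree to retain.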

20

Flöter, André. "Analyzing biological expression data based on decision tree induction." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=978444728.

21

Mondy, William Lafayette. "Data acquisition for modeling and visualization of vascular tree." [Tampa, Fla] : University of South Florida, 2009. http://purl.fcla.edu/usf/dc/et/SFE0003082.

22

Mangalvedkar, Pallavi Ramachandra. "GPU-ASSISTED RENDERING OF LARGE TREE-SHAPED DATA SETS." Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1195491112.

23

Källström, Johan. "Building and Tree Parameterization in Partially Occluded 2.5D DSM Data." Thesis, Linköpings universitet, Institutionen för systemteknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-125130.

Abstract:

Automatic 3D building reconstruction has been an active research area; the task is still done manually today. Automating building reconstruction enables more applications where up-to-date information is of great importance. This thesis proposes a system to extract parametric buildings and trees from dense aerial stereo image data. The method developed for tree identification and parameterization is an entirely new approach which has yielded great results. The focus has been to extract the data in such a way that small flying platforms can use it for navigational purposes; the degree of simplification is therefore high. The building parameterization starts by identifying roof faces through region growing from random seeds in the digital surface model (DSM) until a coverage threshold is met. For each roof face, a plane is fitted using a least-squares approach. The actual parameterization starts by calculating the intersections between the roof faces. Given the nature of 2.5D DSM data, wall fitting is not possible; therefore, all walls are constructed with a 2D line Hough transform of the border data of all the roof faces. The tree parameterization is done by searching for possible roof face topologies resembling the signature of a tree. For each possible tree topology, a second-degree polynomial surface is fitted to the DSM data covered by the faces in the topology. The parameters of the fitted polynomial then determine whether it is a tree or not. All extraction steps were implemented and evaluated in Matlab, and all algorithms are described, discussed and motivated in the thesis.
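
The roof-face plane fit can be sketched as ordinary least squares on z = a*x + b*y + c, which suits 2.5D DSM data because each (x, y) cell has a single height. This is a minimal, standard-library illustration (the thesis used Matlab); region growing, the Hough-based wall construction and the second-degree tree surfaces are not shown, and the function name is hypothetical.

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points.

    Builds the 3x3 normal equations and solves them with Cramer's rule,
    which is adequate for a sketch; a production version would use a
    numerically stabler solver.
    """
    n = len(points)
    sxx = sum(x * x for x, _, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    sz = sum(z for _, _, z in points)
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point set (e.g. all points collinear in x-y)")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]  # replace one column by the right-hand side
        coeffs.append(det3(m) / d)
    return tuple(coeffs)  # (a, b, c)

# Hypothetical roof-face samples on a 3 x 3 grid.
roof = [(x, y, 0.5 * x - 0.2 * y + 10) for x in range(3) for y in range(3)]
print(fit_plane(roof))
```

The slopes a and b give the roof face's orientation directly, and intersecting the fitted planes of neighbouring faces yields the ridge and valley lines used in the parameterization.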

24

Badulescu, Laviniu Aurelian. "ATTRIBUTE SELECTION MEASURE IN DECISION TREE GROWING." Universitaria Publishing House, 2007. http://hdl.handle.net/10150/105610.

Abstract:

One of the major tasks in data mining is classification. Growing a decision tree from data is a very efficient technique for learning classifiers. The choice of the attribute used to split the data set at each decision tree node is fundamental to classifying objects properly, and a good choice improves classification accuracy. In this paper, we study the behavior of decision trees induced with 14 attribute selection measures on three data sets taken from the UCI Machine Learning Repository.
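The abstract does not name its 14 measures. To illustrate what an attribute selection measure computes, here is a minimal Python sketch of information gain (entropy reduction), which is one widely used measure. The row format (one dict per record) and the function names are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting `rows` (dicts) on attribute `attr`."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Pick the split attribute with the highest information gain (ID3-style)."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))
```

Other measures, such as the gain ratio or the Gini index, would replace `information_gain` in `best_attribute` while keeping the same greedy per-node choice.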

25

Mori, Tomoya. "Methods for Analyzing Tree-Structured Data and their Applications to Computational Biology." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/202741.

26

Yu, Ping. "FP-tree Based Spatial Co-location Pattern Mining." Thesis, University of North Texas, 2005. https://digital.library.unt.edu/ark:/67531/metadc4724/.

Abstract:

A co-location pattern is a set of spatial features frequently located together in space. A frequent pattern is a set of items that frequently appears in a transaction database. Since its introduction, frequent pattern mining has shifted from candidate generation-and-test approaches to projection-based approaches. Co-location patterns resemble frequent patterns in many respects. However, the lack of a transaction concept, which is crucial in frequent pattern mining, makes a similar paradigm shift in co-location pattern mining very difficult. This thesis investigates a projection-based co-location pattern mining paradigm. In particular, it proposes an FP-tree based co-location mining framework and an algorithm called FP-CM (FP-tree based co-location miner). FP-CM is proved to be complete and correct and to require only a small constant number of database scans. Experimental results show that FP-CM outperforms a candidate generation-and-test based co-location miner by an order of magnitude.
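FP-CM builds on the FP-tree from frequent pattern mining. The Python sketch below shows only standard FP-tree construction over ordinary transactions, in two passes: first count item supports, then insert each transaction, ordered by frequency, into a shared prefix tree. The abstract does not describe how FP-CM derives transaction-like input from spatial neighbourhoods, so the sketch does not attempt that step. The class and function names are hypothetical.

```python
from collections import Counter


class FPNode:
    """One prefix-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree and a header table (item -> nodes carrying that item)."""
    # Pass 1: count the support of each item and keep only frequent items.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = {i: [] for i in frequent}
    # Pass 2: insert each transaction with items in descending support order,
    # so that transactions sharing frequent items share a prefix path.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header
```

Summing the counts of an item's nodes in the header table gives back its support. FP-growth-style miners then project on these node lists instead of rescanning the database.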

27

Moss, Graeme E. "Benchmarking purely functional data structures." Thesis, University of York, 2000. http://etheses.whiterose.ac.uk/10869/.

28

Ben Hafaiedh, Khaled. "Studying the Properties of a Distributed Decentralized B+ Tree with Weak-Consistency." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/20578.

Abstract:

Distributed computing is very popular in computer science and is widely used in web applications. In such systems, tasks and resources are partitioned among several computers so that the workload is shared across the network, in contrast to systems that use a single server computer. Distributed system designs are used for many practical reasons and are often more scalable, robust and suitable for many applications. The aim of this thesis is to study the properties of a distributed tree data structure that supports searches, insertions and deletions of data elements. In particular, it considers the B+ tree structure [13], a generalization of the binary search tree. The study analyzes the effect of distributing such a tree among several computers. It also investigates the behavior of the structure over a long period of time as the network of computers supporting the tree grows, while the state of the structure is updated immediately as insertion and deletion operations are performed. It further attempts to validate the necessary and sufficient invariants of the B+ tree structure that guarantee the correctness of search operations. A simulation study is also conducted to verify the validity of the distributed data structure and the performance of the algorithm that implements it. Finally, the thesis ends with a discussion comparing the performance of the system design with other distributed tree structure designs.

29

Wunder, Jan. "Conceptual advancement and ecological applications of tree mortality models based on tree-ring and forest inventory data." Zürich: ETH, 2007. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=17197.

30

Brewer, Peter W., Daniel Murphy, and Esther Jansma. "Tricycle: A Universal Conversion Tool For Digital Tree-Ring Data." Tree-Ring Society, 2011. http://hdl.handle.net/10150/622638.

Abstract:

There are at least 21 dendro-data formats used in dendrochronology laboratories around the world. Many of these formats can be read by only a limited number of programs, which inhibits collaboration, limits critical review of analyses, and puts the long-term accessibility of datasets at risk. Some of the older formats are supported by a single program and are falling into disuse, creating a risk that data will become obsolete and unreadable. These formats also have a variety of flaws, including but not limited to the lack of an accurate method for denoting measurement units, little or no metadata support, and no support for variables other than whole ring widths (e.g. earlywood/latewood widths, ratios and density). The proposed long-term solution is the adoption of a universal data standard such as the Tree-Ring Data Standard (TRiDaS). In the short and medium term, however, a tool is required that can convert not only to and from this standard, but also between any of the existing formats in use today. Such a tool is also required to provide continued access to data archived in obscure formats. This paper describes TRiCYCLE, a new application that does just this. TRiCYCLE is an open-source, cross-platform desktop application for converting the most commonly used data formats. Two open-source Java libraries on which TRiCYCLE depends are also described. Developers can use these libraries to implement support for all data formats in their own applications.

31

Lundkvist, Emil. "Decision Tree Classification and Forecasting of Pricing Time Series Data." Thesis, KTH, Reglerteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-151017.

Abstract:

Many companies today, across different fields of operation and sizes, have access to vast amounts of data that were not available only a couple of years ago. This raises the question of how best to organize and use the data. In this thesis, a large database of pricing data for products in various market segments is analysed. The pricing data come from both external and internal sources and are therefore confidential. Because of this confidentiality, the labels from the database are replaced with generic ones in this thesis and the company is not named, but the analysis is carried out on the real data set. The data are initially unstructured and difficult to survey, so they are first classified. This is done by feeding some manually labelled training data into an algorithm that builds a decision tree. The decision tree is then used to divide the rest of the products in the database into classes. Then, for each class, a multivariate time series model is built, with which each product's future price within the class can be predicted. A front end is also developed for interacting with the classification and price prediction. The results show that the classification algorithm is both fast enough to operate in real time and performs well. The time series analysis shows that the information within each class can be used for prediction, and a simple vector autoregressive model used for this purpose shows good predictive results.

32

Cheng, David R. (David Rolin). "Parallel sorting and Star-P data movement and tree flattening." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/33117.

Abstract:

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.
Includes bibliographical references (p. 81-84).
This thesis studies three problems in parallel computing. The first result is a deterministic parallel sorting algorithm that empirically improves on two sample sort algorithms. When using a comparison sort, this algorithm is 1-optimal in both computation and communication. The second study develops extensions to the Star-P system [7, 6] that allow it to solve a wider range of real problems, and the timings provided indicate the scalability of the implementations on some systems. The third problem concerns automatic parallelization. If a computation is represented as a binary tree (assumed to be given), the height of the tree corresponds to the parallel execution time, given enough processors. The main result of this chapter is an algorithm that uses tree rotations to reduce the height of an arbitrary binary tree to logarithmic in the number of its inputs. With a slightly altered definition of tree rotation, the method can solve more general problems. Examples are given that derive the parallel prefix algorithm and that speed up the dynamic programming approach to computing Fibonacci numbers.
33

Hassan, Diman. "A tree-based measure for hierarchical data in mixed databases." Thesis, University of Nottingham, 2016. http://eprints.nottingham.ac.uk/34652/.

Abstract:

The structure of the data in a mixed database can be a barrier when clustering that database into meaningful groups. A hierarchically structured database requires efficient distance measures and clustering algorithms to locate similarities between data objects. For this reason, the existing literature proposes hierarchical distance measures for the similarities between records in hierarchical databases. The main contribution of this research is to create and test a new distance measure for large hierarchical databases consisting of mixed data types and attributes, based on an existing tree-based (hierarchical) distance metric, the pq-gram distance metric. Several aims and objectives were pursued to fill a number of gaps in the current body of knowledge. One of these goals was to verify the validity of the pq-gram distance metric when applied to different data sets, and to compare and combine it with a number of different distance measures to demonstrate its usefulness across large mixed databases. To achieve this, further work focused on exploring how to use the existing method as a measure of hierarchical data attributes in mixed data sets, and on ascertaining whether the new method would produce better results with large mixed databases. For evaluation purposes, the pq-gram metric was applied to The Health Improvement Network (THIN) database to determine whether it could identify similarities between the records in the database. After this, it was applied to mixed data to examine different distance measures, including non-hierarchical and other hierarchical measures, and to combine them into a Combined Distance Function (CDF). The CDF improved the results when applied to different data sets, such as the hierarchical US National Bureau of Economic Research (NBER US) Patent data set and the mixed (THIN) data set.
The CDF was then modified to create a New-CDF, which used only the hierarchical pq-gram metric to measure the hierarchical attributes in the mixed data set. The New-CDF worked well, finding the most similar data records when applied to the THIN data set, and grouping them in one cluster using the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) clustering algorithm. The quality of the clusters was explored using two internal validation indices, Silhouette and C-Index, where the values showed good compactness and quality of the clusters obtained using the new method.
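The pq-gram distance compares two ordered, labelled trees through the bags of small subtree patterns (pq-grams) they contain. Each pq-gram combines an anchor node, its p - 1 nearest ancestors, and a window of q consecutive children, with '*' dummies padding missing positions. The Python sketch below computes the unnormalised bag distance. It assumes trees encoded as (label, [children]) tuples and the common parameter choice p = 2, q = 3. It covers only the pq-gram part and not the thesis's Combined Distance Function.

```python
from collections import Counter


def pq_gram_profile(tree, p=2, q=3):
    """Bag of pq-grams of an ordered labelled tree given as (label, [children]).

    '*' is the dummy label that pads missing ancestors and sibling positions.
    """
    profile = Counter()

    def visit(node, stem):
        label, children = node
        stem = stem[1:] + (label,)        # shift the anchor into the p-register
        window = ("*",) * q               # q-register of sibling labels
        if not children:
            profile[stem + window] += 1   # a leaf yields one all-dummy window
            return
        for child in children:
            window = window[1:] + (child[0],)
            profile[stem + window] += 1
            visit(child, stem)
        for _ in range(q - 1):            # slide dummies in after the last child
            window = window[1:] + ("*",)
            profile[stem + window] += 1

    visit(tree, ("*",) * p)
    return profile


def pq_gram_distance(t1, t2, p=2, q=3):
    """Bag distance: |P1| + |P2| - 2 * |P1 intersect P2| over the two profiles."""
    p1, p2 = pq_gram_profile(t1, p, q), pq_gram_profile(t2, p, q)
    shared = sum((p1 & p2).values())      # Counter '&' keeps minimum counts
    return sum(p1.values()) + sum(p2.values()) - 2 * shared
```

For example, the trees a(b, c) and a(b, d) each have six pq-grams, two of which are shared. This gives a distance of 6 + 6 - 2 * 2 = 8.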

34

Chippa, Mukesh Kumar. "Performance of Tree-Based Data Collection in Wireless Sensor Systems." University of Akron / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=akron1312209206.

35

Fan, Hang. "Species Tree Likelihood Computation Given SNP Data Using Ancestral Configurations." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1385995244.

36

Beltur, Bharat Ramachandra. "Adaptive Slicing in Additive Manufacturing using Strip Tree Data Structures." University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1479815018228663.

37

Gieske, Edmund J. "B+ Tree Cache Memory Performance." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092344402.

38

宋永健 and Wing-kin Sung. "Fast labeled tree comparison via better matching algorithms." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1998. http://hub.hku.hk/bib/B31239316.

39

Sung, Wing-kin. "Fast labeled tree comparison via better matching algorithms." Hong Kong: University of Hong Kong, 1998. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20229999.

40

Agarwal, Khushbu. "A partition based approach to approximate tree mining: a memory hierarchy perspective." Columbus, Ohio: Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1196284256.

41

Serra-Diaz, Josep M., Brian J. Enquist, Brian Maitner, Cory Merow, and Jens-C. Svenning. "Big data of tree species distributions: how big and how good?" SPRINGER HEIDELBERG, 2018. http://hdl.handle.net/10150/626611.

Abstract:

Background: Trees play crucial roles in the biosphere and in societies worldwide, with a total of 60,065 tree species currently identified. Increasingly, a large amount of data on tree species occurrences is being generated worldwide, from inventories to pressed plants. While many of these data are currently available in big databases, several challenges hamper their use, notably geolocation problems and taxonomic uncertainty. Further, we lack a complete picture of the data coverage and quality assessment for open/public databases of tree occurrences. Methods: We combined data from five major aggregators of occurrence data (i.e. Global Biodiversity Information Facility, Botanical Information and Ecological Network v.3, DRYFLOR, RAINBIO and Atlas of Living Australia) by creating a workflow to integrate, assess and control the data quality of tree species occurrences for species distribution modeling. We further assessed the coverage (the extent of geographical data) of five economically important tree families (Arecaceae, Dipterocarpaceae, Fagaceae, Myrtaceae, Pinaceae). Results: Globally, we identified 49,206 tree species (84.69% of the total tree species pool) with occurrence records. The total number of occurrence records was 36.69 M, of which 6.40 M could be considered high-quality records for species distribution modeling. The results show that Europe, North America and Australia have considerable spatial coverage of tree occurrence data. Conversely, key biodiverse regions such as South-East Asia, central Africa and parts of the Amazon are still characterized by gaps in open/public geographical data. Such gaps are found even for economically important tree families, although their overall ranges are covered. Only 15,140 species (26.05%) had at least 20 high-quality records.
Conclusions: Our geographical coverage analysis shows that a wealth of easily accessible data exists on tree species occurrences worldwide, but regional gaps and coordinate errors are abundant. Thus, assessment of tree distributions will need accurate occurrence quality control protocols and key collaborations and data aggregation, especially from national forest inventory programs, to improve the current publicly available data.

42

Norelius, Jenny, and Antonello Tacchi. "Evaluating data structures for range queries in brain simulations." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229767.

Abstract:

The brain and nervous system are vital to us, since they are the origin of our thoughts, personalities and other mental capacities. In neuroscience, a common method of study is to build and run large-scale brain simulations in which up to a hundred thousand neurons are used to model a brain in three-dimensional space. Finding all neurites within a specific area is called a range query. Brain simulations require a vast number of range queries, so the data structure used to store the simulated neurons must be efficient. This study evaluates three common data structures, also called spatial indices: the R-tree, the Quadtree and the R*-tree (Rstar-tree). We test their range-query performance with regard to execution time, incurred reads, build time, and the size and density of the data. The data consist of models of a typical neuron, so that the characteristics of the data set are preserved. The results show that the R*-tree is significantly more efficient than the other indices, and that the R-tree performs slightly worse than the Quadtree. The time it takes to build the index is almost identical for all implementations.
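To make the range-query workload concrete, here is a minimal point-region quadtree with a rectangle range query in Python. It is a 2-D sketch built on stated assumptions: distinct points, a square root region, and a fixed bucket capacity of 4. The study itself indexes neuron data in 3-D and also evaluates R-trees and R*-trees, which are not shown here.

```python
class QuadTree:
    """Point-region quadtree over the half-open square [x0, x0+size) x [y0, y0+size).

    Assumes distinct points: more than `capacity` identical points would
    trigger endless splitting.
    """

    def __init__(self, x0, y0, size, capacity=4):
        self.x0, self.y0, self.size = x0, y0, size
        self.capacity = capacity
        self.points = []        # points stored while this node is a leaf
        self.children = None    # four sub-quadrants once the node has split

    def _contains(self, x, y):
        return (self.x0 <= x < self.x0 + self.size and
                self.y0 <= y < self.y0 + self.size)

    def insert(self, x, y):
        """Insert a point; returns False if it lies outside this node's square."""
        if not self._contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((x, y))
                return True
            self._split()
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        half = self.size / 2
        self.children = [QuadTree(self.x0 + dx, self.y0 + dy, half, self.capacity)
                         for dx in (0.0, half) for dy in (0.0, half)]
        for px, py in self.points:          # push stored points down one level
            for child in self.children:
                if child.insert(px, py):
                    break
        self.points = []

    def range_query(self, qx0, qy0, qx1, qy1, out=None):
        """Collect all points with qx0 <= x <= qx1 and qy0 <= y <= qy1."""
        if out is None:
            out = []
        # Prune the whole subtree if its square misses the query rectangle.
        if (qx1 < self.x0 or qx0 >= self.x0 + self.size or
                qy1 < self.y0 or qy0 >= self.y0 + self.size):
            return out
        for px, py in self.points:
            if qx0 <= px <= qx1 and qy0 <= py <= qy1:
                out.append((px, py))
        for child in self.children or []:
            child.range_query(qx0, qy0, qx1, qy1, out)
        return out
```

The pruning test is where spatial indices earn their speed. Subtrees whose regions do not overlap the query are skipped entirely, which is the "incurred reads" cost the study measures.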

43

Curtin, Ryan Ross. "Improving dual-tree algorithms." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54354.

Abstract:

This large body of work is centered entirely on dual-tree algorithms, a class of algorithms based on spatial indexing structures that often provide substantial acceleration for a variety of problems. This work focuses on understanding dual-tree algorithms through a new, tree-independent abstraction, and on using this abstraction to develop new algorithms. Stated more precisely, the thesis of this entire work is that we may improve and expand the class of dual-tree algorithms by focusing on, and providing improvements for, each of the three independent components of a dual-tree algorithm: the type of space tree, the type of pruning dual-tree traversal, and the problem-specific BaseCase() and Score() functions. This is demonstrated by expressing many existing dual-tree algorithms in the tree-independent framework and by improving each of these three pieces. The result is a formidable set of generic components that can be used to assemble dual-tree algorithms, including faster traversals, improved tree theory, and new algorithms for max-kernel search and k-means clustering.
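The three-way split described above (space tree, traversal, and the problem-specific BaseCase() and Score()) can be sketched in a few lines of Python. The sketch assumes a 1-D median-split tree, a generic depth-first dual traversal, and nearest-neighbour rules. The names only mirror the abstract's terminology. The code is an illustration, not the thesis's implementation, and it assumes distinct query values because they are used as dictionary keys.

```python
import math


class Node:
    """1-D space tree: each node keeps its points and bounding interval [lo, hi]."""

    def __init__(self, points, leaf_size=2):
        self.points = sorted(points)
        self.lo, self.hi = self.points[0], self.points[-1]
        self.children = []
        if len(self.points) > leaf_size:
            mid = len(self.points) // 2
            self.children = [Node(self.points[:mid], leaf_size),
                             Node(self.points[mid:], leaf_size)]


class NearestNeighborRules:
    """Problem-specific BaseCase() and Score() for 1-nearest-neighbour search."""

    def __init__(self, queries):
        self.best = {q: math.inf for q in queries}
        self.nearest = {q: None for q in queries}

    def base_case(self, q, r):
        d = abs(q - r)
        if d < self.best[q]:
            self.best[q], self.nearest[q] = d, r

    def score(self, qnode, rnode):
        # Smallest possible distance between any point pair from the two nodes.
        gap = max(rnode.lo - qnode.hi, qnode.lo - rnode.hi, 0.0)
        # Worst current candidate distance among the query node's points.
        bound = max(self.best[q] for q in qnode.points)
        return math.inf if gap > bound else gap   # inf signals "prune"


def dual_traverse(qnode, rnode, rules):
    """Depth-first dual-tree traversal; knows nothing about the problem."""
    if rules.score(qnode, rnode) == math.inf:
        return
    if not qnode.children and not rnode.children:
        for q in qnode.points:
            for r in rnode.points:
                rules.base_case(q, r)
        return
    for qc in qnode.children or [qnode]:
        for rc in rnode.children or [rnode]:
            dual_traverse(qc, rc, rules)
```

Usage: `rules = NearestNeighborRules(queries)` followed by `dual_traverse(Node(queries), Node(refs), rules)`. This fills `rules.best` and `rules.nearest`. Swapping in different rules or a different tree class leaves the traversal untouched, which is the separation the abstract describes.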

44

Kim, Seoung Bum. "Data Mining in Tree-Based Models and Large-Scale Contingency Tables." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/6825.

Abstract:

This thesis is composed of two parts. The first part pertains to tree-based models, and the second deals with multiple testing in large-scale contingency tables. Tree-based models have gained enormous popularity in statistical modeling and data mining. We propose a novel tree-pruning algorithm called the frontier-based tree-pruning algorithm (FBP). The new method has a computational complexity of the same order as cost-complexity pruning (CCP), and it provides a full spectrum of information about tree pruning. A numerical study on real data sets reveals a surprise: in the complexity-penalization approach, most of the tree sizes are inadmissible. FBP facilitates a more faithful implementation of cross-validation, which is favored by simulations. One of the most common test procedures for two-way contingency tables is the test of independence between two categorizations. Current procedures such as chi-square or likelihood ratio tests assess overall independence but give limited information about the nature of the association in contingency tables. We propose an approach for testing the independence of categories in individual cells of contingency tables based on a multiple testing framework. We then employ the proposed method to identify the patterns of pair-wise associations between amino acids involved in beta-sheet bridges of proteins. We identify a number of amino acid pairs that exhibit either strong or weak association. These patterns provide useful information for algorithms that predict the secondary and tertiary structures of proteins.

45

Marrón Vida, Diego. "Improving decision tree and neural network learning for evolving data-streams." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/668371.

Abstract:

High-throughput, real-time Big Data stream processing requires fast incremental algorithms that keep models consistent with the most recent data. In this scenario, Hoeffding Trees are considered the state-of-the-art single classifier for processing data streams, and they are widely used in ensemble combinations. This thesis is devoted to improving the performance of machine learning/artificial intelligence algorithms on evolving data streams. In particular, we focus on improving the Hoeffding Tree classifier and its ensemble combinations to reduce their resource consumption and response-time latency, achieving better throughput when processing evolving data streams. First, this thesis presents a study on using Neural Networks (NN) as an alternative method for processing data streams. The use of random features to improve NN training speed is proposed, and important issues concerning the use of NNs in a data stream setup are highlighted. These issues motivated this thesis to focus on improving the current state-of-the-art methods: Hoeffding Trees and their ensemble combinations. Second, this thesis proposes the Echo State Hoeffding Tree (ESHT), an extension of the Hoeffding Tree that models the time dependencies typically present in data streams. The capabilities of the new architecture are evaluated on both regression and classification problems. Third, a new methodology to improve the Adaptive Random Forest (ARF) is developed. ARF was introduced recently and is considered the state-of-the-art classifier in the MOA framework (a popular framework for processing evolving data streams). This thesis proposes the Elastic Swap Random Forest, an extension to ARF that reduces the number of base learners in the ensemble to one third on average, while providing accuracy similar to that of the standard ARF with 100 trees.
Finally, the last contribution is a multi-threaded, high-performance, scalable ensemble design that is highly adaptable to a variety of hardware platforms, ranging from server-class machines to edge computing. The proposed design achieves throughput improvements of 85x (Intel i7), 143x (Intel Xeon parsing from memory), 10x (Jetson TX1, ARM) and 23x (X-Gene2, ARM) compared to single-threaded MOA on an i7. In addition, the proposal achieves 75% parallel efficiency when using 24 cores on the Intel Xeon.
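Hoeffding trees decide when a leaf has seen enough examples to split by using the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)). Here R is the range of the split criterion, δ is the allowed probability of choosing the wrong attribute, and n is the number of examples seen at the leaf. Below is a minimal Python sketch of that split test. The tie-breaking threshold τ = 0.05 is an assumed default. Nothing in the sketch reflects the ESHT or Elastic Swap Random Forest extensions proposed in the thesis.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with the given range lies within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n, tie_threshold=0.05):
    """Hoeffding-tree split test at a leaf that has seen n examples.

    Split when the best attribute beats the runner-up by more than epsilon, or
    when epsilon has shrunk below tie_threshold (the two are effectively tied).
    tie_threshold = 0.05 is an assumed default, not a value from the thesis.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return best_gain - second_gain > eps or eps < tie_threshold
```

For information gain with c classes, R = log2(c), so R = 1 for a two-class stream. Because ε shrinks as 1/sqrt(n), a leaf commits to a split once it has enough evidence, without storing the examples themselves.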

46

Al-Jabbouli, Hasan. "Data clustering using the Bees Algorithm and the Kd-tree structure." Thesis, Cardiff University, 2009. http://orca.cf.ac.uk/54947/.

Abstract:

Data clustering has been studied intensively during the past decade. The K-means and C-means algorithms are the most popular clustering techniques: the former is suited to 'crisp' clustering and the latter to 'fuzzy' clustering. Clustering with the K-means or C-means algorithms is generally fast and produces good results. Although these algorithms have been successfully applied in several areas, they still have a number of limitations. The main aim of this work is to develop flexible data management strategies that address some of those limitations and improve the performance of the algorithms. The first part of the thesis introduces improvements to the K-means algorithm. A flexible data structure was applied to help the algorithm find stable results and to decrease the number of nearest neighbour queries needed to assign data points to clusters. The method overcame most of the deficiencies of the K-means algorithm. The second and third parts of the thesis present two new clustering algorithms that can locate near-optimal solutions efficiently. The proposed algorithms combine the simplicity of the K-means and C-means algorithms with the ability of a new optimisation method, the Bees Algorithm, to avoid local optima in crisp and fuzzy clustering, respectively. Experimental results on different data sets have demonstrated that the new clustering algorithms perform better than other algorithms that combine an evolutionary optimisation tool with the K-means and C-means clustering methods. The fourth part of this thesis presents an improvement to the basic Bees Algorithm that applies the concept of recursion to reduce the randomness of its local search procedure. The improved Bees Algorithm was applied to crisp and fuzzy clustering of several data sets, and the results obtained confirm the superior performance of the new algorithm.
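The abstract mentions reducing the nearest-neighbour queries that assign points to clusters. The Python sketch below shows one simple way a kd-tree can serve that step. The K-means centroids are stored in a 2-D kd-tree, each point's nearest centroid is found with a pruned search, and the centroids are then moved to the mean of their members. This is an assumption-labelled illustration of K-means with a kd-tree, not the thesis's data management strategy, and it does not include the Bees Algorithm.

```python
import math


def build_kdtree(items, depth=0):
    """2-D kd-tree over (point, index) pairs, as nested tuples
    (point, index, axis, left, right); None marks an empty subtree."""
    if not items:
        return None
    axis = depth % 2
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, index = items[mid]
    return (point, index, axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, target, best=(math.inf, None)):
    """Return (squared distance, index) of the stored point nearest to target."""
    if node is None:
        return best
    point, index, axis, left, right = node
    d2 = (point[0] - target[0]) ** 2 + (point[1] - target[1]) ** 2
    if d2 < best[0]:
        best = (d2, index)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    if diff ** 2 < best[0]:     # the splitting line is closer than the best so far
        best = nearest(far, target, best)
    return best


def assign(data, centroids):
    """K-means assignment step: index of the nearest centroid for each point."""
    tree = build_kdtree([(c, i) for i, c in enumerate(centroids)])
    return [nearest(tree, p)[1] for p in data]


def update(data, labels, centroids):
    """K-means update step: move each centroid to the mean of its members."""
    new = []
    for i, c in enumerate(centroids):
        members = [p for p, lab in zip(data, labels) if lab == i]
        if members:
            new.append((sum(p[0] for p in members) / len(members),
                        sum(p[1] for p in members) / len(members)))
        else:
            new.append(c)       # leave an empty cluster's centroid in place
    return new
```

One Lloyd iteration is `labels = assign(data, centroids)` followed by `centroids = update(data, labels, centroids)`. The kd-tree pays off when there are many centroids, because each query can skip subtrees whose splitting line is farther away than the best candidate found so far.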

47

Cheng, James Sheung-Chak. "The development of a structural index tree for processing XML data." Thesis, Hong Kong University of Science and Technology, 2004. http://library.ust.hk/cgi/db/thesis.pl?COMP%202004%20CHENG.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 80-86). Also available in electronic version. Access restricted to campus users.

48

Towner, Ronald H., and Pearce Paul Creasman. "Tree-ring sample data." 2010. http://hdl.handle.net/10150/113563.

49

Liu, Yen-Ju, and 劉晏如. "Reconfiguration of Maximum-lifetime Data Gathering Trees with Tree Structure Data Broadcasting in Sensor Network." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/vsrpj8.

50

Towner, Ronald H., and Pearce Paul Creasman. "Tree-ring data summary by feature." 2010. http://hdl.handle.net/10150/113593.
