Dissertations / Theses: 'Sparse data'

1

Gullipalli, Deep Kumar. "Data envelopment analysis with sparse data." Thesis, Kansas State University, 2011. http://hdl.handle.net/2097/13092.

Abstract:

Master of Science
Department of Industrial & Manufacturing Systems Engineering
David H. Ben-Arieh
The quest for continuous improvement among organizations and the problem of missing data in data analysis are never-ending. This thesis brings these two topics under one roof by evaluating the productivity of organizations with sparse data. The study uses Data Envelopment Analysis (DEA) to determine the efficiency of 41 member clinics of the Kansas Association of Medically Underserved (KAMU) with missing data. The primary focus of the thesis is to develop new, reliable methods to determine the missing values and to execute DEA. DEA is a linear programming methodology for evaluating the relative technical efficiency of homogeneous Decision Making Units using multiple inputs and outputs. The effectiveness of DEA depends on the quality and quantity of the data used. DEA outcomes are susceptible to missing data, which creates a need to supplement sparse data in a reliable manner. Determining missing values more precisely improves the robustness of the DEA methodology. Three methods for determining the missing values are proposed, each based on a different platform. The first method, the Average Ratio Method (ARM), uses the average of all the ratios between two variables. The second method is based on a modified Fuzzy C-Means clustering algorithm that can handle missing data; the issues associated with this clustering algorithm are resolved to improve its effectiveness. The third method is based on an interval approach: missing values are replaced by interval ranges estimated by experts, and crisp efficiency scores are identified along lines similar to how DEA determines efficiency scores using the best set of weights. There is no unique way to evaluate the effectiveness of these methods. Here, effectiveness is tested by choosing a complete dataset and assuming varying levels of data to be missing. The best set of recovered missing values, based on the above methods, serves as the source for executing DEA.
Results show that the DEA efficiency scores generated with recovered values are in close proximity to the actual efficiency scores that would be generated with the complete data. In summary, this thesis provides an effective and practical approach for replacing missing values needed for DEA.
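The abstract describes the Average Ratio Method only as using the average of all ratios between two variables. One plausible reading can be sketched as follows. The function name, the use of `None` for missing entries, and the clinic figures are all assumptions made for illustration; they are not taken from the thesis.

```python
from statistics import mean


def average_ratio_impute(x, y):
    """Fill None entries of y with (mean of y_i / x_i over complete pairs) * x_j.

    This is an illustrative reading of "average of all the ratios between
    two variables", not the thesis's exact procedure.
    """
    ratios = [yi / xi for xi, yi in zip(x, y)
              if xi is not None and yi is not None and xi != 0]
    if not ratios:
        raise ValueError("no complete pairs to estimate a ratio from")
    r = mean(ratios)
    return [yi if yi is not None else (r * xi if xi is not None else None)
            for xi, yi in zip(x, y)]


# Hypothetical data: staff counts (input) and patient visits (output) per clinic.
staff = [4.0, 10.0, 6.0, 8.0]
visits = [800.0, 2500.0, None, 1600.0]
filled = average_ratio_impute(staff, visits)
```

Here the observed ratios are 200, 250 and 200 visits per staff member, so the missing value is estimated from their mean multiplied by the clinic's staff count.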

2

Maiga, Aïssata, and Johanna Löv. "Real versus Simulated data for Image Reconstruction : A comparison between training with sparse simulated data and sparse real data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302028.

Abstract:

Our study investigates how training with sparse simulated data versus sparse real data from an event camera affects image reconstruction. We trained two models, one with simulated data and one with real data, and compared them on several criteria such as number of events, speed, and high dynamic range (HDR). The results indicate that the difference between training with simulated data and training with real data is not large. The model trained with real data performed better in most cases, but the average difference between the results is only 2%. The findings confirm what earlier studies have shown: training with simulated data generalises well, and, as this study shows, it does so even when training on sparse datasets.

3

Lari, Kamran A. "Sparse data estimation for knowledge processes." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=86073.

Abstract:

In recent years, industry has increasingly focused on knowledge processes. Like traditional or manufacturing processes, knowledge processes need to be managed and controlled in order to provide the results for which they were designed. During the last decade, the principles of process management have evolved, especially through work done in software engineering and workflow management.
Process monitoring is one of the major components for any process management system. There have been efforts to design process control and monitoring systems; however, no integrated system has yet been developed as a "generic intelligent system shell". In this dissertation, an architecture for an integrated process monitoring system (IPMS) is developed, whereby the end-to-end activities of a process can be automatically measured and evaluated. In order to achieve this goal, various components of the IPMS and the interrelationship among these components are designed.
Furthermore, a comprehensive study of the available methodologies and techniques revealed that sparse data estimation (SDE) is the key component of the IPMS that does not yet exist. Consequently, a series of algorithms and methodologies is developed as the basis for the sparse data estimation of knowledge-based processes. Finally, a series of computer programs demonstrates the feasibility and functionality of the proposed approach when applied to a sample process. The sparse data estimation method is successful not only for knowledge-based processes but also for any process, and indeed for any set of activities that can be modeled as a network.

4

Beresford, D. J. "3D face modelling from sparse data." Thesis, University of Surrey, 2004. http://epubs.surrey.ac.uk/736/.

5

Rommedahl, David, and Martin Lindström. "Learning Sparse Graphs for Data Prediction." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-295623.

Abstract:

Graph structures can often be used to describe complex data sets. In many applications, the graph structure is not known but must be inferred from data. Furthermore, real-world data is often naturally described by sparse graphs. In this project, we have aimed at recreating the results described in previous work, namely to learn a graph that can be used for prediction using an ℓ1-penalised LASSO approach. We also propose different methods for learning and evaluating the graph. We have evaluated the methods on synthetic data and real-world Swedish temperature data. The results show that we are unable to recreate the results of the previous research team, but we manage to learn sparse graphs that could be used for prediction. Further work is needed to verify our results.
Bachelor's degree project in electrical engineering 2020, KTH, Stockholm

6

Prost, Vincent. "Sparse unsupervised learning for metagenomic data." Electronic Thesis or Diss., université Paris-Saclay, 2020. http://www.theses.fr/2020UPASL013.

Abstract:

The development of massively parallel sequencing technologies makes it possible to sequence DNA at high throughput and low cost, fueling the rise of metagenomics, the study of complex microbial communities sequenced in their natural environment. Falling costs combined with rising throughput mean that more and more studies are being launched. Metagenomic problems are usually computationally difficult and are further complicated by the massive amount of data involved.
In this thesis we consider two metagenomics problems: (1) raw read binning and (2) microbial network inference from taxonomic abundance profiles. We address them using unsupervised machine learning methods that leverage the parsimony principle, which concretely take the form of optimization problems with an l1-norm penalty, typically l1-penalized log-likelihood maximization.
The assembly of genomes from raw metagenomic datasets is a challenging task, akin to assembling a mixture of large puzzles composed of billions or trillions of pieces (DNA sequences). In the first part of this thesis, we consider the related task of clustering sequences into biologically meaningful partitions (binning). Most existing computational tools perform binning only after read assembly as a pre-processing step, which is error-prone (yielding artifacts such as chimeric contigs) and discards vast amounts of information in the form of unassembled reads (up to 50% for highly diverse metagenomes). This motivated us to address the problem of raw read binning without prior assembly. We exploit the co-abundance of species across samples as a discriminative signal. Abundance is usually measured via the number of occurrences of long k-mers (subsequences of size k). The use of Locality-Sensitive Hashing (LSH) allows us to contain, at the cost of some approximation, the combinatorial explosion of long k-mer indexing within a space of fixed cardinality. The first contribution of this thesis is to propose a sparse Non-negative Matrix Factorization (NMF) of the samples × k-mers count matrix in order to jointly extract abundance variation signals and cluster the k-mers.
We first show that using sparse NMF is well grounded theoretically, since the data are a sparse linear mixture of non-negative components. We then survey the state-of-the-art sparse NMF methods best suited to our problem. Sparse NMF based on online dictionary learning algorithms particularly held our attention because of its ability to scale to datasets with a very large number of points and its decent behavior on highly asymmetric data matrices. Because validating metagenomic binning on real datasets is difficult in the absence of ground truth, we created and used several synthetic benchmark datasets to evaluate the different methods. We show that sparse NMF improves on state-of-the-art binning methods on those datasets. Experiments on a real metagenomic cohort of 1135 gut microbiota samples from healthy individuals also show the relevance of the approach.
In the second part of the thesis, we consider metagenomic data after taxonomic profiling, that is, multivariate data representing the abundances of taxa across samples. Microbes live in communities structured by ecological interactions between their members, so it is important to be able to identify these interactions. We focus on the problem of inferring microbial interaction networks from taxonomic profiles. This problem is frequently cast in the framework of Gaussian graphical models (GGMs), for which efficient structure inference algorithms such as the graphical lasso are available. Unfortunately, GGMs and their variants cannot properly account for the extremely sparse patterns occurring in real-world metagenomic taxonomic profiles. In particular, most statistical methods fail to properly handle structural zeros, which correspond to true absences of taxa. We present a zero-inflated log-normal graphical model specifically aimed at handling such "biological" zeros, and demonstrate significant performance gains over state-of-the-art statistical methods for the inference of microbial association networks, with the most notable gains obtained on taxonomic profiles whose sparsity levels are on par with real-world metagenomic datasets.

7

Bissmark, Johan, and Oscar Wärnling. "The Sparse Data Problem Within Classification Algorithms : The Effect of Sparse Data on the Naïve Bayes Algorithm." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209227.

Abstract:

In today's society, software and apps based on machine learning and predictive analysis are highly relevant. Machine learning makes it possible to predict likely future outcomes from previously collected data and thereby save time and resources. A common problem in machine learning is sparse data, which affects the performance of machine learning algorithms and their ability to compute accurate predictions. Data is considered sparse when certain expected values in a dataset are missing, which is common in large-scale data analysis. This report focuses mainly on the Naïve Bayes classification algorithm and how it is affected by sparse data in comparison with other widely used classification algorithms. The extent of the performance loss associated with sparse data is studied and analyzed in order to measure the effect sparsity has on the ability to compute accurate predictions. The results support the conclusion that the Naïve Bayes algorithm is far less affected by sparse data than other common classification algorithms, which is in line with what previous research suggests.

8

Embleton, Nina Lois. "Handling sparse spatial data in ecological applications." Thesis, University of Birmingham, 2015. http://etheses.bham.ac.uk//id/eprint/5840/.

Abstract:

Estimating the size of an insect pest population in an agricultural field is an integral part of insect pest monitoring. An abundance estimate can be used to decide if action is needed to bring the population size under control, and accuracy is important in ensuring that the correct decision is made. Conventionally, statistical techniques are used to formulate an estimate from population density data obtained via sampling. This thesis thoroughly investigates an alternative approach of applying numerical integration techniques. We show that when the pest population is spread over the entire field, numerical integration methods provide more accurate results than the statistical counterpart. Meanwhile, when the spatial distribution is more aggregated, the error behaves as a random variable and the conventional error estimates do not hold. We thus present a new probabilistic approach to assessing integration accuracy for such functions, and formulate a mathematically rigorous estimate of the minimum number of sample units required for accurate abundance evaluation in terms of the species diffusion rate. We show that the integration error dominates the error introduced by noise in the density data and thus demonstrate the importance of formulating numerical integration techniques which provide accurate results for sparse spatial data.
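To make the contrast between the two kinds of estimate concrete, here is a minimal one-dimensional sketch. It assumes a smooth density sampled at equally spaced points along a transect of known length; the thesis itself treats fields, so this is a simplification. The function names and the density profile are hypothetical.

```python
def statistical_estimate(densities, length):
    """Abundance as (sample mean density) * (transect length)."""
    return length * sum(densities) / len(densities)


def trapezoid_estimate(densities, length):
    """Abundance via the composite trapezoidal rule on equally spaced samples."""
    h = length / (len(densities) - 1)  # spacing between neighbouring samples
    inner = sum(densities[1:-1])       # interior samples get full weight
    return h * (0.5 * densities[0] + inner + 0.5 * densities[-1])


# Hypothetical density profile f(x) = 3x^2 on [0, 1]; the exact abundance is 1.
samples = [3 * x ** 2 for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
stat = statistical_estimate(samples, 1.0)
trap = trapezoid_estimate(samples, 1.0)
```

For this particular smooth profile and five samples, the trapezoidal estimate lies closer to the exact value than the mean-based one. The sketch says nothing about aggregated distributions, where the abstract reports that conventional error estimates no longer hold.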

9

Sjödin, Rickard. "Interpolation and visualization of sparse GPR data." Thesis, Umeå universitet, Institutionen för fysik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-170946.

Abstract:

Ground Penetrating Radar (GPR) is a tool for mapping the subsurface in a noninvasive way. The radar instrument transmits electromagnetic waves and records the resulting scattered field. Unfortunately, the data from a survey can be hard to interpret, especially for non-experts in the field. The data are also usually in 2.5D, or pseudo 3D, meaning that the vast majority of the scanned volume is missing data. Interpolation algorithms can, however, approximate the missing data, and the result can be visualized in an application to ease interpretation. This report compares different interpolation algorithms, with particular focus on their behaviour when the data become sparse. The compared methods were linear interpolation, inverse distance weighting, ordinary kriging, thin plate splines, and f-k domain zone-pass POCS. Each was found to have strengths and weaknesses in different respects, although ordinary kriging was found to be the most accurate and to create the fewest artefacts. Inverse distance weighting performed surprisingly well considering its simplicity and low computational cost. A web-based, easy-to-use visualization application was developed in order to view the results of the interpolations. The tools implemented include time slice, crop of a 3D cube, and iso surface.
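The simplicity of inverse distance weighting noted in the abstract is easy to see in code. The following is a minimal 2D sketch of the generic technique, not the report's implementation; the function name, the `power` parameter, and the sample values are assumptions for illustration.

```python
import math


def idw(points, query, power=2.0):
    """Inverse distance weighting.

    points: list of ((x, y), value) samples; query: (x, y) location.
    Each sample is weighted by 1 / distance**power.
    """
    num = 0.0
    den = 0.0
    for (px, py), value in points:
        d = math.hypot(query[0] - px, query[1] - py)
        if d == 0.0:
            return value  # query coincides with a sample: return it exactly
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical amplitudes recorded at two positions along a survey line.
samples = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]
midpoint = idw(samples, (1.0, 0.0))
```

Because the query point is equidistant from both samples, the two weights are equal and the estimate is their plain average.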

10

Cheruvu, Vinay Kumar. "Continuous Antedependence Models for Sparse Longitudinal Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1315579803.

11

Morris, Henry. "Sparse nonlinear methods for predicting structured data." Thesis, Imperial College London, 2012. http://hdl.handle.net/10044/1/9548.

Abstract:

Gaussian processes are now widely used to perform key machine learning tasks such as nonlinear regression and classification. An attractive feature of Gaussian process models is the behaviour of the error bars, which grow in regions away from observations where there is high uncertainty about the interpolating function. The complexity of these models scales as $O(N^3)$ with sample size $N$, which causes difficulties with large data sets. The goals of this work are to develop nonlinear, nonparametric modelling techniques for structure learning and prediction problems in which there are structured dependencies among the observed data, and to equip our models with sparse representations which serve both to handle prior sparse connectivity assumptions and to reduce computational complexity. We present Kernel Dynamical Structure Learning, a Bayesian method for learning the structure of interactions between variables in multivariate time-series. We design a mutual information kernel to handle time-series trajectories, and show that prior knowledge about network sparsity can be incorporated using heavy-tailed priors over parameters. We evaluate the feasibility of our method on synthetic data, and extend the inference methodology to the handling of uncertain input data. Next, we tackle the problem of belief propagation in Bayesian networks with nonlinear node relations. We propose an exact moment-matching approach for nonlinear belief propagation in any tree-structured graph. We call this Gaussian Process Belief Propagation. We extend this approach by the addition of hidden variables which allow nodes sharing common influences to be conditionally independent. This constitutes a novel approach to multi-output regression on bivariate graph structures, and we call this Dependent Gaussian Process Belief Propagation. We describe sparse inference methods for both models, which reduce computational cost by learning compact parameterisations of the available training data.
We then apply our method to the real-world systems biology problem of protein inference in transcriptional networks.

12

Subramaniam, Suresh. "All-optical networks with sparse wavelength conversion." Thesis, University of Washington, 1997. http://hdl.handle.net/1773/6032.

13

Li, Mingfei, and 李明飞. "Sparse representation and fast processing of massive data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B49617977.

Abstract:

Many computational problems involve massive data. A reasonable solution to those problems should be able to store and process the data in an effective manner. In this thesis, we study sparse representation of data streams and metric spaces, which allows for fast and private computation of heavy hitters from distributed streams, and approximate distance queries between points in a metric space. Specifically, we consider application scenarios where an untrusted aggregator wishes to continually monitor the heavy hitters across a set of distributed streams. Since each stream can contain sensitive data, such as the purchase history of customers, we wish to guarantee the privacy of each stream, while allowing the untrusted aggregator to accurately detect the heavy hitters and their approximate frequencies. Our protocols are scalable in settings where the volume of streaming data is large, since we guarantee low memory usage and processing overhead by each data source, and low communication overhead between the data sources and the aggregator. We also study fault-tolerant spanners in doubling metrics. A subgraph $H$ for a metric space $X$ is called a $k$-vertex-fault-tolerant $t$-spanner ($(k, t)$-VFTS or simply $k$-VFTS) if, for any subset $S \subseteq X$ with $|S| \le k$, it holds that $d_{H \setminus S}(x, y) \le t \cdot d(x, y)$ for any pair $x, y \in X \setminus S$. For any doubling metric, we give a basic construction of $k$-VFTS with stretch arbitrarily close to 1 that has optimal $O(kn)$ edges. We also consider bounded hop-diameter, which is studied in the context of fault-tolerance for the first time even for Euclidean spanners. We provide a construction of $k$-VFTS with bounded hop-diameter: for $m \ge 2n$, we can reduce the hop-diameter of the above $k$-VFTS to $O(\alpha(m, n))$ by adding $O(km)$ edges, where $\alpha$ is a functional inverse of Ackermann's function. In addition, we construct a fault-tolerant single-sink spanner with bounded maximum degree, and use it to reduce the maximum degree of our basic $k$-VFTS.
As a result, we get a $k$-VFTS with $O(k^2 n)$ edges and maximum degree $O(k^2)$.
Master of Philosophy
Computer Science
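For readers unfamiliar with heavy hitters, the sketch below shows the classical Misra-Gries summary, a standard low-memory way to find frequent items in a single stream. It is offered only as background: it is not the protocol developed in the thesis, and it ignores both the privacy guarantees and the distributed setting that the abstract emphasises. The function name and the example stream are hypothetical.

```python
def misra_gries(stream, k):
    """Misra-Gries summary using at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to survive in the returned dictionary, and each stored count
    underestimates the true frequency by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical purchase stream in which "a" is the only heavy hitter.
stream = ["a", "b", "a", "c", "a", "d", "a", "a"]
summary = misra_gries(stream, k=3)
```

The memory used depends only on `k`, not on the stream length, which is the kind of property the abstract refers to when it mentions low memory usage at each data source.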

14

Dlamini, Delly. "Improving water asset management when data are sparse." Thesis, Cranfield University, 2013. http://dspace.lib.cranfield.ac.uk/handle/1826/7935.

Abstract:

Ensuring the high of assets in water utilities is critically important and requires continuous improvement. This is due to the need to minimise the risk of harm to human health and the environment from contaminated drinking water. Continuous improvement and innovation in water asset management are therefore necessary and are driven by (i) increased regulatory requirements on serviceability; (ii) high maintenance costs; (iii) higher customer expectations; and (iv) enhanced environmental and health/safety requirements. High-quality data on asset failures, maintenance, and operations are key requirements for developing reliability models. However, a literature search revealed that, in practice, data in water utilities are sometimes limited, particularly for over-ground assets. Perhaps surprisingly, there is often a mismatch between the ambitions of sophisticated reliability tools and the asset data that water utilities are able to draw upon to implement them in practice. This research provides models to support decision-making in water utility asset management when data are limited. Three approaches were developed: for assessing asset condition, for assessing maintenance effectiveness, and for selecting maintenance regimes for specific asset groups. Expert elicitation was used to test and apply the developed decision-support tools. A major regional water utility in England was used as a case study to investigate and test the developed approaches. The new approach achieved improved precision in asset condition assessment (Figure 3–3a), supporting the requirements of the UK Capital Maintenance Planning Common Framework. Critically, the thesis demonstrated that assets were on occasion misallocated by more than 50% between condition grades when using current approaches. Expert opinions were also sought for assessing maintenance effectiveness, and a new approach was tested with over-ground assets.
The new approach’s value was demonstrated by its capability to account for finer measurements (as low as 10%) of maintenance effectiveness (Table 4-4). An asset maintenance regime selection approach was developed to support decision-making when data are sparse. The value of the approach is its versatility in selecting different regimes for different asset groups, and specifically in accounting for the assets' unique performance variables.

15

Nziga, Jean-Pierre. "Incremental Sparse-PCA Feature Extraction For Data Streams." NSUWorks, 2015. http://nsuworks.nova.edu/gscis_etd/365.

Abstract:

Intruders attempt to penetrate commercial systems daily and cause considerable financial losses for individuals and organizations. Intrusion detection systems monitor network events to detect computer security threats. An extensive amount of network data is devoted to detecting malicious activities. Storing, processing, and analyzing the massive volume of data is costly and indicate the need to find efficient methods to perform network data reduction that does not require the data to be first captured and stored. A better approach allows the extraction of useful variables from data streams in real time and in a single pass. The removal of irrelevant attributes reduces the data to be fed to the intrusion detection system (IDS) and shortens the analysis time while improving the classification accuracy. This dissertation introduces an online, real time, data processing method for knowledge extraction. This incremental feature extraction is based on two approaches. First, Chunk Incremental Principal Component Analysis (CIPCA) detects intrusion in data streams. Then, two novel incremental feature extraction methods, Incremental Structured Sparse PCA (ISSPCA) and Incremental Generalized Power Method Sparse PCA (IGSPCA), find malicious elements. Metrics helped compare the performance of all methods. The IGSPCA was found to perform as well as or better than CIPCA overall in term of dimensionality reduction, classification accuracy, and learning time. ISSPCA yielded better results for higher chunk values and greater accumulation ratio thresholds. CIPCA and IGSPCA reduced the IDS dataset to 10 principal components as opposed to 14 eigenvectors for ISSPCA. ISSPCA is more expensive in terms of learning time in comparison to the other techniques. This dissertation presents new methods that perform feature extraction from continuous data streams to find the small number of features necessary to express the most data variance. 
Data subsets derived from a few important variables render their interpretation easier. Another goal of this dissertation was to propose incremental sparse PCA algorithms capable of processing data with concept drift and concept shift. Experiments using WaveForm and WaveFormNoise datasets confirmed this ability. Similar to CIPCA, the ISSPCA and IGSPCA updated eigen-axes as a function of the accumulation ratio value, forming an informative eigenspace with few eigenvectors.
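
The role of the accumulation ratio threshold mentioned in this abstract can be illustrated with a minimal sketch. It assumes the accumulation ratio is the share of total variance captured by the leading eigenvalues, which is a common definition but is not spelled out in the abstract; the function name and the eigenvalues are hypothetical.

```python
def components_for_ratio(eigenvalues, threshold):
    """Smallest number of leading eigen-axes whose accumulation ratio
    (captured variance / total variance) reaches the threshold.

    Assumed definition; the dissertation may define the ratio differently.
    """
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues, for illustration only.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
k = components_for_ratio(eigenvalues, 0.85)
```

A higher threshold keeps more eigen-axes, which is consistent with the abstract's remark that results depend on the chosen accumulation ratio threshold.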

16

Bolbol, A. S. Z. "Inferring the transportation mode from sparse GPS data." Thesis, University College London (University of London), 2014. http://discovery.ucl.ac.uk/1448075/.

Abstract:

Understanding travel behaviour and travel demand is of constant importance to transportation communities and agencies in every country. Attempts have recently been made to automatically infer the modes of transport from positional data (such as GPS data) to significantly reduce the cost in time and budget of conventional travel diary surveys. Some limitations, however, exist in the literature, in aspects of data collection (spatio-temporal sample distribution, duration of study, granularity of data, device type), data pre-processing (managing GPS errors, choice of modes, trip information generalisation, data labelling strategy), the classification method used and the choice of variables used for classification, track segmentation methods used (clustering techniques), and using transport network datasets. Therefore, this research attempts to fully understand these aspects and their effect on the process of inference of mode of transport. Furthermore, this research aims to solve a classification problem of sparse GPS data into different transportation modes (car, walk, cycle, underground, train and bus). To address the data collection issues, we conduct studies that aim to identify a representative sample distribution, study duration, and data collection rate that best suits the purpose of this study. As for the data pre-processing issues, we standardise guidelines for managing GPS errors and the required level of detail of the collected trip information. We also develop an online WebGIS-based travel diary that allows users to view, edit, and validate their track information to ensure high-quality information. After addressing the validation issues, we develop an inference framework to detect the mode of transport from the collected data. We first study the variables that could contribute positively to this classification, and statistically quantify their discriminatory power using ANOVA.
We then introduce a novel approach to carry out this inference using a framework based on Support Vector Machines (SVMs) classification. The classification process is followed by a segmentation phase that identifies stops, change points and indoor activity in GPS tracks using an innovative trajectory clustering technique developed for this purpose. The final phase of the framework develops a network matching technique that verifies the classification and segmentation results by testing their obedience to rules and restrictions of different transport networks. The framework is tested using coarse-grained GPS data, which has been avoided in previous studies, achieving almost 90% accuracy with a Kappa statistic reflecting almost perfect agreement.

17

Sanyal, Joy. "Flood prediction and mitigation in data-sparse environments." Thesis, Durham University, 2013. http://etheses.dur.ac.uk/7711/.

Abstract:

In the last three decades many sophisticated tools have been developed that can accurately predict the dynamics of flooding. However, due to the paucity of adequate infrastructure, this technological advancement did not benefit ungauged flood-prone regions in the developing countries in a major way. The overall research theme of this dissertation is to explore the improvement in methodology that is essential for utilising recently developed flood prediction and management tools in the developing world, where ideal model inputs and validation datasets do not exist. This research addresses important issues related to undertaking inundation modelling at different scales, particularly in data-sparse environments. The results indicate that in order to predict dynamics of high magnitude stream flow in data-sparse regions, special attention is required on the choice of the model in relation to the available data and hydraulic characteristics of the event. Adaptations are necessary to create inputs for the models that have been primarily designed for areas with better availability of data. Freely available geospatial information of moderate resolution can often meet the minimum data requirements of hydrological and hydrodynamic models if they are supplemented carefully with limited surveyed/measured information. This thesis also explores the issue of flood mitigation through rainfall-runoff modelling. The purpose of this investigation is to assess the impact of land-use changes at the sub-catchment scale on the overall downstream flood risk. A key component of this study is also quantifying predictive uncertainty in hydrodynamic models based on the Generalised Likelihood Uncertainty Estimation (GLUE) framework. Detailed uncertainty assessment of the model outputs indicates that, in spite of using sparse inputs, the model outputs perform at reasonably low levels of uncertainty both spatially and temporally. 
These findings have the potential to encourage the flood managers and hydrologists in the developing world to use similar data sets for flood management.

18

Taylor, Kye. "Sparse recovery and parameterization of manifold-valued data." Connect to online resource, 2008. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1453576.

19

Baur, Ulrike, and Peter Benner. "Gramian-Based Model Reduction for Data-Sparse Systems." Universitätsbibliothek Chemnitz, 2007. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200701952.

Abstract:

Model reduction is a common theme within the simulation, control and optimization of complex dynamical systems. For instance, in control problems for partial differential equations, the associated large-scale systems have to be solved very often. To attack these problems in reasonable time it is absolutely necessary to reduce the dimension of the underlying system. We focus on model reduction by balanced truncation, where a system-theoretical background provides some desirable properties of the reduced-order system. The major computational task in balanced truncation is the solution of large-scale Lyapunov equations; thus the method is of limited use for really large-scale applications. We develop an effective implementation of balancing-related model reduction methods by exploiting the structure of the underlying problem. This is done by a data-sparse approximation of the large-scale state matrix A using the hierarchical matrix format. Furthermore, we integrate the corresponding formatted arithmetic in the sign function method for computing approximate solution factors of the Lyapunov equations. This approach is well-suited for a class of practically relevant problems and allows the application of balanced truncation and related methods to systems coming from 2D and 3D FEM and BEM discretizations.

20

Kang, Zhao. "LOW RANK AND SPARSE MODELING FOR DATA ANALYSIS." OpenSIUC, 2017. https://opensiuc.lib.siu.edu/dissertations/1366.

Abstract:

Nowadays, many real-world problems must deal with collections of high-dimensional data. High-dimensional data usually have intrinsic low-dimensional representations, which are suited for subsequent analysis or processing. Therefore, finding low-dimensional representations is an essential step in many machine learning and data mining tasks. Low-rank and sparse modeling are emerging mathematical tools for dealing with the uncertainties of real-world data. By leveraging the underlying structure of data, low-rank and sparse modeling approaches have achieved impressive performance in many data analysis tasks. Since the general rank minimization problem is computationally NP-hard, a convex relaxation of the original problem is often solved instead. One popular heuristic method is to use the nuclear norm to approximate the rank of a matrix. Despite the success of nuclear norm minimization in capturing the low intrinsic dimensionality of data, the nuclear norm minimizes not only the rank but also the variance of the matrix, and it may not be a good approximation to the rank function in practical problems. To mitigate this issue, this thesis proposes several nonconvex functions to approximate the rank function. However, it is often difficult to solve nonconvex problems. In this thesis, an optimization framework for nonconvex problems is further developed. The effectiveness of this approach is examined on several important applications, including matrix completion, robust principal component analysis, clustering, and recommender systems. Another issue associated with current clustering methods is that they work in two separate steps: similarity matrix computation and subsequent spectral clustering. The learned similarity matrix may not be optimal for subsequent clustering. Therefore, a unified algorithm framework is developed in this thesis. To capture the nonlinear relations among data points, we formulate this method in kernel space.
Furthermore, because the obtained continuous spectral solutions could severely deviate from the true discrete cluster labels, a discrete transformation is further incorporated in our model. Finally, our framework can simultaneously learn the similarity matrix, kernel, and discrete cluster labels. The performance of the proposed algorithms is established through extensive experiments. This framework can be easily extended to semi-supervised classification.

21

Zeng, Yaohui. "Scalable sparse machine learning methods for big data." Diss., University of Iowa, 2017. https://ir.uiowa.edu/etd/6021.

Abstract:

Sparse machine learning models have become increasingly popular in analyzing high-dimensional data. With the evolving era of Big Data, ultrahigh-dimensional, large-scale data sets are constantly collected in many areas such as genetics, genomics, biomedical imaging, social media analysis, and high-frequency finance. Mining valuable information efficiently from these massive data sets requires not only novel statistical models but also advanced computational techniques. This thesis focuses on the development of scalable sparse machine learning methods to facilitate Big Data analytics. Built upon the feature screening technique, the first part of this thesis proposes a family of hybrid safe-strong rules (HSSR) that incorporate safe screening rules into the sequential strong rule to remove unnecessary computational burden for solving the lasso-type models. We present two instances of HSSR, namely SSR-Dome and SSR-BEDPP, for the standard lasso problem. We further extend SSR-BEDPP to the elastic net and group lasso problems to demonstrate the generalizability of the hybrid screening idea. In the second part, we design and implement an R package called biglasso to extend the lasso model fitting to Big Data in R. Our package biglasso utilizes memory-mapped files to store the massive data on the disk, only reading data into memory when necessary during model fitting, and is thus able to handle data-larger-than-RAM cases seamlessly. Moreover, it's built upon our redesigned algorithm incorporated with the proposed HSSR screening, making it much more memory- and computation-efficient than existing R packages. Extensive numerical experiments with synthetic and real data sets are conducted in both parts to show the effectiveness of the proposed methods.
In the third part, we consider a novel statistical model, namely the overlapping group logistic regression model, that allows for selecting important groups of features that are associated with binary outcomes in the setting where the features belong to overlapping groups. We conduct systematic simulations and real-data studies to show its advantages in the application of genetic pathway selection. We implement an R package called grpregOverlap that has HSSR screening built in for fitting overlapping group lasso models.
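
To make the screening idea concrete, here is a minimal sketch of the sequential strong rule alone, which is one ingredient of HSSR according to the abstract. It is not the hybrid HSSR rule itself, and the safe-rule part (Dome, BEDPP) is omitted. It assumes the lasso objective 0.5*||y - Xb||^2 + lam*||b||_1; the function names and data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def strong_rule_keep(columns, residual, lam_prev, lam_next):
    """Indices of features that survive the sequential strong rule.

    Feature j is discarded when |x_j . r(lam_prev)| < 2*lam_next - lam_prev,
    where r(lam_prev) is the residual at the previous solution.
    The rule is heuristic, so a KKT check on discarded features is
    still needed after fitting.
    """
    threshold = 2.0 * lam_next - lam_prev
    return [j for j, col in enumerate(columns)
            if abs(dot(col, residual)) >= threshold]

# Toy design with three orthogonal feature columns (illustration only).
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 1.0, 0.2]

# At lam_max every coefficient is zero, so the residual equals y.
lam_max = max(abs(dot(c, y)) for c in columns)
kept = strong_rule_keep(columns, y, lam_max, 0.8 * lam_max)
```

Only the surviving features need to enter the solver at the next value of lam, which is where the computational saving comes from.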

22

Labusch, Kai [Verfasser]. "Soft-competitive learning of sparse data models / Kai Labusch." Lübeck : Zentrale Hochschulbibliothek Lübeck, 2012. http://d-nb.info/1019906707/34.

23

Evans, Jason Peter. "Modelling Climate - Surface Hydrology Interactions in Data Sparse Areas." The Australian National University. Centre for Resource and Environmental Studies, 2000. http://thesis.anu.edu.au./public/adt-ANU20020313.032142.

Abstract:

The interaction between climate and land-surface hydrology is extremely important in relation to long term water resource planning. This is especially so in the presence of global warming and massive land use change, issues which seem likely to have a disproportionate impact on developing countries. This thesis develops tools aimed at the study and prediction of climate effects on land-surface hydrology (in particular streamflow), which require a minimum amount of site specific data. This minimum data requirement allows studies to be performed in areas that are data sparse, such as the developing world.

A simple lumped dynamics-encapsulating conceptual rainfall-runoff model, which explicitly calculates the evaporative feedback to the atmosphere, was developed. It uses the linear streamflow routing module of the rainfall-runoff model IHACRES, with a new non-linear loss module based on the Catchment Moisture Deficit accounting scheme, and is referred to as CMD-IHACRES. In this model, evaporation can be calculated using a number of techniques depending on the data available; as a minimum, one to two years of precipitation, temperature and streamflow data are required. The model was tested on catchments covering a large range of hydroclimatologies and shown to estimate streamflow well. When tested against evaporation data the simplest technique was found to capture the medium to long term average well but had difficulty reproducing the short-term variations.

A comparison of the performance of three limited area climate models (MM5/BATS, MM5/SHEELS and RegCM2) was conducted in order to quantify their ability to reproduce near surface variables. Components of the energy and water balance over the land surface display considerable variation among the models, with no model performing consistently better than the other two. However, several conclusions can be made. The MM5 longwave radiation scheme performed worse than the scheme implemented in RegCM2. Estimates of runoff displayed the largest variations and differed from observations by as much as 100%. The climate models exhibited greater variance than the observations for almost all the energy and water related fluxes investigated.

An investigation into improving these streamflow predictions by utilizing CMD-IHACRES was conducted. Using CMD-IHACRES in an 'offline' mode greatly improved the streamflow estimates while the simplest evaporation technique reproduced the evaporative time series to an accuracy comparable to that obtained from the limited area models alone. The ability to conduct a climate change impact study using CMD-IHACRES and a stochastic weather generator is also demonstrated. These results warrant further investigation into incorporating the rainfall-runoff model CMD-IHACRES in a fully coupled 'online' approach.

24

Lu, Xuebin. "Fast computation of sparse data cubes in its applications." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0009/MQ61455.pdf.

25

Mirshahi, Babak. "Hydrological modelling in data-sparse snow-affected semiarid areas." Thesis, Imperial College London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.528304.

26

Burroughes, Janet Eirlys. "The synthesis of estuarine bathymetry from sparse sounding data." Thesis, University of Plymouth, 2001. http://hdl.handle.net/10026.1/1887.

Abstract:

The two aims of the project involved:
1. Devising a system for prediction of areas of bathymetric change within the Fal estuary.
2. Formulating and evaluating a method for interpolating single beam acoustic bathymetry to avoid artefacts of interpolation.
In order to address these aims, sources of bathymetric data for the Fal estuary were identified as Truro Harbour Office, Cornwall County Council and the Environment Agency. The data collected from these sources included red wavelength Lidar, aerial photography and single beam acoustic bathymetry from a number of different years. These data were input into a Geographic Information System (GIS) and assessed for suitability for the purposes of data comparison and hence assessment of temporal trends in bathymetry within the estuary. Problems encountered during interpolation of the acoustic bathymetry resulted in the later aim of the project, to formulate an interpolation system suitable for interpolation of the single beam, bathymetric data in a realistic way, avoiding serious artefacts of interpolation. This aim was met, successfully, through the following processes:
1. An interpolation system was developed, using polygonal zones, bounded by channels and coastlines, to prevent interpolation across these boundaries. This system, based on Inverse Distance Weighting (IDW) interpolation, was referred to as Zoned Inverse Distance Weighting (ZIDW).
2. ZIDW was found, by visual inspection, to eliminate the interpolation artefacts described above.
3. The processes of identification of sounding lines and channels, and the allocation of soundings and output grid cells to polygons, were successfully automated to allow ZIDW to be applied to large and multiple data sets. Manual intervention was maintained for processes performed most successfully by the human brain to optimise the results of ZIDW.
4. To formalise the theory of ZIDW it was applied to a range of idealised, mathematically defined channels. For simple straight and regular curved, mathematical channels interpolation by the standard TIN method was found to perform as well as ZIDW.
5. Investigation of sinusoidal channels within a rectangular estuary, however, revealed that the TIN method begins to produce serious interpolation artefacts where sounding lines are not parallel to the centre lines of channels and ridges. Hence, overall ZIDW was determined mathematically to represent the optimum method of interpolation for single beam, bathymetric data.
6. Finally, ZIDW was refined, using data from the Humber and Gironde estuaries, to achieve universal applicability for interpolation of single beam, echo sounding data from any estuary.
7. The refinements involved allowance for non-continuous, flood and ebb type channels; consideration of the effects of the scale of the estuary; smoothing of the channels using cubic splines; interpolation using a 'smart' ellipse and the option to reconstruct sounding lines from data that had previously been re-ordered.
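
The zoning idea behind ZIDW can be illustrated with a minimal sketch. It assumes every sounding and every output grid cell has already been assigned a zone label, as the abstract describes for the polygonal zones. The function names, the distance power of 2, and the sample soundings are illustrative assumptions and are not taken from the thesis.

```python
import math

def idw(point, samples, power=2.0):
    """Plain inverse distance weighting; samples is a list of (x, y, depth)."""
    num = den = 0.0
    for x, y, depth in samples:
        d = math.hypot(point[0] - x, point[1] - y)
        if d == 0.0:
            return depth  # the cell coincides with a sounding
        w = 1.0 / d ** power
        num += w * depth
        den += w
    return num / den

def zoned_idw(point, zone, soundings, power=2.0):
    """IDW restricted to soundings that share the grid cell's zone.

    soundings is a list of (x, y, depth, zone_label).
    """
    same_zone = [(x, y, d) for x, y, d, z in soundings if z == zone]
    if not same_zone:
        return None  # no soundings in this zone, so the cell stays empty
    return idw(point, same_zone, power)

# Made-up soundings: two in a deep channel, one on a shallow bank.
soundings = [
    (0.0, 0.0, 10.0, "channel"),
    (2.0, 0.0, 10.0, "channel"),
    (1.0, 1.0, 1.0, "bank"),
]
cell = (1.0, 0.0)  # a grid cell lying in the channel
plain = idw(cell, [(x, y, d) for x, y, d, _ in soundings])
zoned = zoned_idw(cell, "channel", soundings)
```

In this toy case plain IDW pulls the channel cell towards the shallow bank sounding, while the zoned version uses only the channel soundings, which is the kind of cross-boundary artefact the zones are meant to prevent.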

27

Spaniol, Jutta. "Synthesis of fractal-like surfaces from sparse data bases." Thesis, University of Exeter, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.335017.

28

Dixon, Samuel G. "Seasonal forecasting of reservoir inflows in data sparse regions." Thesis, Loughborough University, 2017. https://dspace.lboro.ac.uk/2134/33524.

Abstract:

Management of large, transboundary river systems can be politically and strategically problematic. Accurate flow forecasting based on public domain data offers the potential for improved resource allocation and infrastructure management. This study investigates the scope for reservoir inflow forecasting in data sparse regions using public domain information. Four strategically important headwater reservoirs in Central Asia are used to pilot forecasting methodologies (Toktogul, Andijan and Kayrakkum in Kyrgyzstan and Nurek in Tajikistan). Two approaches are developed. First, statistical forecasting of monthly inflow is undertaken using relationships with satellite precipitation estimates as well as reanalysis precipitation and temperature products. Second, mean summer inflows to reservoirs are conditioned on the tercile of preceding winter large scale climate modes (El Niño Southern Oscillation, North Atlantic Oscillation, or Indian Ocean Dipole). The transferability of both approaches is evaluated through implementation to a basin in Morocco. A methodology for operationalising seasonal forecasts of inflows to Nurek reservoir in Tajikistan is also presented. The statistical models outperformed the long-term average mean monthly inflows into Toktogul and Andijan reservoirs at lead times of 1-4 months using operationally available predictors. Stratifying models to forecast monthly inflows for only summer months (April-September) improved skill over long term average mean monthly inflows. Niño 3.4 values for individual months during October-January were significantly (p < 0.01) correlated with the following mean summer inflows to Toktogul, Andijan and Nurek reservoirs during the period 1941-1980. Significant differences (p < 0.01) occurred in summer inflows into all reservoirs following opposing phases of winter Niño 3.4 during the period 1941-1980.
Over the period 1941-2016 (1993-1999 missing), there exists only a 22% chance of positive summer inflow anomalies into Nurek reservoir following November-December La Niña conditions. Cross validated model skill assessed using the Heidke Hit Proportion outperforms chance, with a hit rate of 51-59% depending upon the period of record used. This climate mode forecasting approach could be extended to natural hazards (e.g. avalanches and mudflows) or to facilitate regional electricity hedging (between neighbouring countries experiencing reduced/increased demand). Further research is needed to evaluate the potential for forecasting winter energy demand, potentially reducing the impact of winter energy crises across the region.
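
The second approach in this abstract, conditioning mean summer inflow on the tercile of a preceding winter climate index, can be sketched as follows. The rank-based tercile split, the function names, and all numbers are assumptions made for illustration; the values are synthetic and are not data from the thesis.

```python
def tercile_labels(values):
    """Label each value 'low', 'mid' or 'high' by its rank within the sample."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        if rank < n / 3:
            labels[i] = "low"
        elif rank < 2 * n / 3:
            labels[i] = "mid"
        else:
            labels[i] = "high"
    return labels

def conditional_means(index, inflow):
    """Mean inflow for each tercile of the preceding climate index."""
    groups = {"low": [], "mid": [], "high": []}
    for label, q in zip(tercile_labels(index), inflow):
        groups[label].append(q)
    return {k: sum(v) / len(v) for k, v in groups.items() if v}

# Synthetic numbers for illustration only (one pair per year).
winter_index = [-1.2, -0.8, -0.1, 0.2, 0.9, 1.5]
summer_inflow = [400.0, 420.0, 500.0, 520.0, 600.0, 640.0]
means = conditional_means(winter_index, summer_inflow)
```

A forecast for a coming summer would then be the conditional mean for whichever tercile the observed winter index falls into.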

29

Nader, Babak. "Parallel solution of sparse linear systems." Full text open access at:, 1987. http://content.ohsu.edu/u?/etd,138.

30

Martinez, Juan Enrique Castorera. "Remote-Sensed LIDAR Using Random Sampling and Sparse Reconstruction." International Foundation for Telemetering, 2011. http://hdl.handle.net/10150/595760.

Abstract:

ITC/USA 2011 Conference Proceedings / The Forty-Seventh Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2011 / Bally's Las Vegas, Las Vegas, Nevada
In this paper, we propose a new, low complexity approach for the design of laser radar (LIDAR) systems for use in applications in which the system is wirelessly transmitting its data from a remote location back to a command center for reconstruction and viewing. Specifically, the proposed system collects random samples in different portions of the scene, and the density of sampling is controlled by the local scene complexity. The range samples are transmitted as they are acquired through a wireless communications link to a command center and a constrained absolute-error optimization procedure of the type commonly used for compressive sensing/sampling is applied. The key difficulty in the proposed approach is estimating the local scene complexity without densely sampling the scene and thus increasing the complexity of the LIDAR front end. We show here using simulated data that the complexity of the scene can be accurately estimated from the return pulse shape using a finite moments approach. Furthermore, we find that such complexity estimates correspond strongly to the surface reconstruction error that is achieved using the constrained optimization algorithm with a given number of samples.

31

Sävhammar, Simon. "Uniform interval normalization : Data representation of sparse and noisy data sets for machine learning." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19194.

Abstract:

The uniform interval normalization technique is proposed as an approach to handle sparse data and noise in the data. The technique is evaluated by transforming and normalizing the MoodMapper and Safebase data sets; predictive capability is compared by forecasting the data sets with an LSTM model. The results are compared to both the commonly used MinMax normalization technique and MinMax normalization with a time2vec layer. Uniform interval normalization was found to perform better on both the sparse MoodMapper data set and the denser Safebase data set. Future work consists of studying the performance of uniform interval normalization on other data sets and with other machine learning models.

32

Vogetseder, Georg. "Functional Analysis of Real World Truck Fuel Consumption Data." Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-1148.

Abstract:

This thesis covers the analysis of sparse and irregular fuel consumption data of long-distance haulage articulated trucks. It is shown that this kind of data is hard to analyse with multivariate as well as with functional methods. To be able to analyse the data, Principal Components Analysis through Conditional Expectation (PACE) is used, which enables the use of observations from many trucks to compensate for the sparsity of observations in order to get continuous results. The principal component scores generated by PACE can then be used to get rough estimates of the trajectories for single trucks as well as to detect outliers. The data-centric approach of PACE is very useful for enabling functional analysis of sparse and irregular data. Functional analysis is desirable for this data because it sidesteps feature extraction and enables a more natural view of the data.

33

Kraus, Katrin. "On the Measurement of Model Fit for Sparse Categorical Data." Doctoral thesis, Uppsala universitet, Statistiska institutionen, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-173768.

Abstract:

This thesis consists of four papers that deal with several aspects of the measurement of model fit for categorical data. In all papers, special attention is paid to situations with sparse data. The first paper concerns the computational burden of calculating Pearson's goodness-of-fit statistic for situations where many response patterns have observed frequencies that equal zero. A simple solution is presented that allows for the computation of the total value of Pearson's goodness-of-fit statistic when the expected frequencies of response patterns with observed frequencies of zero are unknown. In the second paper, a new fit statistic is presented that is a modification of Pearson's statistic but that is not adversely affected by response patterns with very small expected frequencies. It is shown that the new statistic is asymptotically equivalent to Pearson's goodness-of-fit statistic and hence, asymptotically chi-square distributed. In the third paper, comprehensive simulation studies are conducted that compare seven asymptotically equivalent fit statistics, including the new statistic. Situations that are considered concern both multinomial sampling and factor analysis. Tests for the goodness-of-fit are conducted by means of the asymptotic and the bootstrap approach both under the null hypothesis and when there is a certain degree of misfit in the data. Results indicate that recommendations on the use of a fit statistic can be dependent on the investigated situation and on the purpose of the model test. Power varies substantially with the fit statistic and the cause of the misfit of the model. Findings indicate further that the new statistic proposed in this thesis shows rather stable results and, compared to the other fit statistics, no disadvantageous characteristics of the fit statistic are found. Finally, in the fourth paper, the potential necessity of determining the goodness-of-fit by two-sided model testing is discussed.
A simulation study is conducted that investigates differences between the one sided and the two sided approach of model testing. Situations are identified for which two sided model testing has advantages over the one sided approach.
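
The first paper's point, that Pearson's statistic can be computed without the expected frequencies of unobserved response patterns, can be illustrated with a standard algebraic identity. Whether this is exactly the solution presented in the thesis is an assumption. When the expected frequencies over all patterns sum to the sample size N, the statistic sum((O - E)^2 / E) equals sum(O^2 / E) - N, and patterns with O = 0 contribute nothing to the second form. The function names and frequencies below are hypothetical.

```python
def pearson_full(observed, expected):
    """Textbook Pearson X^2 over all response patterns."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def pearson_observed_only(observed, expected, n):
    """X^2 = sum(O^2 / E) - N, using only patterns with O > 0.

    Valid when the expected frequencies of all patterns sum to n.
    """
    return sum(o * o / e for o, e in zip(observed, expected) if o > 0) - n

# Hypothetical example: 5 response patterns, N = 10, two never observed.
observed = [5, 3, 0, 2, 0]
expected = [4.0, 3.0, 1.0, 1.5, 0.5]  # sums to 10
n = sum(observed)

full = pearson_full(observed, expected)
shortcut = pearson_observed_only(observed, expected, n)
```

The shortcut never reads the expected frequencies of the two unobserved patterns, yet both computations give the same value.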

34

Paiement, Adeline. "Integrated registration, segmentation, and interpolation for 3D/4D sparse data." Thesis, University of Bristol, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.649370.

Abstract:

We address the problem of object modelling from 3D and 4D sparse data acquired as different sequences which are misaligned with respect to each other. Such data may result from various imaging modalities and can therefore present very diverse spatial configurations and appearances. We focus on medical tomographic data, made up of sets of 2D slices having arbitrary positions and orientations, and which may have different gains and contrasts even within the same dataset. The analysis of such tomographic data is essential for establishing a diagnosis or planning surgery. Modelling from sparse and misaligned data requires solving the three inherently related problems of registration, segmentation, and interpolation. We propose a new method to integrate these stages in a level set framework. Registration is made particularly challenging by the limited number of intersections present in a sparse dataset, and interpolation has to handle images that may have very different appearances. Hence, registration and interpolation exploit segmentation information, rather than pixel intensities, for increased robustness and accuracy. We achieve this by first introducing a new level set scheme based on the interpolation of the level set function by radial basis functions. This new scheme can inherently handle sparse data, and is more numerically stable and robust to noise than the classical level set. We also present a new registration algorithm based on the level set method, which is robust to local minima and can handle sparse data that have only a limited number of intersections. Then, we integrate these two methods into the same level set framework. The proposed method is validated quantitatively and subjectively on artificial data and MRI and CT scans. It is compared against a state-of-the-art, sequential method comprising traditional mutual information based registration, image interpolation, and 3D or 4D segmentation of the registered and interpolated volume.
In our experiments, the proposed framework yields similar segmentation results to the sequential approach, but provides a more robust and accurate registration and interpolation. In particular, the registration is more robust to limited intersections in the data and to local minima. The interpolation is more satisfactory in cases of large gaps, due to the method taking into account the global shape of the object, and it recovers better topologies at the extremities of the shapes where the objects disappear from the image slices. As a result, the complete integrated framework provides more satisfactory shape reconstructions than the sequential approach.

35

Brunet, Camille. "Sparse and discriminative clustering for complex data : application to cytology." Thesis, Evry-Val d'Essonne, 2011. http://www.theses.fr/2011EVRY0018/document.

Abstract:

The main topics of this manuscript are sparsity and discrimination for modeling complex data. In the first part, we work in the Gaussian mixture model (GMM) context: we introduce a new family of probabilistic models that simultaneously cluster the data and find a discriminative subspace chosen so that it best separates the groups. A family of 12 DLM models is introduced, based on three key ideas: first, the actual data live in a latent subspace with an intrinsic dimension lower than that of the observed space; second, a subspace of K-1 dimensions is theoretically sufficient to discriminate K groups; third, the observed and latent spaces are linked by a linear transformation. An estimation procedure, named Fisher-EM, is proposed and improves clustering performance most of the time owing to the use of a discriminative subspace. Because each axis spanning the discriminative subspace is a linear combination of all original variables, we propose three methods based on penalized criteria to ease the interpretation of the results. In particular, they introduce sparsity directly into the loadings of the projection matrix, which amounts to selecting discriminative variables for clustering. In the second part, we deal with the seriation context. We propose a dissimilarity measure based on a common neighborhood, which introduces sparsity into the data and makes it possible to handle noisy data and overlapping groups. A forward stepwise seriation algorithm, called the PB-Clus algorithm, is introduced and yields a block-diagonal representation of the data. This tool reveals the intrinsic structure of the data even in the presence of noise, outliers, overlapping groups, and non-Gaussian groups. Both methods have been validated on a biological application based on cancer cell detection.

36

Zhao, Jingjun. "Bayesian Sparse Factor Analysis of High Dimensional Gene Expression Data." Thesis, North Dakota State University, 2019. https://hdl.handle.net/10365/31693.

Abstract:

This work closely studies the fundamental techniques behind the Bayesian sparse factor analysis model: constrained least squares regression, Bayesian lasso regression, and some popular sparsity-inducing priors. In Appendix A, we introduce each of these techniques in a coherent manner and provide detailed proofs of important formulas and definitions. We regard this introduction and these proofs, which are helpful in learning Bayesian sparse factor analysis, as a contribution of this work. We also systematically study BicMix, a computationally tractable biclustering approach for identifying co-regulated genes, by deriving all point estimates of the parameters and by running the method on both simulated data sets and a real high-dimensional gene expression data set. The derivations of all point estimates, which were missing for BicMix, are provided to give a better understanding of the variational expectation maximization (VEM) algorithm. The performance of the method in identifying true biclusters is analyzed using the experimental results.

37

Roussos, Evangelos. "Bayesian methods for sparse data decomposition and blind source separation." Thesis, University of Oxford, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.589766.

Abstract:

In an exploratory approach to data analysis, it is often useful to consider the observations as generated from a set of latent generators or 'sources' via a generally unknown mapping. Reconstructing sources from their mixtures is an extremely ill-posed problem in general. However, solutions to such inverse problems can, in many cases, be achieved by incorporating prior knowledge about the problem, captured in the form of constraints. This setting is a natural candidate for the application of the Bayesian methodology, allowing us to incorporate "soft" constraints in a natural manner. This Thesis proposes the use of sparse statistical decomposition methods for exploratory analysis of datasets. We make use of the fact that many natural signals have a sparse representation in appropriate signal dictionaries. The work described in this Thesis is mainly driven by problems in the analysis of large datasets, such as those from functional magnetic resonance imaging of the brain for the neuro-scientific goal of extracting relevant 'maps' from the data. We first propose Bayesian Iterative Thresholding, a general method for solving blind linear inverse problems under sparsity constraints, and we apply it to the problem of blind source separation. The algorithm is derived by maximizing a variational lower-bound on the likelihood. The algorithm generalizes the recently proposed method of Iterative Thresholding. The probabilistic view enables us to automatically estimate various hyperparameters, such as those that control the shape of the prior and the threshold, in a principled manner. We then derive an efficient fully Bayesian sparse matrix factorization model for exploratory analysis and modelling of spatio-temporal data such as fMRI. We view sparse representation as a problem in Bayesian inference, following a machine learning approach, and construct a structured generative latent-variable model employing adaptive sparsity-inducing priors. 
The construction allows for automatic complexity control and regularization as well as denoising. The performance and utility of the proposed algorithms are demonstrated in a variety of experiments using both simulated and real datasets. Experimental results with benchmark datasets show that the proposed algorithms outperform state-of-the-art tools for model-free decompositions such as independent component analysis.
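
For orientation, the classical (non-Bayesian) iterative soft-thresholding scheme that the abstract says is generalized can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's Bayesian Iterative Thresholding: the mixing matrix `A` is taken as known (the thesis treats the blind case), the threshold `lam` and step size `step` are fixed by hand rather than estimated, and the names `soft_threshold` and `ista` are hypothetical.

```python
def soft_threshold(x, t):
    """Shrink x towards zero by t; values with |x| <= t become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def ista(A, y, lam, step, n_iter=200):
    """Approximately minimise 0.5 * ||A s - y||^2 + lam * ||s||_1.

    A is a list of rows (assumed known here), y the observations.
    step should not exceed 1 / (largest eigenvalue of A^T A).
    """
    m, n = len(A), len(A[0])
    s = [0.0] * n
    for _ in range(n_iter):
        # Residual r = A s - y
        r = [sum(A[i][j] * s[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the quadratic term: g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft thresholding (the sparsity step)
        s = [soft_threshold(s[j] - step * g[j], step * lam) for j in range(n)]
    return s
```

With an identity mixing matrix the iteration reduces to soft-thresholding the observations, so small entries are set exactly to zero while large ones are shrunk by `lam`.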

38

Chen, Yujia, Yang Lou, Matthew A. Kupinski, and Mark A. Anastasio. "Task-based data-acquisition optimization for sparse image reconstruction systems." SPIE-INT SOC OPTICAL ENGINEERING, 2017. http://hdl.handle.net/10150/625209.

Abstract:

Conventional wisdom dictates that imaging hardware should be optimized by use of an ideal observer (IO) that exploits full statistical knowledge of the class of objects to be imaged, without consideration of the reconstruction method to be employed. However, accurate and tractable models of the complete object statistics are often difficult to determine in practice. Moreover, in imaging systems that employ compressive sensing concepts, imaging hardware and (sparse) image reconstruction are innately coupled technologies. We have previously proposed a sparsity-driven ideal observer (SDIO) that can be employed to optimize hardware by use of a stochastic object model that describes object sparsity. The SDIO and sparse reconstruction method can therefore be "matched" in the sense that they both utilize the same statistical information regarding the class of objects to be imaged. To efficiently compute SDIO performance, the posterior distribution is estimated by use of computational tools developed recently for variational Bayesian inference. Subsequently, the SDIO test statistic can be computed semi-analytically. The advantages of employing the SDIO instead of a Hotelling observer are systematically demonstrated in case studies in which magnetic resonance imaging (MRI) data acquisition schemes are optimized for signal detection tasks.

39

Haque, Sardar Anisul, and University of Lethbridge Faculty of Arts and Science. "A computational study of sparse matrix storage schemes." Thesis, Lethbridge, Alta. : University of Lethbridge, Department of Mathematics and Computer Science, 2008. http://hdl.handle.net/10133/777.

Abstract:

The efficiency of linear algebra operations for sparse matrices on modern high performance computing systems is often constrained by the available memory bandwidth. We are interested in sparse matrices whose sparsity pattern is unknown. In this thesis, we study the efficiency of the major sparse matrix storage schemes during multiplication with a dense vector. A proper reordering of columns or rows usually results in reduced memory traffic due to improved data reuse. This thesis also proposes an efficient column ordering algorithm based on the binary reflected Gray code. Computational experiments show that this ordering increases performance when computing the product of a sparse matrix with a dense vector.
xi, 76 leaves : ill. ; 29 cm.
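
As a rough illustration of the kind of kernel the abstract refers to, the sketch below stores a matrix in compressed sparse row (CSR) form, one widely used storage scheme, and multiplies it with a dense vector. The abstract does not say which schemes the thesis compares, so CSR is an assumption, and the function names are hypothetical. `gray_code` only shows how the binary reflected Gray code itself is generated; the thesis's column ordering algorithm is not reproduced here.

```python
def dense_to_csr(rows):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # nonzero value
                col_idx.append(j)  # its column index
        row_ptr.append(len(values))  # end of this row in `values`
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read at irregular positions col_idx[k]; column and
            # row orderings change how well these reads reuse cached data.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def gray_code(i):
    """i-th codeword of the binary reflected Gray code."""
    return i ^ (i >> 1)
```

Consecutive Gray codewords differ in exactly one bit, which is presumably the property that makes the code attractive for placing columns with similar patterns next to each other.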

40

Adzemovic, Haris, and Alexander Sandor. "Comparison of user and item-based collaborative filtering on sparse data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209445.

Abstract:

Recommender systems are used extensively today in many areas to help users and consumers make decisions. Amazon recommends books based on what you have previously viewed and purchased, Netflix presents you with shows and movies you might enjoy based on your interactions with the platform, and Facebook serves personalized ads to every user based on gathered browsing information. These systems are based on shared similarities, and there are several ways to develop and model them. This study compares two methods, user-based and item-based filtering, in k nearest neighbours systems. The methods are compared on how much they deviate from the true answer when predicting user ratings of movies based on sparse data. The study showed that neither method could be considered objectively better than the other and that the choice of system should be based on the data set.
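
The two approaches compared in the abstract can be sketched with one routine applied to the rating table or to its transpose. This is a minimal illustration under assumptions not stated in the abstract: ratings are held in nested dictionaries, similarity is cosine similarity with missing ratings treated as zero, and predictions are similarity-weighted averages. All names are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    if not a or not b:
        return 0.0
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(table, target, key, k=2):
    """Predict table[target][key] from the k most similar rows that have key.

    table maps row -> {column: rating}. Returns None when no neighbour
    with positive similarity has rated `key`.
    """
    target_vec = table.get(target, {})
    neighbours = [
        (cosine(target_vec, vec), vec[key])
        for other, vec in table.items()
        if other != target and key in vec
    ]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = [(s, r) for s, r in neighbours[:k] if s > 0]
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


def transpose(table):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    out = {}
    for row, cols in table.items():
        for col, rating in cols.items():
            out.setdefault(col, {})[row] = rating
    return out
```

User-based filtering is then `predict(ratings, user, item)`, while item-based filtering is `predict(transpose(ratings), item, user)`; with sparse data either call may find few or no usable neighbours, which is one reason the better choice can depend on the data set.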

41

Wang, Zi. "Sparse multivariate models for pattern detection in high-dimensional biological data." Thesis, Imperial College London, 2015. http://hdl.handle.net/10044/1/25762.

Abstract:

Recent advances in technology have made it possible and affordable to collect biological data of unprecedented size and complexity. While analysing such data, traditional statistical methods and machine learning algorithms suffer from the curse of dimensionality. Parsimonious models, which may refer to parsimony in model structure and/or model parameters, have been shown to improve both biological interpretability of the model and the generalisability to new data. In this thesis we are concerned with model selection in both supervised and unsupervised learning tasks. For supervised learning, we propose a new penalty called graph-guided group lasso (GGGL) and employ this penalty in penalised linear regressions. GGGL is able to integrate prior structured information with data mining, where variables sharing similar biological functions are collected into groups and the pairwise relatedness between groups is organised into a network. Such prior information guides the selection of variables that are predictive of a univariate response, so that the model selects variable groups that are close in the network and important variables within the selected groups. We then generalise the idea of incorporating network-structured prior knowledge to association studies consisting of multivariate predictors and multivariate responses and propose the network-driven sparse reduced-rank regression (NsRRR). In NsRRR, pairwise relatedness between predictors and between responses is represented by two networks, and the model identifies associations between a subnetwork of predictors and a subnetwork of responses such that both subnetworks tend to be connected. For unsupervised learning, we are concerned with a multi-view learning task in which we compare the variance of high-dimensional biological features collected from multiple sources, which are referred to as “views”. 
We propose the sparse multi-view matrix factorisation (sMVMF), which is parsimonious in both model structure and model parameters. sMVMF can identify latent factors that regulate the variability shared across all views and the variability that is characteristic of a specific view. For each novel method, we also present simulation studies and an application to real biological data to illustrate variable selection and model interpretability.

42

Picciau, Andrea. "Concurrency and data locality for sparse linear algebra on modern processors." Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/58884.

Abstract:

Graphics processing units (GPUs) are used as accelerators for algorithms in which the same instructions are carried out on different data. Algorithms for sparse linear algebra can achieve good performance on GPUs, although they tend to have an irregular pattern of memory accesses. The performance of these algorithms is highly dependent on the input data. In fact, the parallelism these algorithms can achieve is limited by the opportunities for concurrency given by the data. Focusing on the solution of sparse triangular linear systems of equations, this thesis shows that a good partitioning of the data and a good scheduling of the computation can greatly improve performance on GPUs. For this class of algorithms, a partition of the data that maximises concurrency in the execution does not necessarily achieve the best performance. Instead, improving data locality by reducing concurrency reduces the latency of memory access and consequently the execution time. First, this work characterises the problem formally using graph theory and performance models. Then, algorithms that can be used effectively to partition the data are described. These algorithms aim to balance concurrency and data locality automatically. This approach is evaluated experimentally on the solution of linear equations with the preconditioned conjugate gradient method. The thesis also shows that the proposed approach can be used when a matrix changes from one iteration to the next during the execution of an algorithm, as in the simplex method. In this case, the approach makes it possible to update the partition of the matrix from one iteration to the next. Finally, the algorithms and performance models developed in the thesis are used to discuss the limitations of accelerating the simplex method with GPUs.
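
To make concrete how the data limit concurrency in a sparse triangular solve, the sketch below uses level scheduling, a standard technique in which rows are grouped into levels so that each level depends only on earlier ones. This is a sequential illustration under assumptions, not the partitioning method of the thesis: the matrix is lower triangular with a nonzero diagonal, each row is stored as a dictionary of column to value, and the function names are hypothetical.

```python
def level_sets(rows):
    """Group the rows of a sparse lower-triangular matrix into levels.

    rows[i] maps column index -> value for row i (diagonal included).
    Rows in the same level do not depend on each other.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j in row if j < i]
        level[i] = 1 + max(deps) if deps else 0
    groups = {}
    for i, lv in enumerate(level):
        groups.setdefault(lv, []).append(i)
    return [groups[lv] for lv in sorted(groups)]


def solve_lower(rows, b):
    """Solve L x = b by forward substitution, one level at a time."""
    x = [0.0] * len(rows)
    for group in level_sets(rows):
        # Every row in `group` could be processed concurrently, since
        # all the unknowns it needs were computed in earlier levels.
        for i in group:
            partial = sum(v * x[j] for j, v in rows[i].items() if j < i)
            x[i] = (b[i] - partial) / rows[i][i]
    return x
```

The number of levels is set by the sparsity pattern alone: a diagonal matrix has one level, while a dense lower-triangular matrix has one level per row and offers no concurrency across rows.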

43

Postigo, Smura Michel Alexander. "Cluster analysis on sparse customer data on purchase of insurance products." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-249558.

Abstract:

This thesis performs a cluster analysis on customer data for insurance products. Three clustering algorithms are investigated: K-means (center-based clustering), Two-Level clustering (SOM and hierarchical clustering), and HDBSCAN (density-based clustering). The input to the algorithms is a high-dimensional and sparse data set. It contains information about the customers' previous purchases: how many of each product they have bought and how much they have paid. The data set is partitioned into four subsets using domain knowledge and is preprocessed by either normalizing or scaling before the three clustering algorithms are run on it. A parameter search is performed for each algorithm, and the best clustering is compared with the other results, where the best clustering is the one with the highest average silhouette index. The results indicate that all three algorithms perform approximately equally well, with a few exceptions. However, the algorithm showing the best overall results is K-means on scaled data sets. The different preprocessing steps and partitions of the data affect the results in different ways, which shows that it is important to preprocess the input data in several ways when performing a cluster analysis.
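
The selection criterion named in the abstract, the average silhouette index, can be computed as in the sketch below. It is a minimal illustration assuming Euclidean distance, at least two clusters, and the common convention that a point in a singleton cluster scores 0; the abstract does not state which distance or conventions the thesis used, and `mean_silhouette` is a hypothetical name.

```python
from math import dist


def mean_silhouette(points, labels):
    """Average silhouette over all points (requires at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: silhouette taken as 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Values close to 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own, so comparing this score across parameter settings gives a single number to rank candidate clusterings.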

44

Castleberry, Alissa. "Integrated Analysis of Multi-Omics Data Using Sparse Canonical Correlation Analysis." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu15544898045976.

45

Das, Debasish. "Bayesian Sparse Regression with Application to Data-driven Understanding of Climate." Diss., Temple University Libraries, 2015. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/313587.

Abstract:

Computer and Information Science
Ph.D.
Sparse regressions based on constraining the L1-norm of the coefficients became popular due to their ability to handle high dimensional data unlike the regular regressions which suffer from overfitting and model identifiability issues especially when sample size is small. They are often the method of choice in many fields of science and engineering for simultaneously selecting covariates and fitting parsimonious linear models that are better generalizable and easily interpretable. However, significant challenges may be posed by the need to accommodate extremes and other domain constraints such as dynamical relations among variables, spatial and temporal constraints, need to provide uncertainty estimates and feature correlations, among others. We adopted a hierarchical Bayesian version of the sparse regression framework and exploited its inherent flexibility to accommodate the constraints. We applied sparse regression for the feature selection problem of statistical downscaling of the climate variables with particular focus on their extremes. This is important for many impact studies where the climate change information is required at a spatial scale much finer than that provided by the global or regional climate models. Characterizing the dependence of extremes on covariates can help in identification of plausible causal drivers and inform extremes downscaling. We propose a general-purpose sparse Bayesian framework for covariate discovery that accommodates the non-Gaussian distribution of extremes within a hierarchical Bayesian sparse regression model. We obtain posteriors over regression coefficients, which indicate dependence of extremes on the corresponding covariates and provide uncertainty estimates, using a variational Bayes approximation. 
The method is applied to select informative atmospheric covariates at multiple spatial scales, as well as indices of large-scale circulation and global warming, related to the frequency of precipitation extremes over the continental United States. Our results confirm the dependence relations that may be expected from known precipitation physics and generate novel insights which can inform physical understanding. We plan to extend our model to discover covariates for extreme intensity in the future. We further extend our framework to handle the dynamic relationship among the climate variables using a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP). The extended model can achieve simultaneous clustering and discovery of covariates within each cluster. Moreover, a priori knowledge about associations between pairs of data points is incorporated in the model through must-link constraints on a Markov Random Field (MRF) prior. A scalable and efficient variational Bayes approach is developed to infer posteriors on regression coefficients and cluster variables.
Temple University--Theses

46

Rajamani, Kumar T. "Three dimensional surface extrapolation from sparse data using deformable bone models /." Bern : [s.n.], 2006. http://opac.nebis.ch/cgi-bin/showAbstract.pl?sys=000279098.

47

Lennartz, Carolin [author], and Jürgen [academic supervisor] Hennig. "Inference of sparse cerebral connectivity from high temporal resolution fMRI data." Freiburg : Universität, 2020. http://d-nb.info/1216826684/34.

48

Zabriskie, Brinley. "Methods for Meta–Analyses of Rare Events, Sparse Data, and Heterogeneity." DigitalCommons@USU, 2019. https://digitalcommons.usu.edu/etd/7491.

Abstract:

The vast and complex wealth of information available to researchers often leads to a systematic review, which involves a detailed and comprehensive plan and search strategy with the goal of identifying, appraising, and synthesizing all relevant studies on a particular topic. A meta–analysis, conducted ideally as part of a comprehensive systematic review, statistically synthesizes evidence from multiple independent studies to produce one overall conclusion. The increasingly widespread use of meta–analysis has led to growing interest in meta–analytic methods for rare events and sparse data. Conventional approaches tend to perform very poorly in such settings. Recent work in this area has provided options for sparse data, but these are still often hampered when heterogeneity across the available studies differs based on treatment group. Heterogeneity arises when participants in a study are more correlated than participants across studies, often stemming from differences in the administration of the treatment, study design, or measurement of the outcome. We propose several new exact methods that accommodate this common contingency, providing more reliable statistical tests when such patterns of heterogeneity are observed. First, we develop a permutation–based approach that can also be used as a basis for computing exact confidence intervals when estimating the effect size. Second, we extend the permutation–based approach to the network meta–analysis setting. Third, we develop a new exact confidence distribution approach for effect size estimation. We show that these new methods perform markedly better than traditional methods when events are rare and heterogeneity is present.

49

Headley, Miguel Learie. "Assessing the reliability, resilience and sustainability of water resources systems in data-rich and data-sparse regions." Thesis, University of Exeter, 2018. http://hdl.handle.net/10871/33192.

Abstract:

Uncertainty associated with the potential impact of climate change on supply availability, varied success with demand-side interventions such as water efficiency, and changes in priority relating to hydrometric data collection and ownership have resulted in challenges for water resources system management, particularly in data-sparse regions. Consequently, the aim of this thesis is to assess the reliability, resilience and sustainability of water resources systems in both data-rich and data-sparse regions, with an emphasis on robust decision-making in data-sparse regions. To achieve this aim, new resilience indicators that capture water resources system failure duration and extent of failure (i.e. failure magnitude) from a social and environmental perspective were developed. These performance indicators enabled a comprehensive assessment of a number of performance-enhancing interventions, which resulted in the identification of a set of intervention strategies that showed potential to improve reliability, resilience and sustainability in the case studies examined. Finally, a multi-criteria decision analysis supported trade-off decision making when the reliability, resilience and sustainability indicators were considered in combination. Two case studies were considered in this research: Kingston and St. Andrew in Jamaica and Anyplace in the UK. The Kingston and St. Andrew case study represents the main data-sparse case study, where many assumptions were introduced to fill data gaps. The intervention strategy identified from the Kingston and St. Andrew water resources assessment as showing great potential to improve reliability, resilience and sustainability was the ‘Site A-east’ desalination scheme. To ameliorate the uncertainty and lack of confidence associated with the results, a methodology was developed that transformed a key proportion of the Anyplace water resources system from a data-rich environment to a data-sparse environment. 
The Anyplace water resources system was then assessed in a data-sparse environment and the performance trade-offs of the intervention strategies were analysed using four multi-criteria decision analysis (MCDA) weighting combinations. The MCDA facilitated a robust comparison of the interventions’ performances in the data-rich and data-sparse case studies. Comparisons showed consistency in the performances of the interventions across data-rich and data-sparse hydrological conditions and serve to demonstrate to decision makers a novel approach to addressing uncertainty when many assumptions have been introduced in the water resources management process due to data sparsity.

50

Kearney, James Rhys. "Sparse data inference for point process failure models incorporating multiple maintenance effects." Thesis, University of Salford, 2011. http://usir.salford.ac.uk/26751/.

Abstract:

The primary scenario investigated within repairable system reliability estimation is that of a single failure mode whose likelihood of occurrence is assumed to be affected by maintenance activities of varying effect and degree. Since the structural composition of the systems considered is unknown, the models developed are simplifications premised either on a mechanistic conception of a maintenance action (the Proportional Renewal Model) or on an empirical representation of the effect of the maintenance action as the transfer of a subset of system components from an unmaintained state to a maintained state, with the reverse process governed by some decay process (the Maintenance Decay Model). Maintenance actions are classified either as 'corrective' (CM) if undertaken in response to failure or as 'preventive' (PM) if elective. The datasets analysed in this work, collected in the petrochemical industry over a number of years, are typically sparse and contain observations of a number of PM types. The interactions of different maintenance types on a single failure mode (one type of CM) are investigated and related to the problem of maintenance scheduling optimisation. Given the complexity of the models and the sparse nature of reliability data, statistical methods for assessing the level of confidence in the model parameters required to incorporate diverse maintenance effects are compared, with particular focus on Bayesian methods of statistical inference, which have the advantage of being able to incorporate prior knowledge into the estimation procedure.
