Dissertations / Theses on the topic 'Explorative multivariate data analysis'

To see the other types of publications on this topic, follow the link: Explorative multivariate data analysis.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Explorative multivariate data analysis.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Bergfors, Linus. "Explorative Multivariate Data Analysis of the Klinthagen Limestone Quarry Data." Thesis, Uppsala University, Department of Information Technology, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-122575.

Full text
Abstract:


Today's quarry planning at Klinthagen is rough, which provides an opportunity to introduce new methods to improve quarry yield and efficiency. Nordkalk AB, which operates Klinthagen, wishes to open a new quarry at a nearby location. To exploit future quarries efficiently and ensure production quality, multivariate statistics may help extract important information.

In this thesis the possibilities of the multivariate statistical approaches of Principal Component Analysis (PCA) and Partial Least Squares (PLS) regression were evaluated on the Klinthagen bore data. The PCA data were spatially interpolated by Kriging, which was also evaluated and compared to Inverse Distance Weighting (IDW) interpolation.

Principal component analysis supplied an overview of the relations between the variables, but also visualised the problems involved in linking geophysical data to geochemical data and the inaccuracy introduced by poor data quality.

The PLS regression further emphasised the geochemical-geophysical problems, but also showed good precision when applied to strictly geochemical data.

Spatial interpolation by Kriging did not result in significantly better approximations than the less complex control interpolation by IDW.
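The IDW control interpolation mentioned above is simple to sketch. The following minimal example (invented bore-hole coordinates and grade values, not the thesis data or code) shows inverse distance weighting in NumPy:

```python
import numpy as np

def idw_interpolate(xy_known, values, xy_query, power=2.0, eps=1e-12):
    """Inverse Distance Weighting: each query point is a weighted
    average of the known values, with weights ~ 1 / distance**power."""
    diffs = xy_query[:, None, :] - xy_known[None, :, :]   # (q, k, 2)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))            # (q, k)
    weights = 1.0 / (dists + eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

# Hypothetical bore-hole coordinates and one measured grade variable.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(30, 2))
grade = np.sin(xy[:, 0] / 20) + 0.1 * rng.standard_normal(30)

# Interpolate onto two unsampled locations.
queries = np.array([[50.0, 50.0], [10.0, 90.0]])
est = idw_interpolate(xy, grade, queries)
```

Kriging additionally models the spatial covariance structure of the data, which is where its extra complexity comes from; the abstract's finding is that this complexity did not pay off here.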

In order to improve the information content of the data when modelled by PCA, a more discrete sampling method would be advisable. Data quality may cause problems, although with the current sampling technique it was considered to be of minor consequence.

When a single geophysical component is to be predicted from chemical variables, further geophysical data are needed to complement the existing data and achieve satisfactory PLS models.

The stratified rock composition caused problems for spatial interpolation. Further investigations should be performed to develop more suitable interpolation techniques.

APA, Harvard, Vancouver, ISO, and other styles
2

Yang, Di. "Analysis guided visual exploration of multivariate data." Worcester, Mass. : Worcester Polytechnic Institute, 2007. http://www.wpi.edu/Pubs/ETD/Available/etd-050407-005925/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Engel, Daniel [Verfasser], Hans [Akademischer Betreuer] Hagen, and Bernd [Akademischer Betreuer] Hamann. "Explorative and Model-based Visual Analysis of Multivariate Data / Daniel Engel. Betreuer: Hans Hagen ; Bernd Hamann." Kaiserslautern : Technische Universität Kaiserslautern, 2014. http://d-nb.info/1054636176/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Doshi, Punit Rameshchandra. "Adaptive prefetching for visual data exploration." Link to electronic thesis, 2003. http://www.wpi.edu/Pubs/ETD/Available/etd-0131103-203307.

Full text
Abstract:
Thesis (M.S.)--Worcester Polytechnic Institute.
Keywords: Adaptive prefetching; Large-scale multivariate data visualization; Semantic caching; Hierarchical data exploration; Exploratory data analysis. Includes bibliographical references (p.66-70).
APA, Harvard, Vancouver, ISO, and other styles
5

Lu, Kewei. "Distribution-based Exploration and Visualization of Large-scale Vector and Multivariate Fields." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1483545901567695.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Vargas, Aurea Rossy Soriano. "Visual exploration to support the identification of relevant attributes in time-varying multivariate data." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-23102018-115029/.

Full text
Abstract:
Ionospheric scintillation is a rapid variation in the amplitude and/or phase of radio signals traveling through the ionosphere. This spatial and time-varying phenomenon is of interest because its occurrence may affect the reception quality of satellite signals. Specialized receivers at strategic regions can track multiple variables related to the phenomenon, generating a database of historical observations on the regional behavior of ionospheric scintillation. The analysis of such data is very challenging, since it consists of time-varying measurements of many variables which are heterogeneous in nature and with possibly many missing values, recorded over extensive time periods. There is a need to introduce alternative intuitive strategies that contribute to experts acquiring further knowledge from the ionospheric scintillation data. Such challenges motivated a study on the applicability of visualization techniques to support tasks of identification of relevant attributes in the study of the behavior of phenomena described by multiple time-varying variables, of which the ionospheric scintillation is a good example. In particular, this thesis introduces a visual analytics framework, named TV-MV Analytics, that supports exploratory tasks on time-varying multivariate data and was developed following the requirements of experts on ionospheric scintillation from the Faculty of Science and Technology of UNESP at Presidente Prudente, Brazil. TV-MV Analytics provides an interactive visual exploration loop to analysts inspecting the behavior of multiple variables at different temporal scales, through temporal representations associated with clustering and multidimensional projection techniques. Analysts can also assess how different feature sub-spaces contribute to characterizing a certain behavior, where they may direct the analysis process and include their domain knowledge in the exploratory analysis. 
We also illustrate the application of TV-MV Analytics on multivariate time-varying data sets from three alternative application domains. Experimental results indicate that the proposed solutions show good potential for assisting time-varying multivariate data mining tasks, since they reduce the effort required from experts to gain deeper insight into the historical behavior of the variables describing a phenomenon or domain.
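The combination of windowed temporal summaries, clustering and multidimensional projection that the abstract describes can be loosely sketched as follows. This is a hypothetical reconstruction on invented data, not the TV-MV Analytics framework itself:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

# Hypothetical multivariate time series: 3 variables over 240 time steps,
# summarised per 24-step window by mean and standard deviation.
rng = np.random.default_rng(2)
t = np.arange(240)
series = np.stack([np.sin(t / 10),
                   np.cos(t / 7),
                   0.3 * rng.standard_normal(240)], axis=1)
windows = series.reshape(10, 24, 3)
features = np.concatenate([windows.mean(axis=1), windows.std(axis=1)], axis=1)

# Cluster the windows by behaviour, then project them to 2-D so an
# analyst can inspect temporal behaviour at the window scale.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
coords = MDS(n_components=2, random_state=0).fit_transform(features)
```

Each window becomes a point in the projection, coloured by its cluster; changing the window length corresponds to inspecting the variables at a different temporal scale.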
APA, Harvard, Vancouver, ISO, and other styles
7

Rammelkamp, Kristin. "Investigation of LIBS and Raman data analysis methods in the context of in-situ planetary exploration." Doctoral thesis, Humboldt-Universität zu Berlin, 2019. http://dx.doi.org/10.18452/20703.

Full text
Abstract:
The studies presented in this thesis investigate different data analysis approaches for mainly laser-induced breakdown spectroscopy (LIBS) and also Raman data in the context of planetary in-situ exploration. Most studies were motivated by Mars exploration due to the first extraterrestrially employed LIBS instrument ChemCam on NASA's Mars Science Laboratory (MSL) and further planned LIBS and Raman instruments on upcoming missions to Mars. Next to analytical approaches, statistical methods known as multivariate data analysis (MVA) were applied and evaluated. In this thesis, four studies are presented in which LIBS and Raman data analysis strategies are evaluated. In the first study, LIBS data normalization with plasma parameters, namely the plasma temperature and the electron density, was studied. In the second study, LIBS measurements in vacuum conditions were investigated with a focus on the degree of ionization of the LIBS plasma. In the third study, the capability of MVA methods such as principal component analysis (PCA) and partial least squares regression (PLS-R) for the identification and quantification of halogens by means of molecular emissions was tested. The outcomes are promising, as it was possible to distinguish apatites and to quantify chlorine in a particular concentration range. In the fourth and last study, LIBS data was combined with complementary Raman data in a low-level data fusion approach using MVA methods. Also, concepts of high-level data fusion were implemented. Low-level LIBS and Raman data fusion can improve identification capabilities in comparison to the single datasets. However, the improvement is comparatively small regarding the higher amount of information in the low-level fused data and dedicated strategies for the joint analysis of LIBS and Raman data have to be found for particular scientific objectives.
APA, Harvard, Vancouver, ISO, and other styles
8

Ablin, Pierre. "Exploration of multivariate EEG /MEG signals using non-stationary models." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT051.

Full text
Abstract:
Independent Component Analysis (ICA) models a set of signals as linear combinations of independent sources. This analysis method plays a key role in electroencephalography (EEG) and magnetoencephalography (MEG) signal processing. Applied to such signals, it makes it possible to isolate interesting brain sources, locate them, and separate them from artifacts. ICA belongs to the toolbox of many neuroscientists and is part of the processing pipeline of many research articles. Yet the most widely used algorithms date back to the 1990s. They are often quite slow, and stick to the standard ICA model, without more advanced features. The goal of this thesis is to develop practical ICA algorithms to help neuroscientists. We follow two axes. The first is speed. We consider the optimization problems solved by two of the ICA algorithms most widely used by practitioners, Infomax and FastICA, and develop a novel technique based on preconditioning the L-BFGS algorithm with Hessian approximations. The resulting algorithm, Picard, is tailored for real data applications, where the independence assumption is never entirely true. On M/EEG data, it converges faster than the 'historical' implementations. Another way to accelerate ICA is to use incremental methods, which process a few samples at a time instead of the whole dataset. Such methods have gained considerable interest in recent years due to their ability to scale to very large datasets. We propose an incremental algorithm for ICA with important descent guarantees. As a consequence, the proposed algorithm is simple to use and does not have a critical, hard-to-tune parameter such as a learning rate. In a second axis, we propose to incorporate noise in the ICA model. Such a model is notoriously hard to fit under the standard non-Gaussian hypothesis of ICA, and would render estimation extremely long. Instead, we rely on a spectral diversity assumption, which leads to a practical algorithm, SMICA.
The noise model opens the door to new possibilities, such as finer estimation of the sources and the use of ICA as a statistically sound dimension reduction technique. Thorough experiments on M/EEG datasets demonstrate the usefulness of this approach. All algorithms developed in this thesis are open-sourced and available online. The Picard algorithm is included in the most popular M/EEG processing libraries in Python (MNE) and Matlab (EEGLAB).
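The classical ICA model the thesis builds on can be illustrated with scikit-learn's FastICA, one of the 'historical' algorithms it benchmarks against. The signals below are invented toy sources, not M/EEG data:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical mixture: 3 independent non-Gaussian sources observed
# through a random linear mixing, as in the classical ICA model.
rng = np.random.default_rng(4)
t = np.linspace(0, 8, 2000)
sources = np.stack([np.sign(np.sin(3 * t)),      # square wave
                    np.sin(5 * t),               # sinusoid
                    rng.laplace(size=t.size)],   # heavy-tailed noise source
                   axis=1)
mixing = rng.standard_normal((3, 3))
observed = sources @ mixing.T                    # what the sensors "see"

# FastICA unmixes the observations back into estimated sources
# (up to permutation, sign and scale, which ICA cannot determine).
ica = FastICA(n_components=3, random_state=0, max_iter=1000)
recovered = ica.fit_transform(observed)
```

On real M/EEG recordings the recovered components would correspond to brain sources and artifacts such as eye blinks, which can then be inspected or removed.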
APA, Harvard, Vancouver, ISO, and other styles
9

Oliveira, Irene. "Correlated data in multivariate analysis." Thesis, University of Aberdeen, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.401414.

Full text
Abstract:
After presenting Principal Component Analysis (PCA) and its relationship with time series data sets, we describe most of the existing techniques in this field. Various techniques, e.g. Singular Spectrum Analysis (SSA), Hilbert EOF (HEOF), Extended EOF or Multichannel Singular Spectrum Analysis (MSSA), and Principal Oscillation Pattern (POP) Analysis, can be used for such data. The way we use the matrix of data, or the covariance or correlation matrix, makes each method different from the others. SSA may be considered as a PCA performed on lagged versions of a single time series, where we may decompose the original time series into some main components. Following SSA we have its multivariate version (MSSA), where we augment the initial matrix of data to obtain information on lagged versions of each variable (time series), so that past (or future) behaviour can be used to reanalyse the information between variables. In POP Analysis a linear system involving the vector field is analysed, x_{t+1} = A x_t + n_t, in order to "know" x at time t+1 given the information from time t. The matrix A is estimated by using not only the covariance matrix but also the matrix of covariances between the system at the current time and at lag 1. In Hilbert EOF we try to extract some (future) information from the internal correlation in each variable by using the Hilbert transform of each series in an augmented complex matrix, with the data themselves in the real part and the Hilbert-transformed series in the imaginary part, X_t + i X_t^H. In addition to all these ideas from the statistics and other literature, we develop a new methodology as a modification of HEOF and POP Analysis, namely Hilbert Oscillation Patterns (HOP) Analysis, or the related idea of Hilbert Canonical Correlation Analysis (HCCA), using the system x_t^H = A x_t + n_t.
Theory and assumptions are presented, and the HOP results are related to the results extracted from a Canonical Correlation Analysis between the time series data matrix and its Hilbert transform. Some examples are given to show the differences and similarities between the results of the HCCA technique and those from PCA, MSSA, HEOF and POPs. We also present PCA for time series as observations, where a technique of linear algebra (PCA) becomes a problem in functional analysis, leading to Functional PCA (FPCA). We adapt PCA accordingly and discuss the theoretical and practical behaviour of using PCA on the even part (EPCA) and odd part (OPCA) of the data, and its application to functional data. Comparisons are made between PCA and this modification for the reconstruction of data sets for which considerations of symmetry are especially relevant.
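The SSA construction described above, PCA/SVD on lagged versions of a single series followed by diagonal averaging, can be sketched in a few lines. This is a minimal illustration on an invented series, not the thesis implementation:

```python
import numpy as np

def ssa_decompose(x, window, n_comp):
    """Basic SSA: lagged (trajectory) embedding, SVD, and reconstruction
    of one additive component per singular triple by diagonal averaging."""
    n = len(x)
    k = n - window + 1
    # Trajectory matrix: column j holds x[j : j + window], so traj[i, j] = x[i + j].
    traj = np.column_stack([x[i:i + window] for i in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    counts = np.zeros(n)                 # how many anti-diagonal cells map to each t
    for i in range(window):
        counts[i:i + k] += 1
    comps = np.zeros((n_comp, n))
    for c in range(n_comp):
        elem = s[c] * np.outer(u[:, c], vt[c])   # rank-1 piece of the trajectory
        for i in range(window):
            comps[c, i:i + k] += elem[i]         # sum over anti-diagonals (t = i + j)
        comps[c] /= counts
    return comps

# Hypothetical series: linear trend + seasonal oscillation + noise.
rng = np.random.default_rng(5)
t = np.arange(200)
x = 0.02 * t + np.sin(2 * np.pi * t / 12) + 0.2 * rng.standard_normal(200)
comps = ssa_decompose(x, window=40, n_comp=4)
```

With a suitable window, the leading components separate the trend from the oscillation, which is the "decompose the original time series into some main components" step the abstract refers to.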
APA, Harvard, Vancouver, ISO, and other styles
10

Prelorendjos, Alexios. "Multivariate analysis of metabonomic data." Thesis, University of Strathclyde, 2014. http://oleg.lib.strath.ac.uk:80/R/?func=dbin-jump-full&object_id=24286.

Full text
Abstract:
Metabonomics is one of the main technologies used in biomedical sciences to improve understanding of how various biological processes of living organisms work. It is considered a more advanced technology than e.g. genomics and proteomics, as it can provide important evidence of molecular biomarkers for the diagnosis of diseases and the evaluation of beneficial adverse drug effects, by studying the metabolic profiles of living organisms. This is achievable by studying samples of various types such as tissues and biofluids. The findings of a metabonomics study for a specific disease, disorder or drug effect, could be applied to other diseases, disorders or drugs, making metabonomics an important tool for biomedical research. This thesis aims to review and study various multivariate statistical techniques which can be used in the exploratory analysis of metabonomics data. To motivate this research, a metabonomics data set containing the metabolic profiles of a group of patients with epilepsy was used. More specifically, the metabolic fingerprints (proton NMR spectra) of 125 patients with epilepsy, of blood serum type, have been obtained from the Western Infirmary, Glasgow, for the purposes of this project. These data were originally collected as baseline data in a study to investigate if the treatment with Anti-Epileptic Drugs (AEDs), of patients with pharmacoresistant epilepsy affects the seizure levels of the patients. The response to the drug treatment in terms of the reduction in seizure levels of these patients enabled two main categories of response to be identified, i.e. responders and the non-responders to AEDs. We explore the use of statistical methods used in metabonomics to analyse these data. Novel aspects of the thesis are the use of Self Organising Maps (SOM) and of Fuzzy Clustering Methods to pattern recognition in metabonomics data. 
Part I of the thesis defines metabonomics and the other main "omics" technologies, and gives a detailed description of the metabonomics data to be analysed, as well as a description of the two main analytical chemical techniques, Mass Spectrometry (MS) and Nuclear Magnetic Resonance Spectroscopy (NMR), that can be used to generate metabonomics data. Pre-processing and pre-treatment methods that are commonly used on NMR-generated metabonomics data to enhance the quality and accuracy of the data are also discussed. In Part II, several unsupervised statistical techniques are reviewed and applied to the epilepsy data to investigate the capability of these techniques to discriminate the patients according to their type of response. The techniques reviewed include Principal Components Analysis (PCA), Multi-dimensional scaling (both Classical scaling and Sammon's non-linear mapping) and Clustering techniques. The latter include Hierarchical clustering (with emphasis on Agglomerative Nesting algorithms), Partitioning methods (Fuzzy and Hard clustering algorithms) and Competitive Learning algorithms (Self Organizing maps). The advantages and disadvantages of the different methods are examined for this kind of data. Results of the exploratory multivariate analyses showed that no natural clusters of patients existed with regard to their response to AEDs, and therefore none of these techniques was capable of discriminating these patients according to their clinical characteristics. To examine the capability of an unsupervised technique such as PCA to identify groups in data such as the metabolic fingerprints of patients with epilepsy, a simulation algorithm was developed to run a series of experiments, covered in Part III of the thesis. The aim of the simulation study is to investigate the extent of the difference in the clusters of the data, and under what conditions this difference is detectable by unsupervised techniques.
Furthermore, the study examines whether the existence or lack of variation in the mean-shifted variables affects the discriminating ability of the unsupervised techniques (in this case PCA) or not. In each simulation experiment, a reference and a test data set were generated based on the original epilepsy data, and the discriminating capability of PCA was assessed. A test set was generated by mean-shifting a pre-selected number of variables in a reference set. Three methods of selecting the variables to mean-shift (maximum and minimum standard deviations and maximum means), five subsets of variables of sizes 1, 3, 20, 120 and 244 (the total number of variables in the data sets) and three sample sizes (100, 500 and 1000) were used. Average values over 100 runs of an experiment for two statistics, i.e. the misclassification rate and the average separation (Webb, 2002), were recorded. Results showed that the number of mean-shifted variables (in general) and the methods used to select the variables (in some cases) are important factors for the discriminating ability of PCA, whereas the sample size of the two data sets does not play any role in the experiments (although experiments with large sample sizes showed greater stability in the results for the two statistics over 100 runs of any experiment). The results have implications for the use of PCA with metabonomics data generally.
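One simulation run of the kind described, mean-shifting selected variables of a test set and checking whether PCA separates it from a reference set, might look like the sketch below. It uses synthetic Gaussian data in place of the epilepsy fingerprints, and the shift size and variable counts are invented:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical single experiment: a reference set and a test set that
# differ only by a mean shift in a few selected variables.
rng = np.random.default_rng(6)
n, p, n_shifted = 500, 50, 5
reference = rng.standard_normal((n, p))
test = rng.standard_normal((n, p))

# Select the variables to mean-shift, here by maximum standard deviation
# (one of the three selection methods mentioned in the abstract).
idx = np.argsort(reference.std(axis=0))[-n_shifted:]
test[:, idx] += 2.0                      # apply the mean shift

# PCA on the combined data; a detectable difference shows up as a
# separation of the two groups along the leading components.
combined = np.vstack([reference, test])
scores = PCA(n_components=2).fit_transform(combined)
gap = abs(scores[:n, 0].mean() - scores[n:, 0].mean())
```

Repeating such runs while varying the shift, the number of shifted variables and the sample size, and averaging a misclassification or separation statistic, reproduces the structure of the experiments in Part III.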
APA, Harvard, Vancouver, ISO, and other styles
11

Tavares, Nuno Filipe Ramalho da Cunha. "Multivariate analysis applied to clinical analysis data." Master's thesis, Faculdade de Ciências e Tecnologia, 2014. http://hdl.handle.net/10362/12288.

Full text
Abstract:
Dissertation submitted for the degree of Master in Industrial Engineering and Management
Folate, vitamin B12, iron and hemoglobin are essential for metabolic functions in the body. The deficiency of these can be the cause of several known pathologies and, untreated, can be responsible for severe morbidity and even death. The objective of this study is to characterize a population, residing in the metropolitan area of Lisbon and Setubal, concerning serum levels of folate, vitamin B12, iron and hemoglobin, as well as finding evidence of correlations between these parameters and illnesses, mainly cardiovascular, gastrointestinal and neurological diseases and anemia. Clinical analysis data were collected and submitted to multivariate analysis. First the data were screened with Spearman correlation and Kruskal-Wallis analysis of variance to study correlations and variability between groups. To characterize the population, we used cluster analysis with Ward's linkage method. Finally a sensitivity analysis was performed to strengthen the results. A positive correlation of iron with ferritin, transferrin and hemoglobin was observed with the Spearman correlation. The Kruskal-Wallis test showed significant differences between these biomarkers in persons aged 0 to 29, 30 to 59 and over 60 years old. Cluster analysis proved to be a useful tool when characterizing a population based on its biomarkers, showing evidence of low folate levels for the population in general, and hemoglobin levels below the reference values. Iron and vitamin B12 were within the reference range for most of the population. Low levels of the parameters were registered mainly in patients with cardiovascular, gastrointestinal, and neurological diseases and anemia.
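The screening pipeline described, Spearman correlation, Kruskal-Wallis tests between age groups, and Ward-linkage clustering, can be sketched with SciPy. The biomarker values below are invented, not the study's data:

```python
import numpy as np
from scipy.stats import spearmanr, kruskal
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical serum measurements for 90 subjects in three age groups.
rng = np.random.default_rng(7)
iron = rng.normal(100, 20, 90)
hemoglobin = 0.1 * iron + rng.normal(14, 1, 90)   # built to correlate with iron
groups = np.repeat([0, 1, 2], 30)                 # e.g. 0-29, 30-59, 60+

# Screening: monotone association and between-group differences.
rho, p_rho = spearmanr(iron, hemoglobin)
h_stat, p_kw = kruskal(iron[groups == 0], iron[groups == 1], iron[groups == 2])

# Ward's linkage on standardised biomarkers, cut into three clusters.
data = np.column_stack([iron, hemoglobin])
data = (data - data.mean(axis=0)) / data.std(axis=0)
labels = fcluster(linkage(data, method="ward"), t=3, criterion="maxclust")
```

Standardising before Ward clustering keeps biomarkers on different scales (e.g. iron in µg/dL versus hemoglobin in g/dL) from dominating the distance computation.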
APA, Harvard, Vancouver, ISO, and other styles
12

Rehman, Naveed Ur. "Data-driven time-frequency analysis of multivariate data." Thesis, Imperial College London, 2011. http://hdl.handle.net/10044/1/9116.

Full text
Abstract:
Empirical Mode Decomposition (EMD) is a data-driven method for the decomposition and time-frequency analysis of real world nonstationary signals. Its main advantages over other time-frequency methods are its locality, data-driven nature, multiresolution-based decomposition, higher time-frequency resolution and its ability to capture oscillation of any type (nonharmonic signals). These properties have made EMD a viable tool for real world nonstationary data analysis. Recent advances in sensor and data acquisition technologies have brought to light new classes of signals containing typically several data channels. Currently, such signals are almost invariably processed channel-wise, which is suboptimal. It is, therefore, imperative to design multivariate extensions of the existing nonlinear and nonstationary analysis algorithms, as they are expected to give more insight into the dynamics and the interdependence between multiple channels of such signals. To this end, this thesis presents multivariate extensions of the empirical mode decomposition algorithm and illustrates their advantages with regard to multivariate nonstationary data analysis. Some important properties of such extensions are also explored, including their ability to exhibit wavelet-like dyadic filter bank structures for white Gaussian noise (WGN), and their capacity to align similar oscillatory modes from multiple data channels. Owing to the generality of the proposed methods, an improved multivariate EMD-based algorithm is introduced which solves some inherent problems in the original EMD algorithm. Finally, to demonstrate the potential of the proposed methods, simulations on the fusion of multiple real world signals (wind, images and inertial body motion data) support the analysis.
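The sifting idea at the core of EMD, repeatedly subtracting the mean of spline envelopes through the local extrema, can be sketched as below. This is a deliberately simplified, hypothetical illustration (fixed iteration count, no stopping criteria, single channel), not the thesis's multivariate algorithm:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One sifting step: subtract the mean of the upper and lower
    cubic-spline envelopes fitted through the local extrema."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 4 or len(minima) < 4:   # too few extrema to fit envelopes
        return x
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return x - (upper + lower) / 2

# Hypothetical two-tone signal: a fast oscillation riding on a slow one.
t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 40 * t) + np.sin(2 * np.pi * 4 * t)

imf = x.copy()
for _ in range(8):            # a few sifting iterations toward the first IMF
    imf = sift_once(imf, t)
residue = x - imf             # should resemble the slow oscillation
```

Repeating the whole procedure on the residue extracts successive intrinsic mode functions; the multivariate extensions in the thesis replace the scalar envelopes with envelopes computed along multiple projection directions so that the same sifting idea applies across channels.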
APA, Harvard, Vancouver, ISO, and other styles
13

Droop, Alastair Philip. "Correlation Analysis of Multivariate Biological Data." Thesis, University of York, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.507622.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Collins, Gary Stephen. "Multivariate analysis of flow cytometry data." Thesis, University of Exeter, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.324749.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Haydock, Richard. "Multivariate analysis of Raman spectroscopy data." Thesis, University of Nottingham, 2015. http://eprints.nottingham.ac.uk/30697/.

Full text
Abstract:
This thesis is concerned with developing techniques for analysing Raman spectroscopic images. A Raman spectroscopic image differs from a standard image in that, in place of red, green and blue quantities for each pixel, a Raman image contains a spectrum of light intensities at each pixel. These spectra are used to identify the chemical components from which the image subject, for example a tablet, is comprised. The study of these types of images is known as chemometrics, with the majority of chemometric methods based on multivariate statistical and image analysis techniques. The work in this thesis has two main foci. The first of these is on the spectral decomposition of a Raman image, the purpose of which is to identify the component chemicals and their concentrations. The standard method for this is to fit a bilinear model to the image, where both parts of the model, representing components and concentrations, must be estimated. As the standard bilinear model is non-identifiable in its solutions, we investigate the range of possible solutions in the solution space with a random walk. We also derive an improved model for spectral decomposition, combining cluster analysis techniques and the standard bilinear model. For this purpose we apply the expectation maximisation algorithm to a Gaussian mixture model with bilinear means, to represent our spectra and concentrations. This reduces noise in the estimated chemical components by separating the Raman image subject from the background. The second focus of this thesis is on the analysis of our spectral decomposition results. For testing the chemical components for uniform mixing we derive test statistics for identifying patterns in the image based on Minkowski measures, grey level co-occurrence matrices and neighbouring pixel correlations. However, with a non-identifiable model, any hypothesis tests performed on the solutions will be specific to only that solution.
Therefore, to obtain conclusions for a range of solutions, we combined our test statistics with our random walk. We also investigate the analysis of a time series of Raman images as the subject dissolved. Using models comprised of Gaussian cumulative distribution functions we are able to estimate the changes in concentration levels of dissolving tablets between scan times. These results allowed us to describe the dissolution process in terms of the quantities of component chemicals.
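The bilinear model fitted in such spectral decompositions has the form D = CS (concentrations times component spectra), typically estimated by alternating least squares with non-negativity; because the model is non-identifiable, the procedure recovers *a* solution, not a unique one. A minimal sketch on synthetic data (dimensions and noise level are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated Raman-like data: D (pixels x wavenumbers) = C @ S + noise,
# with C the per-pixel concentrations and S the component spectra.
n_pix, n_wav, n_comp = 200, 80, 2
S_true = np.abs(rng.standard_normal((n_comp, n_wav)))
C_true = rng.random((n_pix, n_comp))
D = C_true @ S_true + 0.01 * rng.standard_normal((n_pix, n_wav))

# Alternating least squares with non-negativity clipping; the bilinear
# model is non-identifiable, so this converges to one of many solutions.
C = rng.random((n_pix, n_comp))
for _ in range(200):
    S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0, None)
    C = np.clip(np.linalg.lstsq(S.T, D.T, rcond=None)[0].T, 0, None)

residual = float(np.linalg.norm(D - C @ S) / np.linalg.norm(D))
```

Rotating (C, S) by any invertible matrix and its inverse leaves the fit unchanged, which is exactly the solution-space ambiguity the thesis explores with a random walk.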
APA, Harvard, Vancouver, ISO, and other styles
16

Zhu, Liang. "Semiparametric analysis of multivariate longitudinal data." Diss., Columbia, Mo. : University of Missouri-Columbia, 2008. http://hdl.handle.net/10355/6044.

Full text
Abstract:
Thesis (Ph. D.)--University of Missouri-Columbia, 2008.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on August 3, 2009). Vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
17

Schmutz, Amandine. "Contributions à l'analyse de données fonctionnelles multivariées, application à l'étude de la locomotion du cheval de sport." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1241.

Full text
Abstract:
With the growth of the smart devices market to provide athletes and trainers with a systematic, objective and reliable follow-up, more and more parameters are monitored for a same individual. An alternative to laboratory evaluation methods is the use of inertial sensors, which allow following the performance without hindering it, without space limits and without tedious initialization procedures. Data collected by those sensors can be classified as multivariate functional data: quantitative entities evolving along time and collected simultaneously for a same statistical individual. The aim of this thesis is to find parameters for analysing the locomotion of the athlete horse thanks to a sensor put in the saddle. This connected device (inertial measurement unit, IMU) for equestrian sports allows the collection of acceleration and angular velocity along time, in the three space directions and with a sampling frequency of 100 Hz. The database used for model development is made of 3221 canter strides from 58 ridden jumping horses of varied ages and levels of competition, collected with two different protocols: one for straight paths and one for curved paths. We restricted our work to the prediction of three parameters: the speed per stride, the stride length and the jump quality. To meet the first two objectives, we developed a multivariate functional clustering method that allows the division of the database into smaller, more homogeneous sub-groups from the point of view of the collected signals. This method characterizes each group by its average profile, which eases data understanding and interpretation. But surprisingly, this clustering model did not improve the results of speed prediction: the Support Vector Machine (SVM) remained the model with the lowest percentage of errors above 0.6 m/s.
The same applied for the stride length, where an accuracy of 20 cm is reached with the SVM model. Those results can be explained by the fact that our database is built from only 58 horses, which is a quite low number of individuals for a clustering method. We then extended this method to the co-clustering of multivariate functional data in order to ease the data mining of horses' follow-up databases. This method might allow the detection and prevention of locomotor disturbances, the main source of interruption of jumping horses' careers. Lastly, we looked for correlations between jumping quality and the signals collected by the IMU. First results show that the signals collected by the saddle alone are not sufficient to finely differentiate jumping quality. Additional information will be needed, for example using complementary sensors or by expanding the database to have a more diverse panel of horses and jump profiles.
APA, Harvard, Vancouver, ISO, and other styles
18

Lans, Ivo A. van der. "Nonlinear multivariate analysis for multiattribute preference data." [Leiden] : DSWO Press, Leiden University, 1992. http://catalog.hathitrust.org/api/volumes/oclc/28733326.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Tardif, Geneviève. "Multivariate Analysis of Canadian Water Quality Data." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32245.

Full text
Abstract:
Physical-chemical water quality data from lotic water monitoring sites across Canada were integrated into one dataset. Two overlapping matrices of data were analyzed with principal component analysis (PCA) and cluster analysis to uncover structure and patterns in the data. The first matrix (Matrix A) had 107 sites located throughout Canada, and the following water quality parameters: pH, specific conductance (SC), and total phosphorus (TP). The second matrix (Matrix B) included more variables: calcium (Ca), chloride (Cl), total alkalinity (T_ALK), dissolved oxygen (DO), water temperature (WT), pH, SC and TP; for a subset of 42 sites. Landscape characteristics were calculated for each water quality monitoring site and their importance in explaining water quality data was examined through redundancy analysis. The first principal components in the analyses of Matrix A and B were most correlated with SC, suggesting this parameter is the most representative of water quality variance at the scale of Canada. Overlaying cluster analysis results on PCA information proved an excellent means to identify the major water characteristics defining each group; mapping cluster analysis group membership provided information on their spatial distribution and was found informative with regard to the probable environmental influences on each group. Redundancy analyses produced significant predictive models of water quality, demonstrating that landscape characteristics are determinant factors in water quality at the country scale. The proportion of cropland and the mean annual total precipitation in the drainage area were the landscape variables with the most variance explained. Assembling a consistent dataset of water quality data from monitoring locations throughout Canada proved difficult due to the unevenness of the monitoring programs in place.
It is therefore recommended that a standard for the monitoring of a minimum core set of water quality variables be implemented throughout the country to support future nation-wide analysis of water quality data.
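The workflow described above (standardize, PCA, then overlay cluster membership on the component scores) can be sketched as follows; the data below is a synthetic stand-in for a sites-by-parameters matrix (pH, SC, TP), not the thesis dataset:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical groups of sites: low vs. high specific conductance,
# with correlated shifts in pH and total phosphorus.
low_sc  = rng.normal([7.0, 100.0, 0.02], [0.3, 20.0, 0.01], size=(50, 3))
high_sc = rng.normal([8.0, 900.0, 0.10], [0.3, 80.0, 0.03], size=(50, 3))
X = np.vstack([low_sc, high_sc])

# Standardize, then PCA via SVD of the centered matrix.
Z = (X - X.mean(0)) / X.std(0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * s                        # site coordinates on the PCs
explained = s**2 / np.sum(s**2)       # variance share per component

# Two-group Lloyd's k-means on the first two PCs, mirroring the
# "cluster membership overlaid on PCA" idea.
pts = scores[:, :2]
centers = pts[[0, -1]]
for _ in range(20):
    labels = np.argmin(((pts[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([pts[labels == k].mean(0) for k in (0, 1)])
```

Because SC-driven differences dominate the standardized variance here, the first component separates the two groups, analogous to SC being the most representative parameter in the thesis.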
APA, Harvard, Vancouver, ISO, and other styles
20

Snavely, Anna Catherine. "Multivariate Data Analysis with Applications to Cancer." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10371.

Full text
Abstract:
Multivariate data is common in a wide range of settings. As data structures become increasingly complex, additional statistical tools are required to perform proper analyses. In this dissertation we develop and evaluate methods for the analysis of multivariate data generated from cancer trials. In the first chapter we consider the analysis of clustered survival data that can arise from multicenter clinical trials. In particular, we review and compare marginal and conditional models numerically through simulations and discuss model selection techniques. A multicenter clinical trial of children with acute lymphoblastic leukemia is used to illustrate the findings. The second and third chapters both address the setting where multiple outcomes are collected when the outcome of interest cannot be measured directly. A head and neck cancer trial in which multiple outcomes were collected to measure dysphagia was the particular motivation for this part of the dissertation. Specifically, in the second chapter we propose a semiparametric latent variable transformation model that incorporates measurable outcomes of mixed types, including censored outcomes. This method extends traditional approaches by allowing the relationship between the measurable outcomes and latent variable to be unspecified, rendering more robust inference. Using this approach we can directly estimate the treatment (or other covariate) effect on the unobserved latent variable, enhancing interpretation. In the third chapter, the basic model from the second chapter is maintained, but additional parametric assumptions are made. This model still has the advantages of allowing for censored measurable outcomes and being able to estimate a treatment effect on the latent variable, but has the added advantage of good performance in a small data set. Together the methods proposed in the second and third chapters provide a comprehensive approach for the analysis of complex multiple outcomes data.
APA, Harvard, Vancouver, ISO, and other styles
21

Bolton, Richard John. "Multivariate analysis of multiproduct market research data." Thesis, University of Exeter, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.302542.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Durif, Ghislain. "Multivariate analysis of high-throughput sequencing data." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1334/document.

Full text
Abstract:
The statistical analysis of Next-Generation Sequencing (NGS) data raises many computational challenges regarding modeling and inference, especially because of the high dimensionality of genomic data. The research work in this manuscript concerns hybrid dimension-reduction methods that rely on both compression (representation of the data in a lower-dimensional space) and variable selection. Developments are made concerning the sparse Partial Least Squares (PLS) regression framework for supervised classification, and the sparse matrix factorization framework for unsupervised exploration. In both situations, our main purpose is the reconstruction and visualization of the data. First, we present a new sparse PLS approach, based on an adaptive sparsity-inducing penalty, that is suitable for logistic regression to predict the label of a discrete outcome. For instance, such a method is used for prediction (fate of patients or specific type of unidentified single cells) based on gene expression profiles. The main issue in such a framework is to account for the response in order to discard irrelevant variables. We highlight the direct link between the derivation of the algorithms and the reliability of the results. Then, motivated by questions regarding single-cell data analysis, we propose a flexible model-based approach for the factorization of count matrices that accounts for over-dispersion as well as zero-inflation (both characteristic of single-cell data), for which we derive an estimation procedure based on variational inference. In this scheme, we consider probabilistic variable selection based on a spike-and-slab model suitable for count data. The interest of our procedure for data reconstruction, visualization and clustering is illustrated by simulation experiments and by preliminary results on single-cell data analysis. All proposed methods were implemented in two R packages, "plsgenomics" and "CMF", based on high-performance computing.
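Sparse PLS combines PLS's latent-component regression with a sparsity-inducing penalty on the weight vectors so that irrelevant variables get exactly zero weight. A minimal generic sketch (NIPALS deflation with simple soft-thresholding, not the thesis's adaptive-penalty method; all data and parameters are synthetic):

```python
import numpy as np

def soft_threshold(w, lam):
    return np.sign(w) * np.clip(np.abs(w) - lam, 0, None)

def sparse_pls1(X, y, n_comp=2, lam=0.2):
    """Sparse PLS1 sketch: NIPALS-style deflation with a soft-thresholded
    weight vector (a generic sparse-PLS variant for illustration)."""
    X = X - X.mean(0)
    y = y - y.mean()
    W, T = [], []
    for _ in range(n_comp):
        w = X.T @ y                                  # response-guided weights
        w = soft_threshold(w, lam * np.abs(w).max()) # zero out weak variables
        w /= np.linalg.norm(w)
        t = X @ w                                    # latent component
        X = X - np.outer(t, X.T @ t) / (t @ t)       # deflate X
        y = y - t * (t @ y) / (t @ t)                # deflate y
        W.append(w)
        T.append(t)
    return np.array(W), np.array(T)

rng = np.random.default_rng(2)
n, p = 60, 200                        # many "genes", few samples
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                        # only 5 informative variables
y = X @ beta + 0.1 * rng.standard_normal(n)

W, T = sparse_pls1(X, y, n_comp=2, lam=0.2)
selected = np.where(np.abs(W).sum(0) > 0)[0]
```

The thresholding step is what "accounts for the response to discard irrelevant variables": weights driven only by noise fall below the penalty and are set exactly to zero.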
APA, Harvard, Vancouver, ISO, and other styles
23

Lee, Yau-wing. "Modelling multivariate survival data using semiparametric models." Click to view the E-thesis via HKUTO, 2000. http://sunzi.lib.hku.hk/hkuto/record/B4257528X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

李友榮 and Yau-wing Lee. "Modelling multivariate survival data using semiparametric models." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B4257528X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Billah, Baki. "The analysis of multivariate incomplete failure time data." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1995. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp04/mq25823.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Rawizza, Mark Alan. "Time-series analysis of multivariate manufacturing data sets." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/10895.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Ritchie, Elspeth Kathryn. "Application of multivariate data analysis in biopharmaceutical production." Thesis, University of Newcastle upon Tyne, 2016. http://hdl.handle.net/10443/3356.

Full text
Abstract:
In 2004, the FDA launched the Process Analytical Technology (PAT) initiative to support product and process development. Even before this, the biologics manufacturing industry was working to implement PAT. While a strong focus of PAT is the implementation of new monitoring technologies, there is also a strong emphasis on the use of multivariate data analysis (MVDA). Effective implementation and integration of MVDA is of particular interest as it can be applied retroactively to historical datasets in addition to current datasets. However, translation of academic research into industrial ways of working can be slowed or prevented by many obstacles, from proposed solutions being workable only by the original academic to a need to prove that time invested in developing MVDA models and methodologies will result in positive business impacts (e.g. reduction of costs or man hours). The presented research applied MVDA techniques to datasets from three scales typically encountered during investigations of biologics manufacturing processes: a single-product dataset; a single-product, multi-scale dataset; and a multi-product, multi-scale, single-platform dataset. These datasets were interrogated using multiple approaches and with multiple objectives (e.g. indicators/causes of productivity variation, comparison of pH measurement technologies). Individual project outcomes culminated in the creation of a robust statistical toolbox. The toolbox captures an array of MVDA techniques, from PCA and PLS to decision trees employing k-NN. These are supported by frameworks and guidance for implementation based on interrogation aims encountered in a contract manufacturing environment. The presented frameworks ranged from extraction of indirectly captured information (Chapter 4) to meta-analytical strategies (Chapter 6).
Software-based tools generated during the research ranged from translation of high-frequency online monitoring data into robust summary statistics with intuitive meaning (Appendix A) to tools enabling potential reduction of confounding from underlying variation in dataset structures through the use of alternative progression variables (Chapter 5). Each tool was designed to fit into current and future planned ways of working at the sponsor company. The presented research demonstrates a range of investigation aims and challenges encountered in a contract manufacturing organisation, with demonstrated benefits from ease of integration into normal work process flows and savings in time and human resources.
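The idea of collapsing a high-frequency online monitoring trace into robust summary statistics can be sketched as follows; the statistics chosen here (median, MAD, a Theil-Sen-style slope) are a generic illustration of the concept, not the thesis's actual tool, and the simulated trace is hypothetical:

```python
import numpy as np

def robust_summary(trace):
    """Collapse a high-frequency monitoring trace into robust summary
    statistics: median level, median absolute deviation, and a
    Theil-Sen-style robust trend estimate over a pair subsample."""
    med = float(np.median(trace))
    mad = float(np.median(np.abs(trace - med)))      # robust spread
    idx = np.linspace(0, len(trace) - 1, 50, dtype=int)
    slopes = [(trace[j] - trace[i]) / (j - i)
              for k, i in enumerate(idx) for j in idx[k + 1:]]
    slope = float(np.median(slopes))                  # robust trend
    return med, mad, slope

rng = np.random.default_rng(3)
# Simulated pH-like trace: slow drift, sensor noise, one gross spike.
trace = 7.0 + 0.001 * np.arange(5000) + 0.05 * rng.standard_normal(5000)
trace[1000] = 12.0                                    # single sensor spike

med, mad, slope = robust_summary(trace)
```

Because medians are used throughout, the single spiked reading barely moves any of the three summaries, which is the point of choosing robust statistics for noisy online data.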
APA, Harvard, Vancouver, ISO, and other styles
28

Lawal, Najib. "Modelling and multivariate data analysis of agricultural systems." Thesis, University of Manchester, 2015. https://www.research.manchester.ac.uk/portal/en/theses/modelling-and-multivariate-data-analysis-of-agricultural-systems(f6b86e69-5cff-4ffb-a696-418662ecd694).html.

Full text
Abstract:
The broader research area investigated during this programme was conceived from a goal to contribute towards solving the challenge of food security in the 21st century through the reduction of crop loss and minimisation of fungicide use. This is aimed to be achieved through the introduction of an empirical approach to agricultural disease monitoring. In line with this, the SYIELD project, initiated by a consortium involving University of Manchester and Syngenta, among others, proposed a novel biosensor design that can electrochemically detect viable airborne pathogens by exploiting the biology of plant-pathogen interaction. This approach offers improvement on the inefficient and largely experimental methods currently used. Within this context, this PhD focused on the adoption of multidisciplinary methods to address three key objectives that are central to the success of the SYIELD project: local spore ingress near canopies, the evaluation of a suitable model that can describe spore transport, and multivariate analysis of the potential monitoring network built from these biosensors. The local transport of spores was first investigated by carrying out a field trial experiment at Rothamsted Research UK in order to investigate spore ingress in OSR canopies, generate reliable data for testing the prototype biosensor, and evaluate a trajectory model. During the experiment, spores were air-sampled and quantified using established manual detection methods. Results showed that the manual methods, such as colourimetric detection, are more sensitive than the proposed biosensor, suggesting the proxy measurement mechanism used by the biosensor may not be reliable in live deployments where spores are likely to be contaminated by impurities and other inhibitors of oxalic acid production. Spores quantified using the more reliable quantitative Polymerase Chain Reaction proved informative and provided novel data of high experimental value.
The dispersal of this data was found to fit a power decay law, a finding that is consistent with experiments in other crops. In the second area investigated, a 3D backward Lagrangian Stochastic model was parameterised and evaluated with the field trial data. The bLS model, parameterised with Monin-Obukhov Similarity Theory (MOST) variables showed good agreement with experimental data and compared favourably in terms of performance statistics with a recent application of an LS model in a maize canopy. Results obtained from the model were found to be more accurate above the canopy than below it. This was attributed to a higher error during initialisation of release velocities below the canopy. Overall, the bLS model performed well and demonstrated suitability for adoption in estimating above-canopy spore concentration profiles which can further be used for designing efficient deployment strategies. The final area of focus was the monitoring of a potential biosensor network. A novel framework based on Multivariate Statistical Process Control concepts was proposed and applied to data from a pollution-monitoring network. The main limitation of traditional MSPC in spatial data applications was identified as a lack of spatial awareness by the PCA model when considering correlation breakdowns caused by an incoming erroneous observation. This resulted in misclassification of healthy measurements as erroneous. The proposed Kriging-augmented MSPC approach was able to incorporate this capability and significantly reduce the number of false alarms.
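The MSPC monitoring idea described above can be sketched with a Hotelling T² chart: fit the in-control mean and covariance from healthy data, then flag observations whose Mahalanobis distance is too large. This generic sketch omits the thesis's Kriging augmentation, and the network data is simulated (all dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Phase I: in-control training data from a simulated monitoring network
# of six correlated sensors.
n_train, n_vars = 500, 6
A = rng.standard_normal((n_vars, n_vars))     # induces cross-correlation
train = rng.standard_normal((n_train, n_vars)) @ A.T

mu = train.mean(0)
cov = np.cov(train, rowvar=False)
cov_inv = np.linalg.inv(cov)

def hotelling_t2(x):
    """Hotelling's T^2 statistic for one new observation."""
    d = x - mu
    return float(d @ cov_inv @ d)

# Phase II: a healthy observation vs. one with a gross single-sensor error
# that breaks the learned correlation structure.
healthy = rng.standard_normal(n_vars) @ A.T
faulty = healthy.copy()
faulty[0] += 10 * np.sqrt(cov[0, 0])

t2_healthy = hotelling_t2(healthy)
t2_faulty = hotelling_t2(faulty)
```

The limitation the thesis identifies is visible here: the statistic knows the correlation structure but nothing about sensor locations, so a spatially-aware augmentation (e.g. Kriging residuals) is needed to tell a faulty sensor from a genuine local event.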
APA, Harvard, Vancouver, ISO, and other styles
29

Hopkins, Julie Anne. "Sampling designs for exploratory multivariate analysis." Thesis, University of Sheffield, 2000. http://etheses.whiterose.ac.uk/14798/.

Full text
Abstract:
This thesis is concerned with problems of variable selection, influence of sample size and related issues in the applications of various techniques of exploratory multivariate analysis (in particular, correspondence analysis, biplots and canonical correspondence analysis) to archaeology and ecology. Data sets (both published and new) are used to illustrate these methods and to highlight the problems that arise - these practical examples are returned to throughout as the various issues are discussed. Much of the motivation for the development of the methodology has been driven by the needs of the archaeologists providing the data, who were consulted extensively during the study. The first (introductory) chapter includes a detailed description of the data sets examined and the archaeological background to their collection. Chapters Two, Three and Four explain in detail the mathematical theory behind the three techniques. Their uses are illustrated on the various examples of interest, raising data-driven questions which become the focus of the later chapters. The main objectives are to investigate the influence of various design quantities on the inferences made from such multivariate techniques. Quantities such as the sample size (e.g. number of artefacts collected), the number of categories of classification (e.g. of sites, wares, contexts) and the number of variables measured compete for fixed resources in archaeological and ecological applications. Methods of variable selection and the assessment of the stability of the results are further issues of interest and are investigated using bootstrapping and procrustes analysis. Jack-knife methods are used to detect influential sites, wares, contexts, species and artefacts. Some existing methods of investigating issues such as those raised above are applied and extended to correspondence analysis in Chapters Five and Six. 
Adaptations of them are proposed for biplots in Chapters Seven and Eight and for canonical correspondence analysis in Chapter Nine. Chapter Ten concludes the thesis.
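Correspondence analysis, the first of the three techniques studied, reduces to an SVD of the standardized residuals of a contingency table; the total inertia then equals the table's chi-squared statistic divided by the grand total. A compact sketch of the textbook formulation (the sites-by-wares counts below are hypothetical, not the thesis data):

```python
import numpy as np

def correspondence_analysis(N):
    """Correspondence analysis of a contingency table N via SVD of the
    standardized residual matrix (textbook formulation)."""
    P = N / N.sum()                     # correspondence matrix
    r = P.sum(1)                        # row masses
    c = P.sum(0)                        # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * s) / np.sqrt(r)[:, None]      # principal coordinates
    col_coords = (Vt.T * s) / np.sqrt(c)[:, None]
    inertia = s ** 2                    # inertia per axis
    return row_coords, col_coords, inertia

# Toy sites x wares table of artefact counts (hypothetical).
N = np.array([[30.0,  5.0,  2.0],
              [25.0, 10.0,  4.0],
              [ 3.0, 28.0, 12.0],
              [ 2.0, 20.0, 25.0]])

rows, cols, inertia = correspondence_analysis(N)
total_inertia = float(inertia.sum())    # equals chi-squared / grand total
```

Resampling rows of N (bootstrap) or deleting one site at a time (jack-knife) and re-running this decomposition is the basic mechanism behind the stability and influence diagnostics the thesis develops.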
APA, Harvard, Vancouver, ISO, and other styles
30

Zhou, Feifei, and 周飞飞. "Cure models for univariate and multivariate survival data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011. http://hub.hku.hk/bib/B45700977.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Nicolini, Olivier. "LIBS Multivariate Analysis with Machine Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286595.

Full text
Abstract:
Laser-Induced Breakdown Spectroscopy (LIBS) is a spectroscopic technique used for chemical analysis of materials. By analyzing the spectrum obtained with this technique it is possible to determine the chemical composition of a sample. The possibility of analyzing materials in a contactless and online fashion, without sample preparation, makes LIBS one of the most interesting techniques for chemical composition analysis. However, despite its intrinsic advantages, LIBS analysis suffers from poor accuracy and limited reproducibility of results due to interference effects caused by the chemical composition of the sample or other experimental factors. How to improve the accuracy of the analysis by extracting useful information from high-dimensional LIBS data remains the main challenge of this technique. In the present work, with the purpose of proposing a robust analysis method, I present a pipeline for multivariate regression on LIBS data composed of preprocessing, feature selection, and regression. First, raw data is preprocessed by application of intensity filtering, normalization and baseline correction to mitigate the effect of interference factors such as laser energy fluctuations or the presence of a baseline in the spectrum. Feature selection finds the most informative lines for an element, which are then used as input in the subsequent regression phase to predict the element concentration. Partial Least Squares (PLS) and Elastic Net showed the best predictive ability among the regression methods investigated, while Interval PLS (iPLS) and Iterative Predictor Weighting PLS (IPW-PLS) proved to be the best feature selection algorithms for this type of data. By applying these feature selection algorithms to the full LIBS spectrum before regression with PLS or Elastic Net it is possible to obtain accurate predictions in a robust fashion.
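The preprocessing stages of such a pipeline (baseline correction, then normalization to damp shot-to-shot gain fluctuations) can be sketched on synthetic spectra; everything below, including the line positions, baseline regions and the simple line-area calibration standing in for the PLS/Elastic Net regression step, is illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic LIBS-like spectra: two Gaussian emission lines whose relative
# heights encode a concentration, on a linear baseline, with shot-to-shot
# gain fluctuations and detector noise.
wav = np.linspace(0.0, 1.0, 300)

def line(center, width):
    return np.exp(-0.5 * ((wav - center) / width) ** 2)

n_shots = 40
conc = rng.random(n_shots)
gain = 1.0 + 0.2 * rng.standard_normal(n_shots)
X = np.array([
    g * (c * line(0.3, 0.01) + (1 - c) * line(0.7, 0.01) + 0.5 + 0.4 * wav)
    + 0.01 * rng.standard_normal(wav.size)
    for c, g in zip(conc, gain)
])

# Baseline correction: fit a straight line to line-free regions, subtract.
mask = (wav < 0.2) | ((wav > 0.4) & (wav < 0.6)) | (wav > 0.8)
Xb = np.array([s - np.polyval(np.polyfit(wav[mask], s[mask], 1), wav)
               for s in X])

# Normalization by total intensity to damp laser energy fluctuations.
Xn = Xb / Xb.sum(axis=1, keepdims=True)

# Calibration from the two line areas (a univariate stand-in for the
# PLS / Elastic Net regression stage of the pipeline).
a = Xn[:, (wav > 0.25) & (wav < 0.35)].sum(axis=1)
b = Xn[:, (wav > 0.65) & (wav < 0.75)].sum(axis=1)
pred = a / (a + b)
```

After baseline removal and normalization, the gain factor cancels out of the line-area ratio, so the prediction tracks the concentration despite the simulated shot-to-shot fluctuations.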
APA, Harvard, Vancouver, ISO, and other styles
32

Ehlers, Rene. "Maximum likelihood estimation procedures for categorical data." Pretoria : [s.n.], 2002. http://upetd.up.ac.za/thesis/available/etd-07222005-124541.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Cai, Jianwen. "Generalized estimating equations for censored multivariate failure time data /." Thesis, Connect to this title online; UW restricted, 1992. http://hdl.handle.net/1773/9581.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Nothnagel, Carien. "Multivariate data analysis using spectroscopic data of fluorocarbon alcohol mixtures / Nothnagel, C." Thesis, North-West University, 2012. http://hdl.handle.net/10394/7064.

Full text
Abstract:
Pelchem, a commercial subsidiary of Necsa (South African Nuclear Energy Corporation), produces a range of commercial fluorocarbon products while driving research and development initiatives to support the fluorine product portfolio. One such initiative is to develop improved analytical techniques to analyse product composition during development and to quality-assure products. Generally, the C–F type products produced by Necsa are in a solution of anhydrous HF and cannot be directly analyzed with traditional techniques without derivatisation. A technique such as vibrational spectroscopy, which can analyze these products directly without further preparation, would have a distinct advantage. However, spectra of mixtures of similar compounds are complex and not suitable for traditional quantitative regression analysis. Multivariate data analysis (MVA) can be used in such instances to exploit the complex nature of the spectra to extract quantitative information on the composition of mixtures. A selection of fluorocarbon alcohols was made to act as representatives of fluorocarbon compounds. Experimental design theory was used to create a calibration range of mixtures of these compounds. Raman and infrared (NIR and ATR–IR) spectroscopy were used to generate spectral data of the mixtures, and these data were analyzed with MVA techniques through the construction of regression and prediction models. Selected samples from the mixture range were chosen to test the predictive ability of the models. Analysis and regression models (PCR, PLS2 and PLS1) gave good model fits (R² values larger than 0.9). Raman spectroscopy was the most efficient technique and gave high prediction accuracy (at 10% accepted standard deviation), provided the minimum mass of a component exceeded 16% of the total sample. The infrared techniques also performed well in terms of fit and prediction. The NIR spectra were subject to signal saturation as a result of using long-path-length sample cells. 
This was shown to be the main reason for the loss in efficiency of this technique compared to Raman and ATR–IR spectroscopy. It was shown that multivariate data analysis of spectroscopic data of the selected fluorocarbon compounds could be used to quantitatively analyse mixtures with the possibility of further optimization of the method. The study was a representative study indicating that the combination of MVA and spectroscopy can be used successfully in the quantitative analysis of other fluorocarbon compound mixtures.
Thesis (M.Sc. (Chemistry))--North-West University, Potchefstroom Campus, 2012.
APA, Harvard, Vancouver, ISO, and other styles
35

陳志昌 and Chee-cheong Chan. "Compositional data analysis of voting patterns." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1993. http://hub.hku.hk/bib/B31977236.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Chan, Chee-cheong. "Compositional data analysis of voting patterns." [Hong Kong : University of Hong Kong], 1993. http://sunzi.lib.hku.hk/hkuto/record.jsp?B13787160.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Wang, Lianming. "Statistical analysis of multivariate interval-censored failure time data." Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/4375.

Full text
Abstract:
Thesis (Ph.D.)--University of Missouri-Columbia, 2006.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on May 2, 2007). Vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
38

Ahmadi-Nedushan, Behrooz 1966. "Multivariate statistical analysis of monitoring data for concrete dams." Thesis, McGill University, 2002. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=82815.

Full text
Abstract:
Major dams in the world are often instrumented in order to validate numerical models, to gain insight into the behavior of the dam, to detect anomalies, and to enable a timely response in the form of repairs, reservoir management, or evacuation. Advances in automated data monitoring systems make it possible to regularly collect data on a large number of instruments for a dam. Managing these data is a major concern, since traditional means of monitoring each instrument are time-consuming and personnel-intensive. Among the tasks that need to be performed are: identification of faulty instruments, removal of outliers, data interpretation, model fitting, and management of alarms for detecting statistically significant changes in the response of a dam.
Statistical models such as multiple linear regression and back-propagation neural networks have been used to estimate the response of individual instruments. Multiple linear regression models are of two kinds: (1) Hydro-Seasonal-Time (HST) models, and (2) models that consider concrete temperatures as predictors.
Univariate, bivariate, and multivariate methods are proposed for the identification of anomalies in the instrumentation data. These anomalies can arise from bad readings, faulty instruments, or changes in dam behavior.
The proposed methodologies are applied to three different dams, Idukki, Daniel Johnson and Chute-à-Caron, which are, respectively, an arch, a multiple-arch, and a gravity dam. Displacements, strains, flow rates, and crack openings of these three dams are analyzed.
This research also proposes various multivariate statistical analyses and artificial neural networks techniques to analyze dam monitoring data. One of these methods, Principal Component Analysis (PCA) is concerned with explaining the variance-covariance structure of a data set through a few linear combinations of the original variables. The general objectives are (1) data reduction and (2) data interpretation. Other multivariate analysis methods such as canonical correlation analysis, partial least squares and nonlinear principal component analysis are discussed. The advantages of methodologies for noise reduction, the reduction of number of variables that have to be monitored, the prediction of response parameters, and the identification of faulty readings are discussed. Results indicated that dam responses are generally correlated and that only a few principal components can summarize the behavior of a dam.
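The data reduction PCA provides here can be illustrated with a small NumPy sketch; the simulated "sensor" matrix driven by one common factor is a stand-in for real dam monitoring data, not values from the thesis.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the mean-centered data matrix.

    Returns component scores, loadings, and the fraction of total
    variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # projections of samples
    loadings = Vt[:n_components].T                    # variable contributions
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]
```

For strongly correlated responses, such as displacements driven by the same reservoir level and seasonal temperature cycle, the first one or two components typically capture most of the variance, which is exactly the summarizing behavior the abstract describes.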
APA, Harvard, Vancouver, ISO, and other styles
39

Das, Mitali. "Motion within music : the analysis of multivariate MIDI data." Thesis, University of York, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.367466.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Sheppard, Therese. "Extending covariance structure analysis for multivariate and functional data." Thesis, University of Manchester, 2010. https://www.research.manchester.ac.uk/portal/en/theses/extending-covariance-structure-analysis-for-multivariate-and-functional-data(e2ad7f12-3783-48cf-b83c-0ca26ef77633).html.

Full text
Abstract:
For multivariate data, when testing homogeneity of covariance matrices arising from two or more groups, Bartlett's (1937) modified likelihood ratio test statistic is appropriate to use under the null hypothesis of equal covariance matrices where the null distribution of the test statistic is based on the restrictive assumption of normality. Zhang and Boos (1992) provide a pooled bootstrap approach when the data cannot be assumed to be normally distributed. We give three alternative bootstrap techniques to testing homogeneity of covariance matrices when it is both inappropriate to pool the data into one single population as in the pooled bootstrap procedure and when the data are not normally distributed. We further show that our alternative bootstrap methodology can be extended to testing Flury's (1988) hierarchy of covariance structure models. Where deviations from normality exist, we show, by simulation, that the normal theory log-likelihood ratio test statistic is less viable compared with our bootstrap methodology. For functional data, Ramsay and Silverman (2005) and Lee et al (2002) together provide four computational techniques for functional principal component analysis (PCA) followed by covariance structure estimation. When the smoothing method for smoothing individual profiles is based on using least squares cubic B-splines or regression splines, we find that the ensuing covariance matrix estimate suffers from loss of dimensionality. We show that ridge regression can be used to resolve this problem, but only for the discretisation and numerical quadrature approaches to estimation, and that choice of a suitable ridge parameter is not arbitrary. We further show the unsuitability of regression splines when deciding on the optimal degree of smoothing to apply to individual profiles. To gain insight into smoothing parameter choice for functional data, we compare kernel and spline approaches to smoothing individual profiles in a nonparametric regression context. 
Our simulation results justify a kernel approach using a new criterion based on predicted squared error. We also show by simulation that, when taking account of correlation, a kernel approach using a generalized cross validatory type criterion performs well. These data-based methods for selecting the smoothing parameter are illustrated prior to a functional PCA on a real data set.
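A pooled-bootstrap test of covariance homogeneity in the spirit of Zhang and Boos can be sketched as follows; the choice of Box's M as the test statistic and all numerical settings are illustrative assumptions, not the statistics or procedures developed in the thesis.

```python
import numpy as np

def box_m(groups):
    """Box's M statistic for homogeneity of covariance matrices."""
    g = len(groups)
    ns = np.array([len(x) for x in groups])
    covs = [np.cov(x, rowvar=False) for x in groups]
    pooled = sum((n - 1) * S for n, S in zip(ns, covs)) / (ns.sum() - g)
    m = (ns.sum() - g) * np.linalg.slogdet(pooled)[1]
    for n, S in zip(ns, covs):
        m -= (n - 1) * np.linalg.slogdet(S)[1]
    return m

def pooled_bootstrap_pvalue(groups, n_boot=200, seed=0):
    """Approximate the null distribution of M by resampling every
    group from the centered, pooled sample."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x - x.mean(axis=0) for x in groups])
    m_obs = box_m(groups)
    exceed = sum(
        box_m([pooled[rng.integers(0, len(pooled), size=len(x))]
               for x in groups]) >= m_obs
        for _ in range(n_boot)
    )
    return (exceed + 1) / (n_boot + 1)
```

The thesis's point is precisely that this *pooled* resampling is inappropriate when the groups cannot be merged into one population; its alternative bootstrap schemes resample the groups separately.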
APA, Harvard, Vancouver, ISO, and other styles
41

Chen, Man-Hua. "Statistical analysis of multivariate interval-censored failure time data." Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/4776.

Full text
Abstract:
Thesis (Ph.D.)--University of Missouri-Columbia, 2007.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on March 6, 2009). Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
42

Edberg, Alexandra. "Monitoring Kraft Recovery Boiler Fouling by Multivariate Data Analysis." Thesis, KTH, Skolan för kemi, bioteknologi och hälsa (CBH), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230906.

Full text
Abstract:
This work deals with fouling in the recovery boiler at Montes del Plata, Uruguay. Multivariate data analysis has been used to analyze the large amount of available data in order to investigate how different parameters affect the fouling problems. Principal Component Analysis (PCA) and Partial Least Squares Projection (PLS) have been used in this work. PCA was used to compare average values between time periods with high and low fouling problems, while PLS was used to study the correlation structure between the variables and consequently give an indication of which parameters might be changed to improve the availability of the boiler. The results show that this recovery boiler tends to have problems with fouling that may depend on the distribution of air, the black liquor pressure or the dry solids content of the black liquor. The results also show that multivariate data analysis is a powerful tool for analyzing these types of fouling problems.
APA, Harvard, Vancouver, ISO, and other styles
43

Chang, Janis. "Analysis of ordered categorical data." Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/27857.

Full text
Abstract:
Methods of testing for a location shift between two populations in a longitudinal study are investigated when the data of interest are ordered, categorical and non-linear. A non-standard analysis involving modelling of the data over time with transition probability matrices is discussed. Next, the relative efficiencies of statistics more frequently used for the analysis of such categorical data at a single time point are examined. The Wilcoxon rank-sum, McCullagh, and two-sample t statistics are compared for the analysis of such cross-sectional data using simulation and efficacy calculations. Simulation techniques are then utilized to compare the stratified Wilcoxon, McCullagh and chi-squared-type statistics in their efficiency at detecting a location shift when the data are examined over two time points. The distribution of a chi-squared-type statistic based on the simple contingency table constructed by merely noting whether a subject improved, stayed the same or deteriorated is derived. Applications of these methods and results to a data set of multiple sclerosis patients, some of whom were treated with interferon and some of whom received a placebo, are provided throughout the thesis, and our findings are summarized in the last chapter.
Science, Faculty of
Statistics, Department of
Graduate
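The rank-based comparisons discussed above hinge on handling the heavy ties that ordered categorical data produce; a minimal midrank version of the Wilcoxon rank-sum statistic might look as follows (the category codes are illustrative, not data from the thesis).

```python
import numpy as np

def rank_sum(x, y):
    """Wilcoxon rank-sum statistic for group x, using midranks
    so that tied ordered categories share an average rank."""
    combined = np.concatenate([x, y])
    order = np.argsort(combined, kind="stable")
    sorted_vals = combined[order]
    ranks = np.empty(len(combined))
    i = 0
    while i < len(sorted_vals):
        j = i
        # advance j past the run of values tied with position i
        while j < len(sorted_vals) and sorted_vals[j] == sorted_vals[i]:
            j += 1
        ranks[order[i:j]] = (i + j + 1) / 2.0   # midrank (1-based)
        i = j
    return ranks[: len(x)].sum()
```

In a real analysis the statistic would then be standardized with a ties-corrected variance before comparison to a normal reference; `scipy.stats.mannwhitneyu` implements the tie-corrected version of this test.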
APA, Harvard, Vancouver, ISO, and other styles
44

Wan, Chung-him, and 溫仲謙. "Analysis of zero-inflated count data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B43703719.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Wan, Chung-him. "Analysis of zero-inflated count data." Click to view the E-thesis via HKUTO, 2009. http://sunzi.lib.hku.hk/hkuto/record/B43703719.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Fitzgerald-DeHoog, Lindsay M. "Multivariate analysis of proteomic data| Functional group analysis using a global test." Thesis, California State University, Long Beach, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1602759.

Full text
Abstract:

Proteomics is a relatively new discipline being implemented in the life sciences. Proteomics allows a whole-systems approach to discerning changes in organismal physiology due to physical perturbations. The advantages of a proteomic approach may be counteracted by the difficulty of analyzing the data in a meaningful way, due to inherent problems with statistical assumptions. Furthermore, analyzing significant protein volume differences among treatment groups often requires analysis of numerous proteins, even when limiting analyses to a particular protein type or physiological pathway. Improper use of traditional techniques leads to problems with multiple hypothesis testing.

This research will examine two common techniques used to analyze proteomic data and will apply them to a novel proteomic data set. In addition, a Global Test originally developed for gene array data will be employed to discover its utility for proteomic data and its ability to counteract the multiple hypothesis testing problems encountered with traditional analyses.

APA, Harvard, Vancouver, ISO, and other styles
47

Kurtovic, Sanela. "Directed Evolution of Glutathione Transferases Guided by Multivariate Data Analysis." Doctoral thesis, Uppsala University, Department of Biochemistry and Organic Chemistry, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8718.

Full text
Abstract:

Evolution of enzymes with novel functional properties has gained much attention in recent years. Naturally evolved enzymes are adapted to work in living cells under physiological conditions, circumstances that are not always available for industrial processes calling for novel and better catalysts. Furthermore, altering enzyme function also affords insight into how enzymes work and how natural evolution operates.

Previous investigations have explored catalytic properties in the directed evolution of mutant libraries with high sequence variation. Before this study was initiated, functional analysis of mutant libraries was, to a large extent, restricted to uni- or bivariate methods. Consequently, there was a need to apply multivariate data analysis (MVA) techniques in this context. Directed evolution was approached by DNA shuffling of glutathione transferases (GSTs) in this thesis. GSTs are multifarious enzymes that have detoxication of both exo- and endogenous compounds as their primary function. They catalyze the nucleophilic attack by the tripeptide glutathione on many different electrophilic substrates.

Several multivariate analysis tools, e.g. principal component (PC), hierarchical cluster, and K-means cluster analyses, were applied to large mutant libraries assayed with a battery of GST substrates. By this approach, evolvable units (quasi-species) fit for further evolution were identified. It was clear that different substrates undergoing different kinds of chemical transformation can group together in a multi-dimensional substrate-activity space, thus being responsible for a certain quasi-species cluster. Furthermore, the importance of the chemical environment, or substrate matrix, in enzyme evolution was recognized. Diverging substrate selectivity profiles among homologous enzymes acting on substrates performing the same kind of chemistry were identified by MVA. Important structure-function activity relationships with the prodrug azathioprine were elucidated by segment analysis of a shuffled GST mutant library. Together, these results illustrate important methods applied to molecular enzyme evolution.
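As an illustration of the clustering side of such an analysis, a plain K-means pass over substrate-activity profiles can be sketched as below; the two-cluster toy data stand in for real activity fingerprints and are not from the thesis.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd-style K-means on the rows of X (one profile per row)."""
    # deterministic farthest-point initialization: start from the first
    # profile, then repeatedly add the profile farthest from the chosen set
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign each profile to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centers, keeping the old center for an empty cluster
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Clusters found this way would correspond to the quasi-species groupings the abstract mentions: mutants with similar substrate-activity profiles in the multi-dimensional activity space.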

APA, Harvard, Vancouver, ISO, and other styles
48

Stenlund, Hans. "Improving interpretation by orthogonal variation : Multivariate analysis of spectroscopic data." Doctoral thesis, Umeå universitet, Kemiska institutionen, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-43476.

Full text
Abstract:
The desire to use the tools and concepts of chemometrics when studying problems in the life sciences, especially biology and medicine, has prompted chemometricians to shift their focus away from their field's traditional emphasis on model predictivity and towards the more contemporary objective of optimizing information exchange via model interpretation. The complex data structures that are captured by modern advanced analytical instruments open up new possibilities for extracting information from complex data sets. This in turn imposes higher demands on the quality of data and the modeling techniques used. The introduction of the concept of orthogonal variation in the late 1990s led to a shift of focus within chemometrics; the information gained from analysis of orthogonal structures complements that obtained from the predictive structures that were the discipline's previous focus. OPLS, introduced in the early 2000s, refined this view by formalizing the model structure and the separation of orthogonal variations. Orthogonal variation stems from experimental/analytical issues such as time trends, process drift, storage, sample handling, and instrumental differences, or from inherent properties of the sample such as age, gender, genetics, and environmental influence. The usefulness and versatility of OPLS has been demonstrated in over 500 citations, mainly in the fields of metabolomics and transcriptomics but also in NIR, UV and FTIR spectroscopy. In all cases, the predictive precision of OPLS is identical to that of PLS, but OPLS is superior when it comes to the interpretation of both predictive and orthogonal variation. Thus, OPLS models the same data structures but provides increased scope for interpretation, making it more suitable for contemporary applications in the life sciences. This thesis discusses four different research projects, including analyses of NIR, FTIR and NMR spectroscopic data. 
The discussion includes comparisons of OPLS and PLS models of complex datasets in which experimental variation conceals and confounds relevant information. The PLS and OPLS methods are discussed in detail. In addition, the thesis describes new OPLS-based methods developed to accommodate hyperspectral images for supervised modeling. Proper handling of orthogonal structures revealed the weaknesses in the analytical chains examined. In all of the studies described, the orthogonal structures were used to validate the quality of the generated models as well as gaining new knowledge. These aspects are crucial in order to enhance the information exchange from both past and future studies.
APA, Harvard, Vancouver, ISO, and other styles
49

Combrexelle, Sébastien. "Multifractal analysis for multivariate data with application to remote sensing." Phd thesis, Toulouse, INPT, 2016. http://oatao.univ-toulouse.fr/16477/1/Combrexelle.pdf.

Full text
Abstract:
Texture characterization is a central element in many image processing applications. Texture analysis can be embedded in the mathematical framework of multifractal analysis, enabling the study of the fluctuations in regularity of image intensity and providing practical tools for their assessment, the wavelet coefficients or wavelet leaders. Although successfully applied in various contexts, multifractal analysis suffers at present from two major limitations. First, the accurate estimation of multifractal parameters for image texture remains a challenge, notably for small sample sizes. Second, multifractal analysis has so far been limited to the analysis of a single image, while the data available in applications are increasingly multivariate. The main goal of this thesis is to develop practical contributions to overcome these limitations. The first limitation is tackled by introducing a generic statistical model for the logarithm of wavelet leaders, parametrized by the multifractal parameters of interest. This statistical model enables us to counterbalance the variability induced by small sample sizes and to embed the estimation in a Bayesian framework. This yields robust and accurate estimation procedures, effective for both small and large images. The multifractal analysis of multivariate images is then addressed by generalizing this Bayesian framework to hierarchical models able to account for the assumption that multifractal properties evolve smoothly in the dataset. This is achieved via the design of suitable priors relating the dynamical properties of the multifractal parameters of the different components composing the dataset. Different priors are investigated and compared in this thesis by means of numerical simulations conducted on synthetic multivariate multifractal images. 
This work is further completed by the investigation of the potential benefit of multifractal analysis and the proposed Bayesian methodology for remote sensing via the example of hyperspectral imaging.
APA, Harvard, Vancouver, ISO, and other styles
50

Duchesne, Carl. "Improvement of processes and product quality through multivariate data analysis /." *McMaster only, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

To the bibliography