Dissertations / Theses on the topic 'Gaussian Regression Processes'




Consult the top 50 dissertations / theses for your research on the topic 'Gaussian Regression Processes.'


You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Beck, Daniel Emilio. "Gaussian processes for text regression." Thesis, University of Sheffield, 2017. http://etheses.whiterose.ac.uk/17619/.

Full text
Abstract:
Text Regression is the task of modelling and predicting numerical indicators or response variables from textual data. It arises in a range of different problems, from sentiment and emotion analysis to text-based forecasting. Most models in the literature apply simple text representations such as bag-of-words and predict response variables in the form of point estimates. These simplifying assumptions ignore important information coming from the data such as the underlying uncertainty present in the outputs and the linguistic structure in the textual inputs. The former is particularly important when the response variables come from human annotations while the latter can capture linguistic phenomena that go beyond simple lexical properties of a text. In this thesis our aim is to advance the state-of-the-art in Text Regression by improving these two aspects, better uncertainty modelling in the response variables and improved text representations. Our main workhorse to achieve these goals is Gaussian Processes (GPs), a Bayesian kernelised probabilistic framework. GP-based regression models the response variables as well-calibrated probability distributions, providing additional information in predictions which in turn can improve subsequent decision making. They also model the data using kernels, enabling richer representations based on similarity measures between texts. To be able to reach our main goals we propose new kernels for text which aim at capturing richer linguistic information. These kernels are then parameterised and learned from the data using efficient model selection procedures that are enabled by the GP framework. Finally we also capitalise on recent advances in the GP literature to better capture uncertainty in the response variables, such as multi-task learning and models that can incorporate non-Gaussian variables through the use of warping functions. Our proposed architectures are benchmarked in two Text Regression applications: Emotion Analysis and Machine Translation Quality Estimation. Overall we are able to obtain better results compared to baselines while also providing uncertainty estimates for predictions in the form of posterior distributions. Furthermore we show how these models can be probed to obtain insights about the relation between the data and the response variables and also how to apply predictive distributions in subsequent decision making procedures.
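To make the kind of output concrete: below is a minimal, illustrative sketch of Gaussian process regression over simple text features that returns a predictive mean and standard deviation rather than a point estimate. It assumes scikit-learn and a plain TF-IDF representation with made-up data; it is not Beck's architecture, whose text kernels and warped/multi-task extensions go well beyond this.

    # Minimal sketch: GP regression on text features with predictive uncertainty.
    # The texts and scores below are made-up illustrative data (assumes scikit-learn).
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    texts = ["I am thrilled with this result", "this is disappointing", "quite good overall"]
    scores = np.array([0.9, 0.1, 0.6])                   # e.g. emotion intensity annotations

    vec = TfidfVectorizer().fit(texts)
    X = vec.transform(texts).toarray()

    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, scores)

    X_new = vec.transform(["somewhat pleased with it"]).toarray()
    mean, std = gp.predict(X_new, return_std=True)       # a predictive distribution, not a point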
2

Gibbs, M. N. "Bayesian Gaussian processes for regression and classification." Thesis, University of Cambridge, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.599379.

Full text
Abstract:
Bayesian inference offers us a powerful tool with which to tackle the problem of data modelling. However, the performance of Bayesian methods is crucially dependent on being able to find good models for our data. The principal focus of this thesis is the development of models based on Gaussian process priors. Such models, which can be thought of as the infinite extension of several existing finite models, have the flexibility to model complex phenomena while being mathematically simple. In this thesis, I present a review of the theory of Gaussian processes and their covariance functions and demonstrate how they fit into the Bayesian framework. The efficient implementation of a Gaussian process is discussed with particular reference to approximate methods for matrix inversion based on the work of Skilling (1993). Several regression problems are examined. Non-stationary covariance functions are developed for the regression of neuron spike data and the use of Gaussian processes to model the potential energy surfaces of weakly bound molecules is discussed. Classification methods based on Gaussian processes are implemented using variational methods. Existing bounds (Jaakkola and Jordan 1996) for the sigmoid function are used to tackle binary problems and multi-dimensional bounds on the softmax function are presented for the multiple class case. The performance of the variational classifier is compared with that of other methods using the CRABS and PIMA datasets (Ripley 1996) and the problem of predicting the cracking of welds based on their chemical composition is also investigated. The theoretical calculation of the density of states of crystal structures is discussed in detail. Three possible approaches to the problem are described based on free energy minimization, Gaussian processes and the theory of random matrices. Results from these approaches are compared with the state-of-the-art techniques (Pickard 1997).
3

Wågberg, Johan, and Viklund Emanuel Walldén. "Continuous Occupancy Mapping Using Gaussian Processes." Thesis, Linköpings universitet, Reglerteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-81464.

Full text
Abstract:
The topic of this thesis is occupancy mapping for mobile robots, with an emphasis on a novel method for continuous occupancy mapping using Gaussian processes. In the new method, spatial correlation is accounted for in a natural way, and an a priori discretization of the area to be mapped is not necessary as within most other common methods. The main contribution of this thesis is the construction of a Gaussian process library for C++, and the use of this library to implement the continuous occupancy mapping algorithm. The continuous occupancy mapping is evaluated using both simulated and real world experimental data. The main result is that the method, in its current form, is not fit for online operations due to its computational complexity. By using approximations and ad hoc solutions, the method can be run in real time on a mobile robot, though not without losing many of its benefits.
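As a rough illustration of the idea (not the C++ library developed in the thesis), a continuous occupancy map can be sketched by fitting a Gaussian process classifier to binary occupied/free observations at 2-D locations and querying it on a dense grid; the data below are synthetic and assume scikit-learn.

    # Sketch: binary occupancy observations at 2-D points, a GP classifier, and a
    # grid query giving a continuous occupancy probability map (synthetic data).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessClassifier
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    X_free = rng.uniform(0, 5, size=(40, 2))             # locations observed as free
    X_occ = rng.uniform(5, 10, size=(40, 2))             # locations observed as occupied
    X = np.vstack([X_free, X_occ])
    y = np.r_[np.zeros(len(X_free)), np.ones(len(X_occ))]

    gpc = GaussianProcessClassifier(kernel=RBF(length_scale=2.0)).fit(X, y)

    xs, ys = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
    grid = np.c_[xs.ravel(), ys.ravel()]
    p_occupied = gpc.predict_proba(grid)[:, 1].reshape(xs.shape)   # map of P(occupied)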
4

Davies, Alexander James. "Effective implementation of Gaussian process regression for machine learning." Thesis, University of Cambridge, 2015. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.708909.

Full text
5

余瑞心 and Sui-sum Amy Yu. "Application of Markov regression models in non-Gaussian time series analysis." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1991. http://hub.hku.hk/bib/B31976840.

Full text
6

Rasmussen, Carl Edward. "Evaluation of Gaussian processes and other methods for non-linear regression." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/nq28300.pdf.

Full text
7

Sun, Furong. "Some Advances in Local Approximate Gaussian Processes." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/97245.

Full text
Abstract:
Nowadays, the Gaussian Process (GP) has been recognized as an indispensable statistical tool in computer experiments. Due to its computational complexity and storage demands, its application to real-world problems, especially in "big data" settings, is quite limited. Among the many strategies for tailoring GPs to such settings, Gramacy and Apley (2015) proposed the local approximate GP (laGP), which builds approximate predictive equations from small local designs constructed around each predictive location under a certain criterion. In this dissertation, several methodological extensions of laGP are proposed. One contribution is multilevel global/local modeling, which deploys global hyper-parameter estimates to perform local prediction. The second contribution extends the laGP notion of "locale" to a set of predictive locations, along paths in the input space. These two contributions are applied to satellite drag emulation, which is illustrated in Chapter 3. Furthermore, the multilevel GP modeling strategy, combined with inverse-variance weighting, is also applied to synthesize field data and computer model outputs of solar irradiance across the continental United States, which is detailed in Chapter 4. Last but not least, in Chapter 5, laGP's performance is tested on emulating daytime land surface temperatures estimated via satellites, in the setting of irregular grid locations.
Doctor of Philosophy
In many real-life settings, we want to understand a physical relationship or phenomenon. Due to limited resources and/or ethical reasons, it is often impossible to perform physical experiments to collect data, and we therefore have to rely upon computer experiments, whose evaluation usually requires expensive simulation involving complex mathematical equations. To reduce the computational effort, we look for a relatively cheap alternative, called an emulator, to serve as a surrogate model. The Gaussian process (GP) is such an emulator, and has been very popular due to its excellent out-of-sample predictive performance and appropriate uncertainty quantification. However, due to its computational complexity, full GP modeling is not suitable for "big data" settings. Gramacy and Apley (2015) proposed the local approximate GP (laGP), the core idea of which is to use a subset of the data for inference and subsequent prediction at unobserved inputs. This dissertation provides several extensions of laGP, which are applied to several real-life "big data" settings. The first application, detailed in Chapter 3, is to emulate satellite drag from large simulation experiments. A strategy is developed to capture global input information comprehensively using a small subset of the data, after which local prediction is performed. This method is called "multilevel GP modeling", and it is also deployed to synthesize field measurements and computational outputs of solar irradiance across the continental United States, illustrated in Chapter 4, and to emulate daytime land surface temperatures estimated by satellites, discussed in Chapter 5.
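The core laGP idea can be sketched in a few lines: for each prediction location, fit a small GP to nearby design points only. The sketch below uses a plain nearest-neighbour sub-design and scikit-learn; the laGP method itself selects points with a more refined sequential criterion.

    # Sketch of local approximate GP prediction on a large design: a small GP is
    # fit to the nearest design points of each prediction location (the real laGP
    # uses a sequential design criterion rather than plain nearest neighbours).
    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(1)
    X = rng.uniform(-2, 2, size=(2000, 2))                        # large design
    y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.01 * rng.standard_normal(2000)

    nn = NearestNeighbors(n_neighbors=50).fit(X)

    def local_gp_predict(x_new):
        idx = nn.kneighbors(x_new.reshape(1, -1), return_distance=False)[0]
        gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(1e-4)).fit(X[idx], y[idx])
        return gp.predict(x_new.reshape(1, -1), return_std=True)

    mean, std = local_gp_predict(np.array([0.3, -0.5]))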
8

Zertuche, Federico. "Utilisation de simulateurs multi-fidélité pour les études d'incertitudes dans les codes de calcul." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM069/document.

Full text
Abstract:
Computer simulations are a very important tool used by applied mathematicians and engineers to model the behavior of a system. They have become increasingly precise but also more complicated, so much so that they are very slow to produce an output and thus difficult to sample, and many aspects of these simulations are therefore not well understood. For example, in many cases they depend on parameters whose value is unknown. A metamodel is a reconstruction of the simulation. It requires much less time to produce an output that is close to what the simulation would give. By using it, some aspects of the original simulation can be studied. It is built from very few samples and its purpose is to replace the simulation. This thesis is concerned with the construction of a metamodel in a particular context called multi-fidelity. In multi-fidelity, the metamodel is constructed using data from the target simulation along with other samples that are related to it. These approximate samples can come from a degraded version of the simulation, from an old version that has been studied extensively, or from another simulation in which a part of the description is simplified. By learning the difference between the samples it is possible to incorporate the information contained in the approximate data, and this may lead to an enhanced metamodel. In this manuscript two approaches that do this are studied: one based on Gaussian process modeling and another based on a coarse-to-fine wavelet decomposition. The first method shows how, by estimating the relationship between two data sets, it is possible to incorporate data that would otherwise be useless. In the second method, an adaptive procedure for systematically adding data to enhance the metamodel is proposed. The object of this work is to better our understanding of how to incorporate approximate data to produce more accurate metamodels. Working with a multi-fidelity metamodel helps us to understand in detail the data that feed it. In the end, a global picture of the elements that compose it emerges: the relationships and the differences between all the data sets become clearer.
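The "learn the difference between the data sets" idea can be sketched with a two-level Gaussian process model: fit one GP to many cheap runs and a second GP to the discrepancy observed on the few expensive runs. The example below is synthetic, assumes scikit-learn, and stands in for the thesis' more elaborate constructions.

    # Sketch of two-level multi-fidelity GP modelling (synthetic 1-D example):
    # a GP on many coarse runs plus a GP on the coarse-to-fine discrepancy.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def coarse(x):  return np.sin(8 * x)                  # cheap, approximate code
    def fine(x):    return np.sin(8 * x) + 0.3 * x        # expensive, accurate code

    X_lo = np.linspace(0, 1, 40)[:, None]                 # many coarse runs
    X_hi = np.linspace(0, 1, 6)[:, None]                  # few fine runs

    gp_lo = GaussianProcessRegressor(RBF(0.2)).fit(X_lo, coarse(X_lo).ravel())
    delta = fine(X_hi).ravel() - gp_lo.predict(X_hi)      # discrepancy at the fine runs
    gp_delta = GaussianProcessRegressor(RBF(0.5)).fit(X_hi, delta)

    X_test = np.linspace(0, 1, 200)[:, None]
    y_mf = gp_lo.predict(X_test) + gp_delta.predict(X_test)   # multi-fidelity prediction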
9

Wikland, Love. "Early-Stage Prediction of Lithium-Ion Battery Cycle Life Using Gaussian Process Regression." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273619.

Full text
Abstract:
Data-driven prediction of battery health has gained increased attention over the past couple of years, in both academia and industry. Accurate early-stage predictions of battery performance would create new opportunities regarding production and use. Using data from only the first 100 cycles, in a data set of 124 cells whose lifetimes span between 150 and 2300 cycles, this work combines parametric linear models with non-parametric Gaussian process regression to achieve cycle lifetime predictions with an overall mean error of 8.8%. This work presents a relevant contribution to current research, as this combination of methods has not previously been used when regressing battery lifetime on a high-dimensional feature space. The study and the results presented further show that Gaussian process regression can serve as a valuable contributor in future data-driven implementations of battery health prediction.
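As an illustration of the kind of model described (not the thesis' actual features or data), a hedged sketch in which summary features extracted from the first 100 cycles feed a Gaussian process regression of log cycle life; the feature values below are invented and scikit-learn is assumed.

    # Sketch: GP regression of log cycle life on early-cycle summary features.
    # The two features (e.g. a capacity-fade statistic and initial capacity) and
    # the cycle lives are invented placeholders, not the study's data.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel
    from sklearn.preprocessing import StandardScaler

    X = np.array([[0.012, 1.07], [0.004, 1.08], [0.020, 1.05], [0.007, 1.06]])
    y = np.log(np.array([800.0, 1500.0, 400.0, 1100.0]))          # cycle lives

    scaler = StandardScaler().fit(X)
    gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(0.01),
                                  normalize_y=True).fit(scaler.transform(X), y)

    mean_log, std_log = gp.predict(scaler.transform(X), return_std=True)
    print(np.exp(mean_log))                                       # predicted cycle lives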
10

Persson, Lejon Ludvig, and Fredrik Berntsson. "Regression Analysis on NBA Players Background and Performance using Gaussian Processes : Can NBA-drafts be improved by taking socioeconomic background into consideration?" Thesis, KTH, Skolan för teknikvetenskap (SCI), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-153767.

Full text
Abstract:
In modern society it is well known that an individual's background matters in his career, but should it be taken into consideration in a recruiting process in general, and in the recruiting process of NBA players in particular? Previous research shows that white basketball players from high-income families have a 75% higher chance of becoming an NBA player compared to white basketball players from low-income families. In this paper, we have examined whether there is a connection between an NBA player's background and the chances of succeeding in the NBA, given that the player has been picked in the NBA draft. The results were obtained using machine learning algorithms based on Gaussian Processes. The results show that draft decisions will not be improved by taking socio-economic background into consideration.
11

Liu, Xuyuan. "Statistical validation and calibration of computer models." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/39478.

Full text
Abstract:
This thesis deals with modeling, validation and calibration problems in experiments with computer models. Computer models are mathematical representations of real systems developed for understanding and investigating those systems. Before a computer model is used, it often needs to be validated by comparing the computer outputs with physical observations, and calibrated by adjusting internal model parameters in order to improve the agreement between the computer outputs and physical observations. As computer models become more powerful and popular, the complexity of input and output data raises new computational challenges and stimulates the development of novel statistical modeling methods. One challenge is to deal with computer models with random inputs (random effects). This kind of computer model is very common in engineering applications. For example, in a thermal experiment at the Sandia National Lab (Dowding et al. 2008), the volumetric heat capacity and thermal conductivity are random input variables. If input variables are randomly sampled from particular distributions with unknown parameters, the existing methods in the literature are not directly applicable. The reason is that integration over the random variable distribution is needed for the joint likelihood, and the integration cannot always be expressed in a closed form. In this research, we propose a new approach which combines the nonlinear mixed effects model and the Gaussian process model (Kriging model). Different model formulations are also studied to gain a better understanding of validation and calibration activities, using the thermal problem. Another challenge comes from computer models with functional outputs. While many methods have been developed for modeling computer experiments with a single response, the literature on modeling computer experiments with functional responses is sketchy. Dimension reduction techniques can be used to overcome the complexity problem of functional responses; however, they generally involve two steps. Models are first fit at each individual setting of the input to reduce the dimensionality of the functional data. Then the estimated parameters of the models are treated as new responses, which are further modeled for prediction. Alternatively, pointwise models are first constructed at each time point and then functional curves are fit to the parameter estimates obtained from the fitted models. In this research, we first propose a functional regression model to relate functional responses to both design and time variables in one single step. Secondly, we propose a functional kriging model which performs variable selection by imposing a penalty function. We show that the proposed model performs better than dimension-reduction-based approaches and than the kriging model without regularization. In addition, non-asymptotic theoretical bounds on the estimation error are presented.
12

Linton, Thomas. "Forecasting hourly electricity consumption for sets of households using machine learning algorithms." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186592.

Full text
Abstract:
To address inefficiency, waste, and the negative consequences of electricity generation, companies and government entities are looking to behavioural change among residential consumers. To drive behavioural change, consumers need better feedback about their electricity consumption. A monthly or quarterly bill provides the consumer with almost no useful information about the relationship between their behaviours and their electricity consumption. Smart meters are now widely dispersed in developed countries and they are capable of providing electricity consumption readings at an hourly resolution, but this data is mostly used as a basis for billing and not as a tool to assist the consumer in reducing their consumption. One component required to deliver innovative feedback mechanisms is the capability to forecast hourly electricity consumption at the household scale. The work presented in this thesis is an evaluation of the effectiveness of a selection of kernel-based machine learning methods at forecasting the hourly aggregate electricity consumption for different sized sets of households. The work of this thesis demonstrates that k-Nearest Neighbour Regression and Gaussian Process Regression are the most accurate methods within the constraints of the problem considered. In addition to accuracy, the advantages and disadvantages of each machine learning method are evaluated, and a simple comparison of each algorithm's computational performance is made.
13

Ashrafi, Parivash. "Predicting the absorption rate of chemicals through mammalian skin using machine learning algorithms." Thesis, University of Hertfordshire, 2016. http://hdl.handle.net/2299/17310.

Full text
Abstract:
Machine learning (ML) methods have been applied to the analysis of a range of biological systems. This thesis evaluates the application of these methods to the problem domain of skin permeability. ML methods offer great potential in both predictive ability and their ability to provide mechanistic insight into, in this case, the phenomenon of skin permeation. Historically, refining the mathematical models used to predict percutaneous drug absorption has been thought of as a key factor in this field. Quantitative Structure-Activity Relationship (QSAR) models are used extensively for this purpose. However, advanced ML methods successfully outperform the traditional linear QSAR models. In this thesis, the application of ML methods to percutaneous absorption is investigated and evaluated. The major approach used in this thesis is the Gaussian process (GP) regression method. This research seeks to enhance prediction performance by using local non-linear models obtained from applying clustering algorithms. In addition, to increase the models' quality, a kernel is generated based on both numerical chemical variables and categorical experimental descriptors. A Monte Carlo algorithm is also employed to generate reliable models from variable data, which are inevitable in biological experiments. The datasets used for this study are small, which may raise over-fitting/under-fitting problems. In this research I attempt to find optimal values of skin permeability using GP optimisation algorithms within small datasets. Although these methods are applied here to the field of percutaneous absorption, they may be applied more broadly to any biological system.
14

Kortesalmi, Linus. "Gaussian Process Regression-based GPS Variance Estimation and Trajectory Forecasting." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-153126.

Full text
Abstract:
Spatio-temporal data is a commonly used source of information. Using machine learning to analyse this kind of data can lead to many interesting and useful insights. In this thesis project, a novel public transportation spatio-temporal dataset is explored and analysed. The dataset contains 282 GB of positional events, spanning two weeks of time, from all public transportation vehicles in Östergötland county, Sweden. From the data exploration, three high-level problems are formulated: bus stop detection, GPS variance estimation, and arrival time prediction, also called trajectory forecasting. The bus stop detection problem is briefly discussed and solutions are proposed. Gaussian process regression is an effective method for solving regression problems. The GPS variance estimation problem is solved via the use of a mixture of Gaussian processes. A mixture of Gaussian processes is also used to predict the arrival time for public transportation buses. The arrival time prediction is from one bus stop to the next, not for the whole trajectory. The result of the arrival time prediction is a distribution of arrival times, which can easily be applied to determine the earliest and latest expected arrival at the next bus stop, alongside the most probable arrival time. The naïve arrival time prediction model implemented has a root mean square error of 5 to 19 seconds. In general, the absolute error of the prediction model decreases over time within each segment. The result of the GPS variance estimation problem is a model which can compare the variance for different environments along the route of a given trajectory.
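A much-simplified sketch of the per-segment idea (not the mixture-of-Gaussian-processes model of the thesis): one GP regression of segment travel time whose predictive mean and standard deviation yield a most probable arrival plus earliest/latest bounds. The features and travel times below are invented and scikit-learn is assumed.

    # Sketch: GP regression of travel time on one bus-stop segment; the predictive
    # mean and standard deviation give an arrival-time distribution (invented data:
    # features are departure time in hours and current delay in seconds).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    X = np.array([[7.5, 30], [8.0, 90], [12.0, 0], [17.5, 120], [22.0, 10]])
    y = np.array([95.0, 140.0, 80.0, 160.0, 75.0])       # observed travel times (s)

    gp = GaussianProcessRegressor(RBF(length_scale=[2.0, 60.0]) + WhiteKernel(25.0),
                                  normalize_y=True).fit(X, y)

    mean, std = gp.predict(np.array([[8.2, 60]]), return_std=True)
    earliest, latest = mean - 2 * std, mean + 2 * std    # rough 95% arrival window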
15

Coufal, Martin. "Hyper-optimalizace neuronových sítí založená na Gaussovských procesech." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-417223.

Full text
Abstract:
The aim of this master's thesis is to create a tool for optimising the hyper-parameters of artificial neural networks. The tool must be able to optimise several hyper-parameters, which may moreover be correlated. I solved this problem by implementing an optimiser that uses Gaussian processes to predict the influence of the individual hyper-parameters on the resulting accuracy of the neural network. From experiments carried out on several benchmark functions, I found that the implemented tool is able to achieve better results than optimisers based on random search, and thus to reduce, on average, the number of optimisation steps required. Optimisation based on random search achieved better results only in the first steps of the optimisation, before the Gaussian-process-based optimiser builds a sufficiently accurate model of the problem. Nevertheless, almost all experiments performed on the MNIST dataset showed better results for the random-search-based optimiser. These differences between the experiments are probably due to the complexity of the chosen benchmark functions or to the chosen parameters of the implemented optimiser.
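The loop the abstract describes can be sketched as standard Gaussian-process-based optimisation with an expected-improvement acquisition function; the sketch below assumes NumPy, SciPy and scikit-learn and optimises a toy one-dimensional objective standing in for "train the network and return its error". It is not the tool implemented in the thesis.

    # Sketch of GP-based hyper-parameter optimisation: fit a GP to the evaluated
    # points, pick the candidate with the highest expected improvement, evaluate it,
    # and repeat (toy 1-D objective in place of an actual network training run).
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def objective(x):                                     # stand-in validation error
        return (x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

    rng = np.random.default_rng(0)
    X = list(rng.uniform(0, 1, 3))                        # a few random initial points
    y = [objective(x) for x in X]

    for _ in range(20):
        gp = GaussianProcessRegressor(Matern(nu=2.5), normalize_y=True)
        gp.fit(np.array(X).reshape(-1, 1), np.array(y))
        cand = np.linspace(0, 1, 200).reshape(-1, 1)
        mu, sd = gp.predict(cand, return_std=True)
        imp = min(y) - mu                                 # improvement over best (minimising)
        z = imp / np.maximum(sd, 1e-9)
        ei = imp * norm.cdf(z) + sd * norm.pdf(z)         # expected improvement
        x_next = float(cand[np.argmax(ei), 0])
        X.append(x_next)
        y.append(objective(x_next))

    print(X[int(np.argmin(y))], min(y))                   # best hyper-parameter found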
16

Le, Gratiet Loic. "Multi-fidelity Gaussian process regression for computer experiments." Phd thesis, Université Paris-Diderot - Paris VII, 2013. http://tel.archives-ouvertes.fr/tel-00866770.

Full text
Abstract:
This work is on Gaussian-process-based approximation of a code which can be run at different levels of accuracy. The goal is to improve the predictions of a surrogate model of a complex computer code using fast approximations of it. A new formulation of a co-kriging-based method has been proposed. In particular, this formulation allows for fast implementation and for closed-form expressions for the predictive mean and variance for universal co-kriging in the multi-fidelity framework, which is a breakthrough as it really allows for the practical application of such a method in real cases. Furthermore, fast cross-validation, sequential experimental design and sensitivity analysis methods have been extended to the multi-fidelity co-kriging framework. This thesis also deals with a conjecture about the dependence of the learning curve (i.e. the decay rate of the mean square error) on the smoothness of the underlying function. A proof in a fairly general situation (which includes the classical models of Gaussian-process-based metamodels with stationary covariance functions) has been obtained, while the previous proofs hold only for degenerate kernels (i.e. when the process is in fact finite-dimensional). This result allows for addressing rigorously practical questions such as the optimal allocation of the budget between different levels of codes in the multi-fidelity framework.
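For context, the autoregressive co-kriging construction that such multi-fidelity models build on (the formulation of Kennedy and O'Hagan, written here in generic notation rather than the thesis' own) links the Gaussian process $Z_t$ modelling the code at fidelity level $t$ to the next-coarser level by

    Z_t(x) = \rho_{t-1}\, Z_{t-1}(x) + \delta_t(x), \qquad t = 2, \dots, s,

where $\delta_t$ is a Gaussian process, independent of $Z_{t-1}$, modelling the discrepancy between the two levels, and $\rho_{t-1}$ is a scale parameter; the closed-form predictive mean and variance mentioned in the abstract are derived within this kind of recursive structure.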
17

Raulli, Vittoria. "Processi gaussiani nella regressione." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/18256/.

Full text
Abstract:
Supervised learning is a machine learning technique that proceeds in three steps: in the first step, data about the phenomenon under analysis are collected; in the second, the prediction model is built; in the third, the resulting model is applied to new input data. Depending on the characteristics of the outputs, supervised learning is divided into regression, for continuous outputs, and classification, for discrete outputs. This thesis presents the regression problem from a Bayesian point of view; however, this approach can entail very high computational costs. We will see that Gaussian processes turn out to be a very effective technique for solving this problem, both from a computational point of view and in terms of accuracy. The thesis is divided into four chapters: the first presents the prerequisites needed for reading it; the second sets out the Bayesian statistical approach; the third introduces Gaussian processes; and the fourth proposes a simulation algorithm.
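For reference, the standard Gaussian process regression equations at the heart of such a treatment: with training inputs $X$, noisy observations $y$ (noise variance $\sigma_n^2$), a kernel $k$ with Gram matrix $K(X,X)$, and a test input $x_*$, the posterior predictive distribution of $f_* = f(x_*)$ is Gaussian with

    \bar{f}_* = k(x_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1} y,
    \qquad
    \mathrm{Var}[f_*] = k(x_*, x_*) - k(x_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1} k(X, x_*).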
18

Molada, Tebar Adolfo. "Colorimetric and spectral analysis of rock art by means of the characterization of digital sensors." Doctoral thesis, Universitat Politècnica de València, 2021. http://hdl.handle.net/10251/160386.

Full text
Abstract:
Cultural heritage documentation and preservation is an arduous and delicate task in which color plays a fundamental role. The correct determination of color provides vital information on a descriptive, technical and quantitative level. Classical color documentation methods in archaeology were usually restricted to strictly subjective procedures. However, this methodology has practical and technical limitations, affecting the results obtained in the determination of color. Nowadays, it is frequent to support classical methods with geomatics techniques, such as photogrammetry or laser scanning, together with digital image processing. Although digital images allow color to be captured quickly, easily, and in a non-invasive way, the RGB data provided by the camera do not in themselves have a rigorous colorimetric sense. Therefore, a rigorous transformation process is required to obtain reliable color data from digital images. This thesis proposes a novel technical solution, in which the integration of spectrophotometric and colorimetric analysis is intended as a complement to photogrammetric techniques, allowing an improvement in color identification and the representation of pigments with maximum reliability in 3D surveys, models and reconstructions. The proposed methodology is based on the colorimetric characterization of digital sensors, which is a novel application to cave paintings. The characterization aims to obtain the transformation equations between the device-dependent color data recorded by the camera and independent, physically based color spaces, such as those established by the Commission Internationale de l'Éclairage (CIE). The rigorous processing of color and spectral data requires software packages with specific colorimetric functionalities. Although there are different commercial software options, they do not integrate digital image processing and colorimetric computations together and, more importantly, they do not allow the camera characterization to be carried out. A key aspect of this thesis is therefore our in-house pyColourimetry software, developed and tested taking into account the recommendations published by the CIE. pyColourimetry is open-source code, independent of commercial ties; it supports the treatment of colorimetric and spectral data as well as digital image processing, and gives the user full control of the characterization process and of the management of the obtained data. This study also presents an analysis of the main factors affecting the characterization, such as the camera's built-in sensor, the camera parameters during acquisition, the illuminant, the regression model, and the data set used for model training. For computing the transformation equations, the literature recommends the use of polynomial equations as a regression model, so polynomial models are considered as a starting point in this thesis. Additionally, a regression model based on Gaussian processes has been applied and compared with the results obtained by means of polynomials. A new working scheme is also presented, called P-ASK and based on the K-means classification algorithm, which allows the automatic selection of color samples adapted to the chromatic range of the scene. The results achieved in this thesis show that the proposed framework for camera characterization is highly applicable to documentation and conservation tasks in cultural heritage applications in general, and in rock art painting in particular.
It is a low-cost and non-invasive methodology that allows for the colorimetric recording from complete image scenes. Once characterized, a conventional digital camera can be used for rigorous color determination, simulating a colorimeter. Thus, it is possible to work in a physical color space, independent of the device used, and comparable with data obtained from other cameras that are also characterized.
Thanks to the Universitat Politècnica de València for the FPI scholarship
Molada Tebar, A. (2020). Colorimetric and spectral analysis of rock art by means of the characterization of digital sensors [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/160386
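A hedged sketch of the characterisation step itself, as a regression from device RGB to CIE XYZ trained on colour-chart patches: it assumes scikit-learn (not the pyColourimetry software) and the patch values below are invented.

    # Sketch: colorimetric characterisation as regression from camera RGB to CIE
    # XYZ, one GP per output channel, trained on colour-chart patches (invented data).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rgb = np.array([[52, 48, 40], [200, 60, 55], [90, 140, 170], [240, 238, 230]]) / 255.0
    xyz = np.array([[10.2, 10.6, 8.9], [31.0, 20.5, 5.1],
                    [22.3, 25.7, 43.2], [85.0, 89.4, 92.1]])       # measured patch values

    models = [GaussianProcessRegressor(RBF(0.5) + WhiteKernel(1e-3),
                                       normalize_y=True).fit(rgb, xyz[:, i])
              for i in range(3)]

    new_rgb = np.array([[120, 110, 100]]) / 255.0
    xyz_pred = np.column_stack([m.predict(new_rgb) for m in models])  # estimated XYZ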
19

Nguyen, Huong. "Near-optimal designs for Gaussian Process regression models." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1533983585774383.

Full text
20

Urry, Matthew. "Learning curves for Gaussian process regression on random graphs." Thesis, King's College London (University of London), 2013. https://kclpure.kcl.ac.uk/portal/en/theses/learning-curves-for-gaussian-process-regression-on-random-graphs(c1f5f395-0426-436c-989c-d0ade913423e).html.

Full text
Abstract:
Gaussian processes are a non-parametric method that can be used to learn both regression and classification rules from examples for arbitrary input spaces using the 'kernel trick'. They are well understood for inputs from Euclidean spaces, however, much less research has focused on other spaces. In this thesis I aim to at least partially resolve this. In particular I focus on the case where inputs are defined on the vertices of a graph and the task is to learn a function defined on the vertices from noisy examples, i.e. a regression problem. A challenging problem in the area of non-parametric learning is to predict the generalisation error as a function of the number of examples, or learning curve. I show that, unlike in the Euclidean case where predictions are either quantitatively accurate for a few specific cases or only qualitatively accurate for a broader range of situations, I am able to derive accurate learning curves for Gaussian processes on graphs for a wide range of input spaces given by ensembles of random graphs. I focus on the random walk kernel but my results generalise to any kernel that can be written as a truncated sum of powers of the normalised graph Laplacian. I begin first with a discussion of the properties of the random walk kernel, which can be viewed as an approximation of the ubiquitous squared exponential kernel in continuous spaces. I show that compared to the squared exponential kernel, the random walk kernel has some surprising properties, which include a non-trivial limiting form for some types of graphs. After investigating the limiting form of the kernel I then study its use as a prior. I propose a solution to this in the form of a local normalisation, where the prior scale at each vertex is normalised locally as desired. To drive home the point about kernel normalisation I then examine the differences between the two kernels when they are used as a Gaussian process prior over functions defined on the vertices of a graph. I show using numerical simulations that the locally normalised kernel leads to a probabilistically more plausible Gaussian process prior. After investigating the properties of the random walk kernel I then discuss the learning curves of a Gaussian process with a random walk kernel for both kernel normalisations in a matched scenario (where student and teacher are both Gaussian processes with matching hyperparameters). I show that by using the cavity method I can derive accurate predictions along the whole length of the learning curve that dramatically improve upon previously derived approximations for continuous spaces suitably extended to the discrete graph case. The derivation of the learning curve for the locally normalised kernel required an additional approximation in the resulting cavity equations. I subsequently, therefore, investigate this approximation in more detail using the replica method. I show that the locally normalised kernel leads to a highly non-trivial replica calculation, which eventually shows that the approximation used in the cavity analysis amounts to ignoring some consistency requirements between incoming cavity distributions. I focus in particular on a teacher distribution that is given by a Gaussian process with a random walk kernel but different hyperparameters. I show that in this case, by applying the cavity method, I am able once more to calculate accurate predictions of the learning curve. The resulting equations resemble the matched case over an inflated number of variables.
To finish this thesis I examine the learning curves for varying degrees of model mismatch.
21

Shah, Siddharth S. "Robust Heart Rate Variability Analysis using Gaussian Process Regression." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1293737259.

Full text
22

Marque-Pucheu, Sophie. "Gaussian process regression of two nested computer codes." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCC155/document.

Full text
Abstract:
This thesis deals with the Gaussian process surrogate modeling (emulation) of two coupled computer codes. "Two coupled codes" here means a system of two chained codes: the output of the first code is one of the inputs of the second code. Both codes are expensive to run. In order to carry out a sensitivity analysis of the output of the coupled code, we seek to build a surrogate model of this output from a small number of observations. Three types of observations of the system exist: those of the chained code, those of the first code only, and those of the second code only. The surrogate model has to be accurate in the most likely regions of the input domain of the nested code. In this work, the surrogate models are constructed in the Universal Kriging framework, with a Bayesian approach. First, the case with no information about the intermediary variable (the output of the first code) is addressed. An innovative parametrization of the mean function of the Gaussian process modeling the nested code is proposed, based on the coupling of two polynomials. Then, the case with intermediary observations is addressed: a stochastic predictor based on the coupling of the predictors associated with the two codes is proposed, together with methods for quickly computing its mean and variance. Finally, the methods obtained for codes with scalar outputs are extended to codes with high-dimensional vectorial outputs. We propose an efficient dimension reduction method for the high-dimensional vectorial input of the second code in order to facilitate the Gaussian process regression of this code. All the proposed methods are applied to numerical examples.
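A much-simplified sketch of the nested setting: two toy codes with the output of the first feeding the second, one GP fit to each, and the first GP's posterior mean plugged into the second at prediction time. This crude plug-in ignores the first code's predictive uncertainty, unlike the stochastic predictor coupling the two GPs that the thesis proposes; scikit-learn is assumed.

    # Sketch of two nested codes y = code2(x, code1(x)): a GP per code, with the
    # first GP's posterior mean plugged into the second (crude plug-in predictor).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def code1(x):     return np.sin(3 * x)                # first (toy) code
    def code2(x, z):  return z ** 2 + 0.5 * x             # second (toy) code, z = code1(x)

    X1 = np.linspace(0, 2, 15)[:, None]
    gp1 = GaussianProcessRegressor(RBF(0.5)).fit(X1, code1(X1).ravel())

    X2 = np.linspace(0, 2, 15)[:, None]
    Z2 = code1(X2)                                        # intermediary observations
    gp2 = GaussianProcessRegressor(RBF(0.5)).fit(np.hstack([X2, Z2]),
                                                 code2(X2, Z2).ravel())

    x_test = np.array([[1.3]])
    z_hat = gp1.predict(x_test).reshape(-1, 1)
    y_hat = gp2.predict(np.hstack([x_test, z_hat]))       # nested prediction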
23

Alvarez, Mauricio A. "Convolved Gaussian process priors for multivariate regression with applications to dynamical systems." Thesis, University of Manchester, 2011. https://www.research.manchester.ac.uk/portal/en/theses/convolved-gaussian-process-priors-for-multivariate-regression-with-applications-to-dynamical-systems(0fe42df3-6dce-48ec-a74d-a6ecaf249d74).html.

Full text
Abstract:
In this thesis we address the problem of modeling correlated outputs using Gaussian process priors. Applications of modeling correlated outputs include the joint prediction of pollutant metals in geostatistics and multitask learning in machine learning. Defining a Gaussian process prior for correlated outputs translates into specifying a suitable covariance function that captures dependencies between the different output variables. Classical models for obtaining such a covariance function include the linear model of coregionalization and process convolutions. We propose a general framework for developing multiple output covariance functions by performing convolutions between smoothing kernels particular to each output and covariance functions that are common to all outputs. Both the linear model of coregionalization and the process convolutions turn out to be special cases of this framework. Practical aspects of the proposed methodology are studied in this thesis. They involve the use of domain-specific knowledge for defining relevant smoothing kernels, efficient approximations for reducing computational complexity and a novel method for establishing a general class of nonstationary covariances with applications in robotics and motion capture data. Reprints of the publications that appear at the end of this document report case studies and experimental results in sensor networks, geostatistics and motion capture data that illustrate the performance of the different methods proposed.
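For context, the two classical constructions mentioned in the abstract express the cross-covariance between outputs $d$ and $d'$ as follows (generic notation, not the thesis' own):

    k_{d,d'}(x, x') = \sum_{q=1}^{Q} a_{d,q}\, a_{d',q}\, k_q(x, x')   \quad \text{(linear model of coregionalization)},

    k_{d,d'}(x, x') = \int\!\!\int G_d(x - z)\, G_{d'}(x' - z')\, k_w(z, z')\, dz\, dz'   \quad \text{(process convolution)},

where $G_d$ is the smoothing kernel attached to output $d$ and $k_w$ is the covariance of a shared latent process; the coregionalization form is recovered when the smoothing kernels collapse to scaled Dirac deltas, which is the sense in which both constructions are special cases of the convolution framework.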
24

Kapat, Prasenjit. "Role of Majorization in Learning the Kernel within a Gaussian Process Regression Framework." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1316521301.

Full text
25

Maatouk, Hassan. "Correspondance entre régression par processus Gaussien et splines d'interpolation sous contraintes linéaires de type inégalité. Théorie et applications." Thesis, Saint-Etienne, EMSE, 2015. http://www.theses.fr/2015EMSE0791/document.

Full text
Abstract:
This thesis is dedicated to interpolation problems when the numerical function is known to satisfy some properties such as positivity, monotonicity or convexity. Two methods of interpolation are studied. The first one is deterministic and is based on convex optimization in a Reproducing Kernel Hilbert Space (RKHS). The second one is a Bayesian approach based on Gaussian Process Regression (GPR) or Kriging. By using a finite linear functional decomposition, we propose to approximate the original Gaussian process by a finite-dimensional Gaussian process such that conditional simulations satisfy all the inequality constraints. As a consequence, GPR is equivalent to the simulation of a Gaussian vector truncated to a convex set. The mode or Maximum A Posteriori is defined as a Bayesian estimator and prediction intervals are quantified by simulation. Convergence of the method is proved and the correspondence between the two approaches is established. This can be seen as an extension of the correspondence established by [Kimeldorf and Wahba, 1971] between Bayesian estimation on stochastic processes and smoothing by splines. Finally, a real application in insurance and finance is given to estimate a term-structure curve and default probabilities.
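As a rough illustration of "simulation of a Gaussian vector truncated to a convex set", the Python sketch below (assumed toy data and kernel; it uses naive rejection sampling rather than the finite-dimensional construction of the thesis) draws GP interpolation paths on a grid and keeps only those satisfying a positivity constraint.

import numpy as np

def k(a, b, ls=0.4):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

Xt = np.array([0.1, 0.4, 0.8]); yt = np.array([0.2, 0.5, 0.1])   # interpolation data (assumed)
Xs = np.linspace(0.1, 0.8, 40)                                    # discretization grid

A = k(Xs, Xt) @ np.linalg.inv(k(Xt, Xt) + 1e-10 * np.eye(3))
mu = A @ yt                                                       # conditional (kriging) mean
cov = k(Xs, Xs) - A @ k(Xt, Xs)                                   # conditional covariance
L = np.linalg.cholesky(cov + 1e-6 * np.eye(len(Xs)))

rng = np.random.default_rng(0)
kept = []
for _ in range(2000):                                             # crude rejection sampler
    f = mu + L @ rng.standard_normal(len(Xs))
    if np.all(f >= 0.0):                                          # keep only constrained paths
        kept.append(f)
print(len(kept), "accepted paths; constrained mean at x=0.1:", round(float(np.mean(kept, axis=0)[0]), 3))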
APA, Harvard, Vancouver, ISO, and other styles
26

Barrett, James Edward. "Gaussian process regression models for the analysis of survival data with competing risks, interval censoring and high dimensionality." Thesis, King's College London (University of London), 2015. http://kclpure.kcl.ac.uk/portal/en/theses/gaussian-process-regression-models-for-the-analysis-of-survival-data-with-competing-risks-interval-censoring-and-high-dimensionality(fe3440e1-9766-4fc3-9d23-fe4af89483b5).html.

Full text
Abstract:
We develop novel statistical methods for analysing biomedical survival data based on Gaussian process (GP) regression. GP regression provides a powerful non-parametric probabilistic method of relating inputs to outputs. We apply this to survival data which consist of time-to-event and covariate measurements. In the context of GP regression the covariates are regarded as 'inputs' and the event times are the 'outputs'. This allows for highly flexible inference of non-linear relationships between covariates and event times. Many existing methods for analysing survival data, such as the ubiquitous Cox proportional hazards model, focus primarily on the hazard rate which is typically assumed to take some parametric or semi-parametric form. Our proposed model belongs to the class of accelerated failure time models and as such our focus is on directly characterising the relationship between the covariates and event times without any explicit assumptions on what form the hazard rates take. This provides a more direct route to connecting the covariates to survival outcomes with minimal assumptions. An application of our model to experimental data illustrates its usefulness. We then apply multiple output GP regression, which can handle multiple potentially correlated outputs for each input, to competing risks survival data where multiple event types can occur. In this case the multiple outputs correspond to the time-to-event for each risk. By tuning one of the model parameters we can control the extent to which the multiple outputs are dependent thus allowing the specification of correlated risks. However, the identifiability problem, which states that it is not possible to infer whether risks are truly independent or otherwise on the basis of observed data, still holds. In spite of this fundamental limitation, simulation studies suggest that in some cases assuming dependence can lead to more accurate predictions. The second part of this thesis is concerned with high dimensional survival data where there are a large number of covariates compared to relatively few individuals. This leads to the problem of overfitting, where spurious relationships are inferred from the data. One strategy to tackle this problem is dimensionality reduction. The Gaussian process latent variable model (GPLVM) is a powerful method of extracting a low dimensional representation of high dimensional data. We extend the GPLVM to incorporate survival outcomes by combining the model with a Weibull proportional hazards model (WPHM). By reducing the ratio of covariates to samples we hope to diminish the effects of overfitting. The combined GPLVM-WPHM model can also be used to combine several datasets by simultaneously expressing them in terms of the same low dimensional latent variables. We construct the Laplace approximation of the marginal likelihood and use this to determine the optimal number of latent variables, thereby allowing detection of intrinsic low dimensional structure. Results from both simulated and real data show a reduction in overfitting and an increase in predictive accuracy after dimensionality reduction.
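For a flavour of the GP-as-accelerated-failure-time idea described above, here is a minimal scikit-learn sketch (assumed synthetic data; censoring and the competing-risks machinery of the thesis are not handled) that regresses log event times on covariates and returns a predictive distribution.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))                                        # covariates ("inputs")
t = np.exp(1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] ** 2 + 0.2 * rng.normal(size=80))  # event times

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(), normalize_y=True)
gp.fit(X, np.log(t))                                                # "outputs" are log event times

X_new = rng.normal(size=(3, 3))
mean, sd = gp.predict(X_new, return_std=True)                       # predictive distribution
print("median predicted times:", np.exp(mean), "log-scale std:", sd)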
APA, Harvard, Vancouver, ISO, and other styles
27

Liang, Ke. "Oculométrie Numérique Economique : modèle d'apparence et apprentissage par variétés." Thesis, Paris, EPHE, 2015. http://www.theses.fr/2015EPHE3020/document.

Full text
Abstract:
L'oculométrie est un ensemble de techniques dédié à enregistrer et analyser les mouvements oculaires. Dans cette thèse, je présente l'étude, la conception et la mise en œuvre d'un système oculométrique numérique, non-intrusif permettant d'analyser les mouvements oculaires en temps réel avec une webcam à distance et sans lumière infra-rouge. Dans le cadre de la réalisation, le système oculométrique proposé se compose de quatre modules: l'extraction des caractéristiques, la détection et le suivi des yeux, l'analyse de la variété des mouvements des yeux à partir des images et l'estimation du regard par l'apprentissage. Nos contributions reposent sur le développement des méthodes autour de ces quatre modules: la première réalise une méthode hybride pour détecter et suivre les yeux en temps réel à partir des techniques du filtre particulaire, du modèle à formes actives et des cartes des yeux (EyeMap); la seconde réalise l'extraction des caractéristiques à partir de l'image des yeux en utilisant les techniques des motifs binaires locaux; la troisième méthode classifie les mouvements oculaires selon la variété générée par le Laplacian Eigenmaps et forme un ensemble de données d'apprentissage; enfin, la quatrième méthode calcul la position du regard à partir de cet ensemble d'apprentissage. Nous proposons également deux méthodes d'estimation:une méthode de la régression par le processus gaussien et un apprentissage semi-supervisé et une méthode de la catégorisation par la classification spectrale (spectral clustering). Il en résulte un système complet, générique et économique pour les applications diverses dans le domaine de l'oculométrie
Gaze tracking offers a powerful tool for diverse fields of study, in particular eye movement analysis. In this thesis, we present a new appearance-based real-time gaze tracking system with only a remote webcam and without infra-red illumination. Our proposed gaze tracking model has four components: eye localization, eye feature extraction, eye manifold learning and gaze estimation. Our research focuses on the development of methods for each component of the system. Firstly, we propose a hybrid method to localize in real time the eye region in the frames captured by the webcam. The eye can be detected by an Active Shape Model and EyeMap in the first frame where the eye occurs. The eye can then be tracked through a stochastic method, the particle filter. Secondly, we employ Center-Symmetric Local Binary Patterns on the detected eye region, which has been divided into blocks, in order to extract the eye features. Thirdly, we introduce a manifold learning technique, Laplacian Eigenmaps, to learn different eye movements from a set of collected eye images. This unsupervised learning helps to construct an automatic and correct calibration phase. Finally, for gaze estimation, we propose two models: a semi-supervised Gaussian Process Regression prediction model to estimate the coordinates of the eye direction, and a prediction model based on spectral clustering to classify different eye movements. Our system, with five-point calibration, not only reduces the run-time cost but also estimates the gaze accurately. Our experimental results show that our gaze tracking model has fewer constraints on the hardware settings and can be applied efficiently in different real-time applications.
APA, Harvard, Vancouver, ISO, and other styles
28

Ploé, Patrick. "Surrogate-based optimization of hydrofoil shapes using RANS simulations." Thesis, Ecole centrale de Nantes, 2018. http://www.theses.fr/2018ECDN0012/document.

Full text
Abstract:
Cette thèse présente un framework d'optimisation pour la conception hydrodynamique de forme d'hydrofoils. L'optimisation d'hydrofoil par simulation implique des objectifs d'optimisation divergents et impose des compromis contraignants en raison du coût des simulations numériques et des budgets limités généralement alloués à la conception des navires. Le framework fait appel à l'échantillonnage séquentiel et aux modèles de substitution. Un modèle prédictif est construit en utilisant la Régression par Processus Gaussien (RPG) à partir des données issues de simulations fluides effectuées sur différentes géométries d'hydrofoils. Le modèle est ensuite combiné à d'autres critères dans une fonction d'acquisition qui est évaluée sur l'espace de conception afin de définir une nouvelle géométrie qui est testée et dont les paramètres et la réponse sont ajoutés au jeu de données, améliorant ainsi le modèle. Une nouvelle fonction d'acquisition a été développée, basée sur la variance RPG et la validation croisée des données. Un modeleur géométrique a également été développé afin de créer automatiquement les géométries d'hydrofoil à partir des paramètres déterminés par l'optimiseur. Pour compléter la boucle d'optimisation, FINE/Marine, un solveur fluide RANS, a été intégré dans le framework pour exécuter les simulations fluides. Les capacités d'optimisation ont été testées sur des cas tests analytiques montrant que la nouvelle fonction d'acquisition offre plus de robustesse que d'autres fonctions d'acquisition existantes. L'ensemble du framework a ensuite été testé sur des optimisations de sections 2D d'hydrofoil ainsi que d'hydrofoil 3D avec surface libre. Dans les deux cas, le processus d'optimisation fonctionne, permettant d'optimiser les géométries d'hydrofoils et confirmant les performances obtenues sur les cas test analytiques. Les optima semblent cependant être assez sensibles aux conditions opérationnelles.
This thesis presents a practical hydrodynamic optimization framework for hydrofoil shape design. Automated simulation-based optimization of hydrofoils is a challenging process. It may involve conflicting optimization objectives, but also imposes a trade-off between the cost of numerical simulations and the limited budgets available for ship design. The optimization framework is based on sequential sampling and surrogate modeling. Gaussian Process Regression (GPR) is used to build a predictive model based on data obtained from fluid simulations of selected hydrofoil geometries. The GPR model is then combined with other criteria into an acquisition function that is evaluated over the design space to define new query points that are added to the data set in order to improve the model. A custom acquisition function is developed, based on GPR variance and cross validation of the data. A hydrofoil geometric modeler is also developed to automatically create the hydrofoil shapes based on the parameters determined by the optimizer. To complete the optimization loop, FINE/Marine, a RANS flow solver, is embedded into the framework to perform the fluid simulations. Optimization capabilities are tested on analytical test cases. The results show that the custom function is more robust than other existing acquisition functions when tested on difficult functions. The entire optimization framework is then tested on 2D hydrofoil sections and 3D hydrofoil optimization cases with free surface. In both cases, the optimization process performs well, resulting in optimized hydrofoil shapes and confirming the results obtained from the analytical test cases. However, the optimum is shown to be sensitive to operating conditions.
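The sequential-sampling loop can be sketched as below in Python (an analytic stand-in replaces the RANS solver, and a plain lower-confidence-bound acquisition replaces the thesis's custom GPR-variance/cross-validation criterion; all numbers are assumed).

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # analytic stand-in for an expensive RANS evaluation of a hydrofoil design
    return np.sin(3 * x) + 0.5 * x ** 2

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(4, 1))                      # initial design of experiments
y = objective(X).ravel()
candidates = np.linspace(-2, 2, 200).reshape(-1, 1)

for _ in range(10):                                      # sequential sampling loop
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True).fit(X, y)
    mu, sd = gp.predict(candidates, return_std=True)
    acq = mu - 2.0 * sd                                  # lower-confidence-bound acquisition
    x_new = candidates[np.argmin(acq)].reshape(1, 1)
    X = np.vstack([X, x_new]); y = np.append(y, objective(x_new).ravel())

print("best design:", X[np.argmin(y)], "objective:", y.min())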
APA, Harvard, Vancouver, ISO, and other styles
29

Dubourg, Vincent. "Méta-modèles adaptatifs pour l'analyse de fiabilité et l'optimisation sous contrainte fiabiliste." Phd thesis, Université Blaise Pascal - Clermont-Ferrand II, 2011. http://tel.archives-ouvertes.fr/tel-00697026.

Full text
Abstract:
This thesis is a contribution to solving the reliability-based design optimization problem. This probabilistic design approach aims to account for the uncertainties inherent in the system to be designed, in order to propose solutions that are both optimal and safe. The safety level is quantified by a probability of failure. The optimization problem then consists in ensuring that this probability remains below a threshold set by the decision makers. Solving this problem requires a large number of calls to the limit-state function characterizing the underlying reliability problem. The methodology therefore becomes impractical as soon as the design relies on a numerical model that is expensive to evaluate (e.g. a finite element model). In this context, this manuscript proposes a strategy based on adaptively substituting a Kriging metamodel for the limit-state function. Particular effort is devoted to quantifying, reducing and finally eliminating the error introduced by using this metamodel in place of the original model. The proposed methodology is applied to the design of geometrically imperfect shells subject to buckling.
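A minimal sketch of the metamodel-substitution idea, with an assumed linear limit state instead of the shell-buckling model: a Kriging surrogate is fitted to a small design of experiments and then used inside a crude Monte Carlo estimate of the failure probability.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def g(u):
    # assumed toy limit-state function: failure when g(u) <= 0
    return 3.0 - u[:, 0] - u[:, 1]

rng = np.random.default_rng(0)
U_doe = rng.uniform(-4, 4, size=(30, 2))                 # small design of experiments
surrogate = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-6, normalize_y=True)
surrogate.fit(U_doe, g(U_doe))

U_mc = rng.standard_normal(size=(100000, 2))             # Monte Carlo population
pf_hat = np.mean(surrogate.predict(U_mc) <= 0.0)         # surrogate replaces g in the MC loop
print("estimated failure probability:", pf_hat)          # exact value Phi(-3/sqrt(2)) ~ 1.7e-2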
APA, Harvard, Vancouver, ISO, and other styles
30

Xie, Guangrui. "Robust and Data-Efficient Metamodel-Based Approaches for Online Analysis of Time-Dependent Systems." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/98806.

Full text
Abstract:
Metamodeling is regarded as a powerful analysis tool to learn the input-output relationship of a system based on a limited amount of data collected when experiments with real systems are costly or impractical. As a popular metamodeling method, Gaussian process regression (GPR) has been successfully applied to analyses of various engineering systems. However, GPR-based metamodeling for time-dependent systems (TDSs) is especially challenging for three reasons. First, TDSs require an appropriate account of temporal effects; however, standard GPR cannot address temporal effects easily and satisfactorily. Second, TDSs typically require analytics tools with a sufficiently high computational efficiency to support online decision making, but standard GPR may not be adequate for real-time implementation. Lastly, reliable uncertainty quantification is a key to success for operational planning of TDSs in the real world; however, research on how to construct adequate error bounds for GPR-based metamodeling is sparse. Inspired by the challenges encountered in GPR-based analyses of two representative stochastic TDSs, i.e., load forecasting in a power system and trajectory prediction for unmanned aerial vehicles (UAVs), this dissertation aims to develop novel modeling, sampling, and statistical analysis techniques for enhancing the computational and statistical efficiencies of GPR-based metamodeling to meet the requirements of practical implementations. Furthermore, an in-depth investigation on building uniform error bounds for stochastic kriging is conducted, which sets up a foundation for developing robust GPR-based metamodeling techniques for analyses of TDSs under the impact of strong heteroscedasticity.
Ph.D.
Metamodeling has been regarded as a powerful analysis tool to learn the input-output relationship of an engineering system with a limited amount of experimental data available. As a popular metamodeling method, Gaussian process regression (GPR) has been widely applied to analyses of various engineering systems whose input-output relationships do not depend on time. However, GPR-based metamodeling for time-dependent systems (TDSs), whose input-output relationships depend on time, is especially challenging due to three reasons. First, standard GPR cannot properly address temporal effects for TDSs. Second, standard GPR is typically not computationally efficient enough for real-time implementations in TDSs. Lastly, research on how to adequately quantify the uncertainty associated with the performance of GPR-based metamodeling is sparse. To fill this knowledge gap, this dissertation aims to develop novel modeling, sampling, and statistical analysis techniques for enhancing standard GPR to meet the requirements of practical implementations for TDSs. Effective solutions are provided to address the challenges encountered in GPR-based analyses of two representative stochastic TDSs, i.e., load forecasting in a power system and trajectory prediction for unmanned aerial vehicles (UAVs). Furthermore, an in-depth investigation on quantifying the uncertainty associated with the performance of stochastic kriging (a variant of standard GPR) is conducted, which sets up a foundation for developing robust GPR-based metamodeling techniques for analyses of more complex TDSs.
APA, Harvard, Vancouver, ISO, and other styles
31

Vives, Maria Carola Alfaro. "Modelo de Gauss-Markov de regressão : adequação de normalidade e inferencia na escala original, apos transformação." [s.n.], 1994. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306852.

Full text
Abstract:
Advisor: Clarice Azevedo de Luna Freire
Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Ciência da Computação
Abstract: Not provided.
Master's degree in Statistics
APA, Harvard, Vancouver, ISO, and other styles
32

Erich, Roger Alan. "Regression Modeling of Time to Event Data Using the Ornstein-Uhlenbeck Process." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1342796812.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

De, lozzo Matthias. "Modèles de substitution spatio-temporels et multifidélité : Application à l'ingénierie thermique." Thesis, Toulouse, INSA, 2013. http://www.theses.fr/2013ISAT0027/document.

Full text
Abstract:
Cette thèse porte sur la construction de modèles de substitution en régimes transitoire et permanent pour la simulation thermique, en présence de peu d'observations et de plusieurs sorties. Nous proposons dans un premier temps une construction robuste de perceptron multicouche bouclé afin d'approcher une dynamique spatio-temporelle. Ce modèle de substitution s'obtient par une moyennisation de réseaux de neurones issus d'une procédure de validation croisée, dont le partitionnement des observations associé permet d'ajuster les paramètres de chacun de ces modèles sur une base de test sans perte d'information. De plus, la construction d'un tel perceptron bouclé peut être distribuée selon ses sorties. Cette construction est appliquée à la modélisation de l'évolution temporelle de la température en différents points d'une armoire aéronautique. Nous proposons dans un deuxième temps une agrégation de modèles par processus gaussien dans un cadre multifidélité où nous disposons d'un modèle d'observation haute-fidélité complété par plusieurs modèles d'observation de fidélités moindres et non comparables. Une attention particulière est portée sur la spécification des tendances et coefficients d'ajustement présents dans ces modèles. Les différents krigeages et co-krigeages sont assemblés selon une partition ou un mélange pondéré en se basant sur une mesure de robustesse aux points du plan d'expériences les plus fiables. Cette approche est employée pour modéliser la température en différents points de l'armoire en régime permanent. Nous proposons dans un dernier temps un critère pénalisé pour le problème de la régression hétéroscédastique. Cet outil est développé dans le cadre des estimateurs par projection et appliqué au cas particulier des ondelettes de Haar. Nous accompagnons ces résultats théoriques de résultats numériques pour un problème tenant compte de différentes spécifications du bruit et de possibles dépendances dans les observations.
This PhD thesis deals with the construction of surrogate models in transient and steady states in the context of thermal simulation, with a few observations and many outputs. First, we design a robust construction of a recurrent multilayer perceptron so as to approximate a spatio-temporal dynamic. We use an average of neural networks resulting from a cross-validation procedure, whose associated data splitting allows the parameters of these models to be adjusted on a test set without any information loss. Moreover, the construction of this perceptron can be distributed according to its outputs. This construction is applied to the modelling of the temporal evolution of the temperature at different points of a piece of aeronautical equipment. Then, we propose a mixture of Gaussian process models in a multifidelity framework where a high-fidelity observation model is complemented by several observation models of lower, non-comparable fidelities. Particular attention is paid to the specification of the trends and adjustment coefficients present in these models. Different kriging and co-kriging models are combined according to a partition or a weighted aggregation based on a robustness measure associated with the most reliable design points. This approach is used to model the temperature at different points of the equipment in steady state. Finally, we propose a penalized criterion for the problem of heteroscedastic regression. This tool is built in the framework of projection estimators and applied to the particular case of Haar wavelets. We also give numerical results for different noise specifications and possible dependencies in the observations.
APA, Harvard, Vancouver, ISO, and other styles
34

Tong, Xiao Thomas. "Statistical Learning of Some Complex Systems: From Dynamic Systems to Market Microstructure." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:10917.

Full text
Abstract:
A complex system is one with many parts, whose behaviors are strongly dependent on each other. There are two interesting questions about complex systems. One is to understand how to recover the true structure of a complex system from noisy data. The other is to understand how the system interacts with its environment. In this thesis, we address these two questions by studying two distinct complex systems: dynamic systems and market microstructure. To address the first question, we focus on some nonlinear dynamic systems. We develop a novel Bayesian statistical method, Gaussian Emulator, to estimate the parameters of dynamic systems from noisy data, when the data are either fully or partially observed. Our method shows that estimation accuracy is substantially improved and computation is faster, compared to the numerical solvers. To address the second question, we focus on the market microstructure of hidden liquidity. We propose some statistical models to explain the hidden liquidity under different market conditions. Our statistical results suggest that hidden liquidity can be reliably predicted given the visible state of the market.
Statistics
APA, Harvard, Vancouver, ISO, and other styles
35

Cardamone, Salvatore. "An interacting quantum atoms approach to constructing a conformationally dependent biomolecular force field by Gaussian process regression : potential energy surface sampling and validation." Thesis, University of Manchester, 2017. https://www.research.manchester.ac.uk/portal/en/theses/an-interacting-quantum-atoms-approach-to-constructing-a-conformationally-dependent-biomolecular-force-field-by-gaussian-process-regression-potential-energy-surface-sampling-and-validation(508ed450-9033-4bc9-8522-690d5a7909eb).html.

Full text
Abstract:
The energetics of chemical systems are quantum mechanical in origin and dependent upon the internal molecular conformational degrees of freedom. "Classical force field" strategies are inadequate approximations to these energetics owing to a plethora of simplifications, both conceptual and mathematical. These simplifications have been employed to make the in silico modelling of molecular systems computationally tractable, but are also subject to both qualitative and quantitative errors. In spite of these shortcomings, classical force fields have become entrenched as a cornerstone of computational chemistry. The Quantum Chemical Topological Force Field (QCTFF) has been a central research theme within our group for a number of years, and has been designed to ameliorate the shortcomings of classical force fields. Within its framework, one can undertake a full spatial decomposition of a chemical system into a set of finite atoms. Atomic properties are subsequently obtained by a rigorous quantum mechanical treatment of the resultant atomic domains through the theory of Interacting Quantum Atoms (IQA). Conformational dependence is accounted for in the QCTFF by use of Gaussian Process Regression, a machine learning technique. In so doing, one constructs an analytical function to provide a mapping from a molecular conformation to a set of atomic energetic quantities. One can subsequently conduct dynamics with these energetic quantities. The notion of "conformational sampling" is shown to be of key importance to the proper construction of the QCTFF. Conformational sampling is a key theme in this work, and a subject on which we expatiate. We suggest a novel conformational sampling scheme, and attempt a number of conformer subset selection strategies to construct optimal machine learning models. The QCTFF is then applied to carbohydrates for the first time, and shown to produce results well within the commonly invoked threshold of "chemical accuracy", O(β^{-1}), where β is the thermodynamic beta. Finally, we present a number of methodological developments to aid in both the accuracy and tractability of predicting ab initio vibrational spectroscopies.
APA, Harvard, Vancouver, ISO, and other styles
36

Froicu, Dragos Vasile. "Modeling and learning-based calibration of residual current devices." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
An important step in the manufacturing process of residual current devices consists of their calibration. The latter is a time-consuming procedure necessary for the proper operation of these devices. The main goal of this document is to propose a solution to increase the efficiency of the calibration workstations by reducing the overall time of this manufacturing step. To successfully achieve this goal, a much more accurate model is needed than the one currently in use. The system under study is dominated by a huge amount of uncertainty. This is due to the high number of parameters involved, each of them characterized by very large tolerances. In the approach used here, the governing physical equations have been integrated with a Bayesian learning process. By doing so, knowledge of the underlying physics is combined with the uncertainty estimation provided by the stochastic model, resulting in a more robust and accurate model. The Bayesian learning method used in this case study is Gaussian process modeling, starting from a physically based prior and updating it as data are observed. This is repeated for every device that needs to be calibrated. The result is an adaptive modeling procedure that can be easily implemented and used directly in the manufacturing process to achieve a faster calibration and decrease the overall process time. The estimated improvement of the proposed solution over the current one is about 24% fewer calibration attempts on average. The encouraging results obtained in simulation have prompted its implementation and testing on the real process.
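A minimal sketch of the "physics prior plus Bayesian update" idea, with hypothetical numbers and a made-up physics_model function (not the actual device equations): a GP is fitted to the residuals between measurements and the physics prediction and used to correct the next calibration attempt.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def physics_model(setting):
    # hypothetical governing-equation prediction of the trip threshold (assumed form)
    return 30.0 + 2.0 * setting

settings = np.array([[0.0], [1.0], [2.0]])                               # calibration attempts so far
measured = physics_model(settings).ravel() + np.array([1.2, 0.8, 1.5])   # observed values

# GP on the residual between measurement and physics prediction
gp = GaussianProcessRegressor(kernel=RBF(2.0) + WhiteKernel(0.1), normalize_y=True)
gp.fit(settings, measured - physics_model(settings).ravel())

next_setting = np.array([[3.0]])
corr, sd = gp.predict(next_setting, return_std=True)
print("corrected prediction:", physics_model(next_setting).ravel() + corr, "+/-", sd)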
APA, Harvard, Vancouver, ISO, and other styles
37

Turner, Jacob E. "Improving the Sensitivity of a Pulsar Timing Array: Correcting for Interstellar Scattering Delays." Oberlin College Honors Theses / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=oberlin1495573098864359.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Edwards, Adam Michael. "Precision Aggregated Local Models." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/102125.

Full text
Abstract:
Large scale Gaussian process (GP) regression is infeasible for larger data sets due to cubic scaling of flops and quadratic storage involved in working with covariance matrices. Remedies in recent literature focus on divide-and-conquer, e.g., partitioning into sub-problems and inducing functional (and thus computational) independence. Such approximations can be speedy, accurate, and sometimes even more flexible than an ordinary GP. However, a big downside is loss of continuity at partition boundaries. Modern methods like local approximate GPs (LAGPs) imply effectively infinite partitioning and are thus pathologically good and bad in this regard. Model averaging, an alternative to divide-and-conquer, can maintain absolute continuity but often over-smooths, diminishing accuracy. Here I propose putting LAGP-like methods into a local experts-like framework, blending partition-based speed with model-averaging continuity, as a flagship example of what I call precision aggregated local models (PALM). Using N_C LAGPs, each selecting n from N data pairs, I illustrate a scheme that is at most cubic in n, quadratic in N_C, and linear in N, drastically reducing computational and storage demands. Extensive empirical illustration shows how PALM is at least as accurate as LAGP, can be much faster, and furnishes continuous predictive surfaces. Finally, I propose a sequential updating scheme which greedily refines a PALM predictor up to a computational budget, and several variations on the basic PALM that may provide predictive improvements.
Doctor of Philosophy
Occasionally, when describing the relationship between two variables, it may be helpful to use a so-called "non-parametric" regression that is agnostic to the function that connects them. Gaussian Processes (GPs) are a popular method of non-parametric regression used for their relative flexibility and interpretability, but they have the unfortunate drawback of being computationally infeasible for large data sets. Past work on solving the scaling issues for GPs has focused on "divide and conquer" style schemes that spread the data out across multiple smaller GP models. While these models make GP methods much more accessible to large data sets, they do so at the expense of either local predictive accuracy or global surface continuity. Precision Aggregated Local Models (PALM) is a novel divide and conquer method for GP models that is scalable for large data while maintaining local accuracy and a smooth global model. I demonstrate that PALM can be built quickly, and performs well predictively compared to other state-of-the-art methods. This document also provides a sequential algorithm for selecting the location of each local model, and variations on the basic PALM methodology.
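A much cruder cousin of PALM can be sketched as follows (assumed synthetic data): the inputs are partitioned with k-means, one small GP is fitted per partition, and the prediction is taken from the nearest local expert; PALM's precision-weighted aggregation and sequential refinement are not reproduced here.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=600)

km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)       # partition the inputs
experts = []
for c in range(6):                                                # one small GP per partition
    Xc, yc = X[km.labels_ == c], y[km.labels_ == c]
    experts.append(GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.01)).fit(Xc, yc))

x_query = np.array([[0.7]])
nearest = np.argmin(np.linalg.norm(km.cluster_centers_ - x_query, axis=1))
mu, sd = experts[nearest].predict(x_query, return_std=True)
print("local prediction:", mu, "+/-", sd, "(true value sin(0.7) ~ 0.64)")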
APA, Harvard, Vancouver, ISO, and other styles
39

Ptáček, Martin. "Spatial Function Estimation with Uncertain Sensor Locations." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2021. http://www.nusl.cz/ntk/nusl-449288.

Full text
Abstract:
This thesis deals with the task of spatial function estimation by means of Gaussian process regression (GPR) under uncertainty in the training positions (sensor locations). First, the theory behind the GPR method with known training positions is described. This theory is then applied to derive expressions for the GPR predictive distribution at a test position when the uncertainty of the training positions is taken into account. Because these expressions have no analytical solution, they are approximated by the Monte Carlo method. The derived method is shown to improve the quality of the spatial function estimate compared to the standard use of GPR and also compared to a simplified solution reported in the literature. The thesis then considers the possibility of using GPR with uncertain training positions in combination with expressions that do have an analytical solution. It turns out that obtaining such expressions requires strong assumptions, which makes the predictive distribution inaccurate from the outset. It also turns out that the resulting method uses the standard GPR expressions in combination with a modified covariance function. Simulations show that this method produces estimates very similar to the basic GPR method assuming known training positions. On the other hand, its predictive variance (estimation uncertainty) is increased, which is the desired effect of accounting for the uncertainty of the training positions.
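The Monte Carlo approximation described above can be sketched in a few lines of Python (illustrative numbers; the nominal sensor positions, their uncertainty and the kernel are all assumed): standard GPR predictions are averaged over random draws of the sensor positions and combined by the laws of total expectation and total variance.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
x_nominal = np.array([0.1, 0.35, 0.6, 0.9])            # reported sensor positions (assumed)
sigma_pos = 0.05                                        # assumed position uncertainty
y = np.sin(2 * np.pi * x_nominal)                       # measured values of the spatial function
x_test = np.array([[0.5]])

means, variances = [], []
for _ in range(200):                                    # Monte Carlo over possible true positions
    x_draw = (x_nominal + sigma_pos * rng.normal(size=4)).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=RBF(0.3) + WhiteKernel(1e-3), optimizer=None)
    m, s = gp.fit(x_draw, y).predict(x_test, return_std=True)
    means.append(m[0]); variances.append(s[0] ** 2)

mu = np.mean(means)                                     # law of total expectation
var = np.mean(variances) + np.var(means)                # law of total variance
print("predictive mean and std at x=0.5:", round(mu, 3), round(np.sqrt(var), 3))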
APA, Harvard, Vancouver, ISO, and other styles
40

Xu, Li. "Statistical Methods for Variability Management in High-Performance Computing." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/104184.

Full text
Abstract:
High-performance computing (HPC) variability management is an important topic in computer science. Research topics include experimental designs for efficient data collection, surrogate models for predicting the performance variability, and system configuration optimization. Due to the complex architecture of HPC systems, a comprehensive study of HPC variability needs large-scale datasets, and experimental design techniques are useful for improved data collection. Surrogate models are essential to understand the variability as a function of system parameters, which can be obtained by mathematical and statistical models. After predicting the variability, optimization tools are needed for future system designs. This dissertation focuses on HPC input/output (I/O) variability through three main chapters. After the general introduction in Chapter 1, Chapter 2 focuses on the prediction models for the scalar description of I/O variability. A comprehensive comparison study is conducted, and major surrogate models for computer experiments are investigated. In addition, a tool is developed for system configuration optimization based on the chosen surrogate model. Chapter 3 conducts a detailed study for the multimodal phenomena in I/O throughput distribution and proposes an uncertainty estimation method for the optimal number of runs for future experiments. Mixture models are used to identify the number of modes for throughput distributions at different configurations. This chapter also addresses the uncertainty in parameter estimation and derives a formula for sample size calculation. The developed method is then applied to HPC variability data. Chapter 4 focuses on the prediction of functional outcomes with both qualitative and quantitative factors. Instead of a scalar description of I/O variability, the distribution of I/O throughput provides a comprehensive description of I/O variability. We develop a modified Gaussian process for functional prediction and apply the developed method to the large-scale HPC I/O variability data. Chapter 5 contains some general conclusions and areas for future work.
Doctor of Philosophy
This dissertation focuses on three projects that are all related to statistical methods in performance variability management in high-performance computing (HPC). HPC systems are computer systems that create high performance by aggregating a large number of computing units. The performance of HPC is measured by the throughput of a benchmark called the IOZone Filesystem Benchmark. The performance variability is the variation among throughputs when the system configuration is fixed. Variability management involves studying the relationship between performance variability and the system configuration. In Chapter 2, we use several existing prediction models to predict the standard deviation of throughputs given different system configurations and compare the accuracy of predictions. We also conduct HPC system optimization using the chosen prediction model as the objective function. In Chapter 3, we use the mixture model to determine the number of modes in the distribution of throughput under different system configurations. In addition, we develop a model to determine the number of additional runs for future benchmark experiments. In Chapter 4, we develop a statistical model that can predict the throughput distributions given the system configurations. We also compare the prediction of summary statistics of the throughput distributions with existing prediction models.
APA, Harvard, Vancouver, ISO, and other styles
41

Kamrath, Matthew. "Extending standard outdoor noise propagation models to complex geometries." Thesis, Le Mans, 2017. http://www.theses.fr/2017LEMA1038/document.

Full text
Abstract:
Les méthodes d'ingénierie acoustique (e.g. ISO 9613-2 ou CNOSSOS-EU) approchent efficacement les niveaux de bruit générés par les routes, les voies ferrées et les sources industrielles en milieu urbain. Cependant, ces approches d'ingénierie sont limitées à des géométries de forme simple, le plus souvent de section rectangulaire. Ce mémoire développe donc, et valide, une approche hybride permettant l'extension des méthodes d'ingénierie à des formes plus complexes, en introduisant un terme d'atténuation supplémentaire qui représente l'effet d'un objet réel comparé à un objet simple. Le calcul de cette atténuation supplémentaire nécessite des calculs de référence, permettant de quantifier la différence entre objets simple et complexe. Dans la mesure où il est trop onéreux, numériquement, d'effectuer ce calcul pour tous les chemins de propagation, l'atténuation supplémentaire est obtenue par interpolation de données stockées dans un tableau et évaluées pour un large jeu de positions de sources, de récepteurs et de fréquences. Dans notre approche, le calcul de référence utilise la méthode BEM en 2.5D, et permet ainsi de produire les niveaux de référence pour les géométries simple et complexe, tout en tabulant leur écart. Sur le principe, d'autres approches de référence pourraient être utilisées. Ce travail valide cette approche hybride pour un écran en forme de T avec un sol rigide, un sol absorbant et un cas avec bâtiments. Ces trois cas démontrent que l'approche hybride est plus précise que l'approche d'ingénierie standard dans des cas complexes.
Noise engineering methods (e.g. ISO 9613-2 or CNOSSOS-EU) efficiently approximate sound levels from roads, railways, and industrial sources in cities. However, engineering methods are limited to only simple box-shaped geometries. This dissertation develops and validates a hybrid method to extend the engineering methods to more complicated geometries by introducing an extra attenuation term that represents the influence of a real object compared to a simplified object. Calculating the extra attenuation term requires reference calculations to quantify the difference between the complex and simplified objects. Since performing a reference computation for each path is too computationally expensive, the extra attenuation term is linearly interpolated from a data table containing the corrections for many source and receiver positions and frequencies. The 2.5D boundary element method produces the levels for the real complex geometry and a simplified geometry, and subtracting these levels yields the corrections in the table. This dissertation validates this hybrid method for a T-barrier with hard ground, soft ground, and buildings. All three cases demonstrate that the hybrid method is more accurate than standard engineering methods for complex cases.
APA, Harvard, Vancouver, ISO, and other styles
42

Guss, Herman, and Linus Rustas. "Applying Machine Learning Algorithms for Anomaly Detection in Electricity Data : Improving the Energy Efficiency of Residential Buildings." Thesis, Uppsala universitet, Byggteknik och byggd miljö, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-415507.

Full text
Abstract:
The purpose of this thesis is to investigate how data from a residential property owner can be utilized to enable better energy management for their building stock. Specifically, this is done through the development of two machine learning models with the objective of detecting anomalies in the existing data of electricity consumption. The dataset consists of two years of residential electricity consumption for 193 substations belonging to the residential property owner Uppsalahem. The first of the developed models uses the K-means method to cluster substations with similar consumption patterns to create electricity profiles, while the second model uses Gaussian process regression to predict electricity consumption of a 24 hour timeframe. The performance of these models is evaluated and the optimal models resulting from this process are implemented to detect anomalies in the electricity consumption data. Two different algorithms for anomaly detection are presented, based on the differing properties of the two earlier models. During the evaluation of the models, it is established that the consumption patterns of the substations display a high variability, making it difficult to accurately model the full dataset. Both models are shown to be able to detect anomalies in the electricity consumption data, but the K-means based anomaly detection model is preferred due to it being faster and more reliable. It is concluded that substation electricity consumption is not ideal for anomaly detection, and that if a model should be implemented, it should likely exclude some of the substations with less regular consumption profiles.
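A minimal sketch of the GPR-based detection idea (synthetic hourly data, not Uppsalahem's substation measurements): a periodic-kernel GP is fitted to one day of consumption, and observations falling outside the resulting 95 % prediction interval on a new day are flagged.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel

rng = np.random.default_rng(0)
hours = np.arange(24).reshape(-1, 1)
train = 5 + 2 * np.sin(2 * np.pi * hours.ravel() / 24) + 0.2 * rng.normal(size=24)

kernel = ExpSineSquared(length_scale=3.0, periodicity=24.0) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, optimizer=None).fit(hours, train)

new_day = 5 + 2 * np.sin(2 * np.pi * hours.ravel() / 24) + 0.2 * rng.normal(size=24)
new_day[13] += 4.0                                  # injected anomaly at hour 13
mu, sd = gp.predict(hours, return_std=True)
flagged = np.where(np.abs(new_day - mu) > 1.96 * sd)[0]
print("hours flagged as anomalous:", flagged)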
APA, Harvard, Vancouver, ISO, and other styles
43

Balmand, Samuel. "Quelques contributions à l'estimation de grandes matrices de précision." Thesis, Paris Est, 2016. http://www.theses.fr/2016PESC1024/document.

Full text
Abstract:
Sous l'hypothèse gaussienne, la relation entre indépendance conditionnelle et parcimonie permet de justifier la construction d'estimateurs de l'inverse de la matrice de covariance -- également appelée matrice de précision -- à partir d'approches régularisées. Cette thèse, motivée à l'origine par la problématique de classification d'images, vise à développer une méthode d'estimation de la matrice de précision en grande dimension, lorsque le nombre $n$ d'observations est petit devant la dimension $p$ du modèle. Notre approche repose essentiellement sur les liens qu'entretiennent la matrice de précision et le modèle de régression linéaire. Elle consiste à estimer la matrice de précision en deux temps. Les éléments non diagonaux sont tout d'abord estimés en considérant $p$ problèmes de minimisation du type racine carrée des moindres carrés pénalisés par la norme $\ell_1$. Les éléments diagonaux sont ensuite obtenus à partir du résultat de l'étape précédente, par analyse résiduelle ou maximum de vraisemblance. Nous comparons ces différents estimateurs des termes diagonaux en fonction de leur risque d'estimation. De plus, nous proposons un nouvel estimateur, conçu de sorte à tenir compte de la possible contamination des données par des outliers, grâce à l'ajout d'un terme de régularisation en norme mixte $\ell_2/\ell_1$. L'analyse non-asymptotique de la convergence de notre estimateur souligne la pertinence de notre méthode.
Under the Gaussian assumption, the relationship between conditional independence and sparsity makes it possible to justify the construction of estimators of the inverse of the covariance matrix -- also called precision matrix -- from regularized approaches. This thesis, originally motivated by the problem of image classification, aims at developing a method to estimate the precision matrix in high dimension, that is when the sample size $n$ is small compared to the dimension $p$ of the model. Our approach relies basically on the connection of the precision matrix to the linear regression model. It consists of estimating the precision matrix in two steps. The off-diagonal elements are first estimated by solving $p$ minimization problems of the $\ell_1$-penalized square-root-of-least-squares type. The diagonal entries are then obtained from the result of the previous step, by residual analysis or likelihood maximization. These various estimators of the diagonal entries are compared in terms of estimation risk. Moreover, we propose a new estimator, designed to account for the possible contamination of the data by outliers, thanks to the addition of an $\ell_2/\ell_1$ mixed-norm regularization term. The nonasymptotic analysis of the consistency of our estimator points out the relevance of our method.
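The nodewise-regression construction can be sketched as follows in Python (an ordinary Lasso stands in for the square-root-of-least-squares estimator, and the tuning parameter and toy precision matrix are assumed): each variable is regressed on the others, and the coefficients and residual variances give the off-diagonal and diagonal entries of the estimated precision matrix.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, n = 6, 200
Omega = np.eye(p) + 0.4 * (np.diag(np.ones(p - 1), 1) + np.diag(np.ones(p - 1), -1))  # true precision
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), size=n)

Omega_hat = np.zeros((p, p))
for j in range(p):                                     # one penalized regression per variable
    others = [k for k in range(p) if k != j]
    fit = Lasso(alpha=0.05).fit(X[:, others], X[:, j])
    resid_var = np.mean((X[:, j] - fit.predict(X[:, others])) ** 2)
    Omega_hat[j, j] = 1.0 / resid_var                  # diagonal entry from the residual variance
    Omega_hat[j, others] = -fit.coef_ / resid_var      # off-diagonal entries from the coefficients

print(np.round(Omega_hat, 2))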
APA, Harvard, Vancouver, ISO, and other styles
44

Park, Jangho. "Efficient Global Optimization of Multidisciplinary System using Variable Fidelity Analysis and Dynamic Sampling Method." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/91911.

Full text
Abstract:
Work in this dissertation is motivated by reducing the design cost at the early design stage while maintaining high design accuracy throughout all design stages. It presents four key design methods to improve the performance of Efficient Global Optimization for multidisciplinary problems. First, a fidelity-calibration method is developed and applied to lower-fidelity samples. Function values analyzed by lower-fidelity analysis methods are updated to have accuracy equivalent to that of the highest-fidelity samples, and these calibrated data sets are used to construct a variable-fidelity Kriging model. For the design of experiments (DOE), a dynamic sampling method is developed that filters and infills data based on mathematical criteria on the model accuracy. In the sample infilling process, multi-objective optimization for exploitation and exploration of the design space is carried out. To indicate the fidelity of function analysis for additional samples in the variable-fidelity Kriging model, a dynamic fidelity indicator based on the overlapping coefficient is proposed. For multidisciplinary design problems, where multiple physics are tightly coupled with different coupling strengths, a multi-response Kriging model is introduced that utilizes iterative Maximum Likelihood Estimation (iMLE). Through the iMLE process, the large number of hyper-parameters in multi-response Kriging can be calculated with great accuracy and improved numerical stability. The optimization methods developed in this study are validated with analytic functions and show considerable performance improvement. Subsequently, three practical design optimization problems are addressed: the NACA0012 airfoil, the multi-element NLR 7301 airfoil, and the all-moving-wingtip control surface of a tailless aircraft. The results are compared with those of existing methods, and it is concluded that the proposed methods guarantee equivalent design accuracy at significantly reduced computational cost.
Doctor of Philosophy
In recent years, as the cost of aircraft design grows rapidly and the aviation industry seeks to save design time and cost, accurate design results during the early design stages are particularly important to reduce the overall life-cycle cost. The purpose of this work is to reduce the design cost at the early design stage while achieving design accuracy as high as that of the detailed design. A method of efficient global optimization (EGO) with variable-fidelity analysis and multidisciplinary design is proposed. Using variable-fidelity analysis for the function evaluation, high-fidelity function evaluations can be replaced by low-fidelity analyses of equivalent accuracy, which leads to considerable cost reduction. As the aircraft system has sub-disciplines coupled by multiple physics, including aerodynamics, structures, and thermodynamics, the accuracy of an individual discipline affects that of all others, and thus the design accuracy during the early design stages. Four distinctive design methods are developed and implemented into the standard Efficient Global Optimization (EGO) framework: 1) variable-fidelity analysis based on error approximation and calibration of low-fidelity samples, 2) dynamic sampling criteria for both filtering and infilling samples, 3) a dynamic fidelity indicator (DFI) for the selection of analysis fidelity for infilled samples, and 4) a multi-response Kriging model with iterative Maximum Likelihood Estimation (iMLE). The methods are validated with analytic functions, and a comparison with existing design methods shows improved cost efficiency throughout the overall design process while maintaining design accuracy. For practical applications, the methods are applied to the design optimization of an airfoil and of a complete aircraft configuration. The design results are compared with those of existing methods, and it is found that the proposed method yields designs of accuracy equivalent to or higher than a design based on high-fidelity analysis alone, at a cost reduced by orders of magnitude.
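A minimal sketch of the fidelity-calibration step described above (analytic toy functions in place of the CFD/structural analyses; all values assumed): a GP of the discrepancy between a few high-fidelity samples and the cheap low-fidelity model is used to correct many low-fidelity samples before the final variable-fidelity surrogate is built.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def low_fidelity(x):  return np.sin(8 * x)                     # cheap, biased analysis (assumed)
def high_fidelity(x): return np.sin(8 * x) + 0.3 * x + 0.1     # expensive reference analysis

x_lo = np.linspace(0, 1, 25).reshape(-1, 1)                    # many low-fidelity runs
x_hi = np.array([[0.0], [0.5], [1.0]])                         # a few high-fidelity runs

discrepancy = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-8)
discrepancy.fit(x_hi, high_fidelity(x_hi).ravel() - low_fidelity(x_hi).ravel())

y_calibrated = low_fidelity(x_lo).ravel() + discrepancy.predict(x_lo)   # corrected samples
surrogate = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-8, normalize_y=True)
surrogate.fit(x_lo, y_calibrated)

x_test = np.array([[0.25]])
print(surrogate.predict(x_test), high_fidelity(x_test).ravel())  # surrogate vs. true high fidelity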
APA, Harvard, Vancouver, ISO, and other styles
45

Chu, Shuyu. "Change Detection and Analysis of Data with Heterogeneous Structures." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78613.

Full text
Abstract:
Heterogeneous data with different characteristics are ubiquitous in the modern digital world. For example, the observations collected from a process may change in their mean or variance. In numerous applications, data are often of mixed types including both discrete and continuous variables. Heterogeneity also commonly arises in data when underlying models vary across different segments. Besides, the underlying pattern of data may change in different dimensions, such as in time and space. The diversity of heterogeneous data structures makes statistical modeling and analysis challenging. Detection of change-points in heterogeneous data has attracted great attention from a variety of application areas, such as quality control in manufacturing, protest event detection in social science, purchase likelihood prediction in business analytics, and organ state change in biomedical engineering. However, due to the extraordinary diversity of the heterogeneous data structures and the complexity of the underlying dynamic patterns, change detection and analysis of such data is quite challenging. This dissertation aims to develop novel statistical modeling methodologies to analyze four types of heterogeneous data and to find change-points efficiently. The proposed approaches have been applied to solve real-world problems and can be potentially applied to a broad range of areas.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
46

Hassani, Mujtaba. "CONSTRUCTION EQUIPMENT FUEL CONSUMPTION DURING IDLING : Characterization using multivariate data analysis at Volvo CE." Thesis, Mälardalens högskola, Akademin för ekonomi, samhälle och teknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-49007.

Full text
Abstract:
Human activities have increased the concentration of CO2 in the atmosphere, causing global warming. Construction equipment are semi-stationary machines that spend at least 30% of their lifetime idling. The majority of construction equipment is diesel powered and emits toxic emissions into the environment. In this work, idling is investigated by adopting several statistical regression models to quantify the fuel consumption of construction equipment during idling. The regression models studied in this work are: Multivariate Linear Regression (ML-R), Support Vector Machine Regression (SVM-R), Gaussian Process Regression (GP-R), Artificial Neural Network (ANN), Partial Least Square Regression (PLS-R) and Principal Components Regression (PC-R). Findings show that pre-processing has a significant impact on the goodness of prediction in the explanatory data analysis in this field. Moreover, through mean centering and application of max-min scaling, the accuracy of the models increased remarkably. ANN and GP-R had the highest accuracy (99%), PLS-R was the third most accurate model (98% accuracy), ML-R was the fourth-best model (97% accuracy), SVM-R was the fifth-best (73% accuracy) and the lowest accuracy was recorded for PC-R (83% accuracy). The second part of this project estimated the CO2 emissions based on the fuel used by adopting the NONROAD2008 model.
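The model-comparison workflow can be sketched with scikit-learn as below (random stand-in data rather than the Volvo CE measurements; hyperparameters are left at illustrative defaults): each regressor is wrapped in a max-min scaling pipeline and scored by cross-validated R^2.

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                              # synthetic idle-condition features
y = 2 + X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] ** 2 + 0.1 * rng.normal(size=300)

models = {
    "ML-R": LinearRegression(),
    "SVM-R": SVR(),
    "GP-R": GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True),
    "PLS-R": PLSRegression(n_components=3),
    "PC-R": make_pipeline(PCA(n_components=3), LinearRegression()),
    "ANN": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(make_pipeline(MinMaxScaler(), model), X, y, cv=5, scoring="r2")
    print(name, round(scores.mean(), 3))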
APA, Harvard, Vancouver, ISO, and other styles
47

Sebastian, Maria Treesa. "Modelling Bitcell Behaviour." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166218.

Full text
Abstract:
With advancements in technology, the dimensions of transistors are scaling down. This leads to shrinkage in the size of memory bitcells, increasing their sensitivity to process variations introduced during manufacturing. Failure of a single bitcell can cause the failure of an entire memory; hence careful statistical analysis is essential in estimating the highest reliable performance of a bitcell before it is used in memory design. With the high repetitiveness of bitcells, the traditional method of Monte Carlo simulation would require a long time for accurate estimation of rare failure events. A more practical approach is importance sampling, where more samples are collected from the failure region. Even though importance sampling is much faster than Monte Carlo simulation, it is still fairly time-consuming as it demands an iterative search, making it impractical for large simulation sets. This thesis proposes two machine learning models that can be used in estimating the performance of a bitcell. The first model predicts the time taken by the bitcell for a read or write operation. The second model predicts the minimum voltage required to maintain bitcell stability. The models were trained using the K-nearest neighbors algorithm and Gaussian process regression. Three sparse approximations were implemented in the time prediction model as a bigger dataset was available. The obtained results show that the models trained using Gaussian process regression were able to provide promising results.
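The importance-sampling idea mentioned above can be illustrated with a one-dimensional toy (assumed threshold and shift; a real bitcell analysis would rely on circuit simulations): samples are drawn from a proposal shifted towards the failure region and reweighted by the likelihood ratio.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
threshold = 4.0                    # failure if a standardized timing "margin" exceeds this value

# Plain Monte Carlo would need on the order of 1/3.2e-5 ~ 30000 samples per observed failure;
# instead, sample from a proposal shifted into the failure region and reweight.
shift = 4.0
z = rng.normal(loc=shift, size=20000)                 # proposal q = N(shift, 1)
weights = norm.pdf(z) / norm.pdf(z, loc=shift)        # likelihood ratio p/q
p_fail = np.mean((z > threshold) * weights)
print("importance-sampling estimate:", p_fail, "exact tail probability:", norm.sf(threshold))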
APA, Harvard, Vancouver, ISO, and other styles
48

Mosallam, Ahmed. "Remaining useful life estimation of critical components based on Bayesian Approaches." Thesis, Besançon, 2014. http://www.theses.fr/2014BESA2069/document.

Full text
Abstract:
La construction de modèles de pronostic nécessite la compréhension du processus de dégradation des composants critiques surveillés afin d'estimer correctement leurs durées de fonctionnement avant défaillance. Un processus de dégradation peut être modélisé en utilisant des modèles de connaissance issus des lois de la physique. Cependant, cette approche nécessite des compétences pluridisciplinaires et des moyens expérimentaux importants pour la validation des modèles générés, ce qui n'est pas toujours facile à mettre en place en pratique. Une des alternatives consiste à apprendre le modèle de dégradation à partir de données issues de capteurs installés sur le système. On parle alors d'approche guidée par des données. Dans cette thèse, nous proposons une approche de pronostic guidée par des données. Elle vise à estimer à tout instant l'état de santé du composant physique et prédire sa durée de fonctionnement avant défaillance. Cette approche repose sur deux phases, une phase hors ligne et une phase en ligne. Dans la phase hors ligne, on cherche à sélectionner, parmi l'ensemble des signaux fournis par les capteurs, ceux qui contiennent le plus d'information sur la dégradation. Cela est réalisé en utilisant un algorithme de sélection non supervisé développé dans la thèse. Ensuite, les signaux sélectionnés sont utilisés pour construire différents indicateurs de santé représentant les différents historiques de données (un historique par composant). Dans la phase en ligne, l'approche développée permet d'estimer l'état de santé du composant test en faisant appel au filtre bayésien discret. Elle permet également de calculer la durée de fonctionnement avant défaillance du composant en utilisant le classifieur k-plus proches voisins (k-NN) et le processus de Gauss pour la régression. La durée de fonctionnement avant défaillance est alors obtenue en comparant l'indicateur de santé courant aux indicateurs de santé appris hors ligne. L'approche développée a été vérifiée sur des données expérimentales issues de la plateforme PRONOSTIA sur les roulements ainsi que sur des données fournies par le Prognostic Center of Excellence de la NASA sur les batteries et les turboréacteurs.
Constructing prognostics models relies upon understanding the degradation process of the monitored critical components in order to correctly estimate the remaining useful life (RUL). Traditionally, a degradation process is represented in the form of physical or expert models. Such models require extensive experimentation and verification that are not always feasible in practice. Another approach, which builds up knowledge about the system degradation over time from component sensor data, is known as data-driven. Data-driven models require that sufficient historical data have been collected. In this work, a two-phase data-driven method for RUL prediction is presented. In the offline phase, the proposed method builds on finding variables that contain information about the degradation behaviour using an unsupervised variable selection method. Different health indicators (HI) are constructed from the selected variables, which represent the degradation as a function of time, and saved in the offline database as reference models. In the online phase, the method estimates the degradation state using a discrete Bayesian filter. The method finally finds the offline health indicator most similar to the online one, using the k-nearest neighbours (k-NN) classifier and Gaussian process regression (GPR), and uses it as a RUL estimator. The method is verified using PRONOSTIA bearing data as well as battery and turbofan engine degradation data acquired from the NASA data repository. The results show the effectiveness of the method in predicting the RUL.
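A minimal, hypothetical sketch of the matching-and-extrapolation idea described above: a partial online health indicator is compared against a small offline library with a nearest-neighbour search, and a GP fitted to the matched reference is extrapolated to an assumed failure threshold to read off a RUL. The data, the distance measure, the threshold and the use of scikit-learn are illustrative assumptions, not the thesis implementation.

    # Illustrative sketch only: nearest-neighbour matching of health indicators
    # followed by GP extrapolation to an assumed failure threshold.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(1)
    t_full = np.linspace(0.0, 100.0, 101)

    # Offline library: a few degradation histories (health indicator versus time)
    offline_his = [1.0 - (t_full / (80.0 + 10.0 * k)) ** 2 + rng.normal(0.0, 0.01, t_full.size)
                   for k in range(3)]

    # Online component observed only up to t = 40
    t_obs = t_full[:41]
    hi_online = 1.0 - (t_obs / 85.0) ** 2 + rng.normal(0.0, 0.01, t_obs.size)

    # 1) k-NN style matching: pick the offline HI closest over the observed window
    dists = [np.linalg.norm(hi[:41] - hi_online) for hi in offline_his]
    best = offline_his[int(np.argmin(dists))]

    # 2) GP regression on the matched reference, extrapolated to a failure level
    kernel = RBF(length_scale=20.0) + WhiteKernel(noise_level=1e-3)
    gpr = GaussianProcessRegressor(kernel=kernel).fit(t_full[:, None], best)
    mean = gpr.predict(t_full[:, None])

    failure_level = 0.2                              # assumed failure threshold on the HI
    crossing = t_full[mean < failure_level]
    rul = crossing[0] - t_obs[-1] if crossing.size else np.inf
    print("estimated RUL:", rul)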
APA, Harvard, Vancouver, ISO, and other styles
49

Karmann, Clémence. "Inférence de réseaux pour modèles inflatés en zéro." Thesis, Université de Lorraine, 2019. http://www.theses.fr/2019LORR0146/document.

Full text
Abstract:
Network inference, or graph inference, has more and more applications, notably in human health and the environment, for the study of microbiological and genomic data. Networks are indeed an appropriate tool for representing, and even studying, relationships between entities. Many mathematical estimation techniques have been developed, notably within the framework of Gaussian graphical models, but also in the case of binary or mixed data. The processing of abundance data (of micro-organisms such as bacteria, for example) is special for two reasons: on the one hand, such data do not directly reflect reality, because a sequencing process takes place to duplicate the species and this process introduces variability; on the other hand, a species may be absent from some samples. We are then in the setting of zero-inflated data. Many network inference methods exist for Gaussian, binary and mixed data, but zero-inflated models have been studied very little, even though they capture the structure of many data sets in a relevant way. The objective of this thesis is network inference for zero-inflated models, and we restrict ourselves to networks of conditional dependencies. The work presented in this thesis consists of two main parts. The first concerns network inference methods based on neighbourhood estimation via a procedure coupling ordinal regression methods with variable selection. The second focuses on network inference in a model where the variables are zero-inflated Gaussians obtained by double truncation (on the right and on the left).
Network inference has more and more applications, particularly in human health and the environment, for the study of micro-biological and genomic data. Networks are indeed an appropriate tool to represent, or even study, relationships between entities. Many mathematical estimation techniques have been developed, particularly in the context of Gaussian graphical models, but also in the case of binary or mixed data. The processing of abundance data (of microorganisms such as bacteria, for example) is particular for two reasons: on the one hand, the data do not directly reflect reality, because a sequencing process takes place to duplicate species and this process brings variability; on the other hand, a species may be absent in some samples. We are then in the context of zero-inflated data. Many graph inference methods exist for Gaussian, binary and mixed data, but zero-inflated models are rarely studied, although they reflect the structure of many data sets in a relevant way. The objective of this thesis is to infer networks for zero-inflated models; we restrict ourselves to conditional dependency graphs. The work presented in this thesis is divided into two main parts. The first one concerns graph inference methods based on the estimation of neighbourhoods by a procedure combining ordinal regression models and variable selection methods. The second one focuses on graph inference in a model where the variables are zero-inflated Gaussians obtained by double truncation (right and left).
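For flavour, the sketch below runs a much simplified neighbourhood-selection pass in the spirit of the first part: each variable is regressed on all the others with a sparse (Lasso) penalty and the non-zero coefficients define its neighbourhood. Plain Lasso on Gaussian toy data is used here purely as a stand-in; the thesis couples ordinal regression with variable selection and targets zero-inflated data.

    # Simplified neighbourhood selection on toy Gaussian data, as an illustration only;
    # the thesis itself uses ordinal regression adapted to zero-inflated variables.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(2)
    n, p = 300, 6

    # Toy data: variable 0 depends on variables 1 and 2, the rest are independent
    X = rng.normal(size=(n, p))
    X[:, 0] = 0.8 * X[:, 1] - 0.6 * X[:, 2] + rng.normal(scale=0.5, size=n)

    edges = set()
    for j in range(p):
        others = [k for k in range(p) if k != j]
        lasso = LassoCV(cv=5).fit(X[:, others], X[:, j])
        for k, coef in zip(others, lasso.coef_):
            if abs(coef) > 1e-3:                     # arbitrary threshold for the sketch
                edges.add(tuple(sorted((j, k))))     # "OR" rule over the two regressions

    print(sorted(edges))                             # expected to contain (0, 1) and (0, 2)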
APA, Harvard, Vancouver, ISO, and other styles
50

Fang, Yizhou. "Inference for Continuous Stochastic Processes Using Gaussian Process Regression." Thesis, 2014. http://hdl.handle.net/10012/8159.

Full text
Abstract:
Gaussian process regression (GPR) is a long-standing technique for statistical interpolation between observed data points. Having originally been applied to spatial analysis in the 1950s, GPR offers highly nonlinear predictions with uncertainty that adjusts to the degree of extrapolation, while requiring only a few model parameters to be fit. Thus GPR has gained considerable popularity in statistical applications such as machine learning and nonparametric density estimation. In this thesis, we explore the potential for GPR to improve the efficiency of parametric inference for continuous-time stochastic processes. For almost all such processes, the likelihood function based on discrete observations cannot be written in closed form. However, it can be very well approximated if the inter-observation time is small. Therefore, a popular strategy for parametric inference is to introduce missing data between actual observations. In a Bayesian context, samples from the posterior distribution of the parameters and missing data are then typically obtained using Markov chain Monte Carlo (MCMC) methods, which can be computationally very expensive. Here, we consider the possibility of using GPR to impute the marginal distribution of the missing data directly. These imputations could then be leveraged to produce independent draws from the joint posterior by importance sampling, for a significant gain in computational efficiency. In order to illustrate the methodology, three continuous processes are examined. The first is based on a neural excitation model with a non-standard periodic component. The second and third are popular financial models often used for option pricing. While preliminary inferential results are quite promising, we point out several improvements to the methodology that remain to be explored.
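As a toy illustration of the imputation idea, the snippet below fits a GP to sparse observations of a simulated Brownian-motion-like path and draws candidate values for the unobserved intermediate time points. The simulated process, kernel, grid and use of scikit-learn are assumptions made for the example, and the importance-sampling correction described in the abstract is omitted.

    # Toy sketch: GP-based imputation of missing points of a discretely observed path.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern, WhiteKernel

    rng = np.random.default_rng(3)

    # Simulate a Brownian-motion-like path on a fine grid
    t_fine = np.linspace(0.0, 1.0, 201)
    path = np.cumsum(rng.normal(scale=np.sqrt(np.diff(t_fine, prepend=0.0))))

    # Keep only every 20th point as the "actual" discrete observations
    obs_idx = np.arange(0, 201, 20)
    t_obs, x_obs = t_fine[obs_idx], path[obs_idx]

    # Fit a GP to the sparse observations
    kernel = Matern(length_scale=0.1, nu=1.5) + WhiteKernel(noise_level=1e-5)
    gpr = GaussianProcessRegressor(kernel=kernel).fit(t_obs[:, None], x_obs)

    # Draw imputations of the path at the missing time points
    t_miss = np.setdiff1d(t_fine, t_obs)
    imputations = gpr.sample_y(t_miss[:, None], n_samples=10, random_state=0)
    print(imputations.shape)                         # (number of missing points, 10)

In the scheme sketched in the abstract, such draws would serve as a proposal whose samples are reweighted by importance sampling rather than used directly.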
APA, Harvard, Vancouver, ISO, and other styles
