Dissertations on the topic "Data projection"

To view other types of publications on this topic, follow the link: Data projection.

Cite your source in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations for your research on the topic "Data projection".

Next to each work in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a publication as a .pdf file and read its abstract online, whenever these are available in the metadata.

Browse dissertations from a wide range of disciplines and compile your bibliography correctly.

1

McWilliams, Brian Victor Parulian. "Projection based models for high dimensional data." Thesis, Imperial College London, 2011. http://hdl.handle.net/10044/1/9577.

Abstract:
In recent years, many machine learning applications have arisen which deal with the problem of finding patterns in high dimensional data. Principal component analysis (PCA) has become ubiquitous in this setting. PCA performs dimensionality reduction by estimating latent factors which minimise the reconstruction error between the original data and its low-dimensional projection. We initially consider a situation where influential observations exist within the dataset which have a large, adverse effect on the estimated PCA model. We propose a measure of “predictive influence” to detect these points based on the contribution of each point to the leave-one-out reconstruction error of the model using an analytic PRedicted REsidual Sum of Squares (PRESS) statistic. We then develop a robust alternative to PCA to deal with the presence of influential observations and outliers which minimizes the predictive reconstruction error. In some applications there may be unobserved clusters in the data, for which fitting PCA models to subsets of the data would provide a better fit. This is known as the subspace clustering problem. We develop a novel algorithm for subspace clustering which iteratively fits PCA models to subsets of the data and assigns observations to clusters based on their predictive influence on the reconstruction error. We study the convergence of the algorithm and compare its performance to a number of subspace clustering methods on simulated data and in real applications from computer vision involving clustering object trajectories in video sequences and images of faces. We extend our predictive clustering framework to a setting where two high-dimensional views of data have been obtained. Often, only either clustering or predictive modelling is performed between the views. Instead, we aim to recover clusters which are maximally predictive between the views. In this setting two-block partial least squares (TB-PLS) is a useful model. TB-PLS performs dimensionality reduction in both views by estimating latent factors that are highly predictive. We fit TB-PLS models to subsets of data and assign points to clusters based on their predictive influence under each model which is evaluated using a PRESS statistic. We compare our method to state-of-the-art algorithms in real applications in webpage and document clustering and find that our approach to predictive clustering yields superior results. Finally, we propose a method for dynamically tracking multivariate data streams based on PLS. Our method learns a linear regression function from multivariate input and output streaming data in an incremental fashion while also performing dimensionality reduction and variable selection. Moreover, the recursive regression model is able to adapt to sudden changes in the data generating mechanism and also identifies the number of latent factors. We apply our method to the enhanced index tracking problem in computational finance.
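The leave-one-out reconstruction error at the heart of this abstract can be illustrated with a brute-force sketch (the thesis derives an analytic PRESS statistic instead of an explicit loop; the synthetic data and variable names below are ours, not the author's):

```python
import numpy as np

def press_influence(X, k=2):
    """Brute-force leave-one-out reconstruction error (PRESS) under a
    rank-k PCA model; large values flag influential observations."""
    n = len(X)
    errors = np.empty(n)
    for i in range(n):
        Xi = np.delete(X, i, axis=0)
        mu = Xi.mean(axis=0)
        # top-k principal directions of the remaining points
        _, _, Vt = np.linalg.svd(Xi - mu, full_matrices=False)
        V = Vt[:k].T
        r = X[i] - mu
        errors[i] = np.sum((r - V @ (V.T @ r)) ** 2)
    return errors

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
data[0] += 15.0                                   # plant one influential point
print(int(np.argmax(press_influence(data))))      # expected output: 0
```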
2

Sibley, Christy N. "Analyzing Navy Officer Inventory Projection Using Data Farming." Thesis, Monterey, California. Naval Postgraduate School, 2012. http://hdl.handle.net/10945/6868.

Abstract:
Approved for public release, distribution unlimited
The Navy's Strategic Planning and Analysis Directorate (OPNAV N14) uses a complex model to project officer status in the coming years. The Officer Strategic Analysis Model (OSAM) projects officer status using an initial inventory, historical loss rates, and dependent functions for accessions, losses, lateral transfers, and promotions that reflect Navy policy and U.S. law. OSAM is a tool for informing decision makers as they consider potential policy changes, or analyze the impact of policy changes already in place, by generating Navy Officer inventory projections for a specified time horizon. This research explores applications of data farming for potential improvement of OSAM. An analysis of OSAM inventory forecast variations over a large number of scenarios while changing multiple input parameters enables assessment of key inputs. This research explores OSAM through applying the principles of design of experiments, regression modeling, and nonlinear programming. The objectives of this portion of the work include identifying critical parameters, determining a suitable measure of effectiveness, assessing model sensitivities, evaluating performance across a spectrum of loss adjustment factors, and determining appropriate values of key model inputs for future use in forecasting Navy officer inventory.
3

Eslava-Gomez, Guillermina. "Projection pursuit and other graphical methods for multivariate data." Thesis, University of Oxford, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.236118.

4

Ebert, Matthias. "Non-ideal projection data in X-ray computed tomography." [S.l. : s.n.], 2002. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB10605022.

5

Cropanese, Frank C. "Synthesis of low k1 projection lithography utilizing interferometry /." Link to online version, 2005. https://ritdml.rit.edu/dspace/handle/1850/1235.

6

Folgieri, R. "Ensembles based on Random Projection for gene expression data analysis." Doctoral thesis, Università degli Studi di Milano, 2008. http://hdl.handle.net/2434/45878.

Abstract:
In this work we focused on methods to solve classification problems characterized by high dimensionality and low cardinality data. These features are relevant in bio-molecular data analysis and particularly in class prediction with microarray data. Many methods have been proposed to approach this problem, characterized by the so-called curse of dimensionality (term introduced by Richard Bellman (9)). Among them are gene selection methods, principal and independent component analysis, and kernel methods. In this work we propose and experimentally analyze two ensemble methods based on two randomized techniques for data compression: Random Subspaces and Random Projections. While Random Subspaces, originally proposed by T. K. Ho, is a technique related to feature subsampling, Random Projections is a feature extraction technique motivated by the Johnson-Lindenstrauss theory about distance preserving random projections. The randomness underlying the proposed approach leads to diverse sets of extracted features corresponding to low dimensional subspaces with low metric distortion and approximate preservation of the expected loss of the trained base classifiers. In the first part of the work we justify our approach with two theoretical results. The first regards unsupervised learning: we prove that a clustering algorithm minimizing the objective (quadratic) function provides an ε-close solution if applied to compressed data according to Johnson-Lindenstrauss theory. The second one is related to supervised learning: we prove that polynomial kernels are approximately preserved by Random Projections, up to a degradation proportional to the square of the degree of the polynomial. In the second part of the work, we propose ensemble algorithms based on Random Subspaces and Random Projections, and we experimentally compare them with a single SVM and other state-of-the-art ensemble methods, using three gene expression data sets: Colon, Leukemia and DLBL-FL - i.e. Diffuse Large B-cell and Follicular Lymphoma. The obtained results confirm the effectiveness of the proposed approach. Moreover, we observed a certain performance degradation of Random Projection methods when the base learners are SVMs with polynomial kernel of high degree.
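A minimal sketch of the random-projection ensemble idea described above (Gaussian projections in the spirit of Johnson-Lindenstrauss, one base SVM per projection, majority vote); the library, parameters, and base learner are illustrative assumptions, not the thesis implementation:

```python
import numpy as np
from sklearn.svm import SVC

def rp_ensemble_predict(X_train, y_train, X_test, n_views=15, dim=20, seed=0):
    """Train one linear SVM per random Gaussian projection and
    combine the individual predictions by majority vote (binary 0/1 labels)."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_views):
        # Johnson-Lindenstrauss style projection: p original features -> dim
        R = rng.normal(size=(X_train.shape[1], dim)) / np.sqrt(dim)
        clf = SVC(kernel="linear").fit(X_train @ R, y_train)
        votes.append(clf.predict(X_test @ R))
    return (np.mean(votes, axis=0) >= 0.5).astype(int)

# toy high-dimensional, low-cardinality data (80 samples, 2000 features)
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2000))
y = (X[:, :10].sum(axis=1) > 0).astype(int)
print(rp_ensemble_predict(X[:60], y[:60], X[60:]).shape)   # (20,) predicted labels
```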
7

Bolton, Richard John. "Multivariate analysis of multiproduct market research data." Thesis, University of Exeter, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.302542.

8

Kishimoto, Paul Natsuo. "Transport demand in China : estimation, projection, and policy assessment." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/120664.

Abstract:
Thesis: Ph. D. in Engineering Systems, Massachusetts Institute of Technology, School of Engineering, Institute for Data, Systems, and Society, 2018.
China's rapid economic growth in the twenty-first century has driven, and been driven by, concomitant motorization and growth of passenger and freight mobility, leading to greater energy demand and environmental impacts. In this dissertation I develop methods to characterize the evolution of passenger transport demand in a rapidly-developing country, in order to support projection and policy assessment. In Essay #1, I study the role that vehicle tailpipe and fuel quality standards ("emissions standards") can play vis-à-vis economy-wide carbon pricing in reducing emissions of pollutants that lead to poor air quality. I extend a global, computable general equilibrium (CGE) model resolving 30 Chinese provinces by separating freight and passenger transport subsectors, road and non-road modes, and household-owned vehicles; and then linking energy demand in these subsectors to a province-level inventory of primary pollutant emissions and future policy targets. While climate policy yields an air quality co-benefit by inducing shifts away from dirtier fuels, this effect is weak within the transport sector. Current emissions standards can drastically reduce transportation emissions, but their overall impact is limited by transport's share in total emissions, which varies across provinces. I conclude that the two categories of measures examined are complementary, and the effectiveness of emissions standards relies on enforcement in removing older, higher-polluting vehicles from the roads. In Essay #2, I characterize Chinese households' demand for transport by estimating the recently-developed, Exact affine Stone index (EASI) demand system on publicly-available data from non-governmental, social surveys. Flexible, EASI demands are particularly useful in China's rapidly-changing economy and transport system, because they capture ways that income elasticities of demand, and household transport budgets, vary with incomes; with population and road network densities; and with the supply of alternative transport modes. I find transport demand to be highly elastic (ε_x = 1.46) at low incomes, and that income-elasticity of demand declines but remains greater than unity as incomes rise, so that the share of transport in households' spending rises monotonically from 1.6 % to 7.5 %; a wider, yet lower range than in some previous estimates. While no strong effects of city-level factors are identified, these and other non-income effects account for a larger portion of budget share changes than rising incomes. Finally, in Essay #3, I evaluate the predictive performance of the EASI demand system, by testing the sensitivity of model fit to the data available for estimation, in comparison with the less flexible, but widely used, Almost Ideal demand system (AIDS). In rapidly-evolving countries such as China, survey data without nationwide coverage can be used to characterize transport systems, but the omission of cities and provinces could bias results. To examine this possibility, I estimate demand systems on data subsets and test their predictions against observations for the withheld fraction. I find that simple EASI specifications slightly outperform AIDS under cross-validation; these offer a ready replacement in standalone and CGE applications. However, a trade-off exists between accuracy and the inclusion of policy-relevant covariates when data omit areas with high values of these variables.
Also, while province-level fixed-effects control for unobserved heterogeneity across units that may bias parameter estimates, they increase prediction error in out-of-sample applications, revealing that the influence of local conditions on household transport expenditure varies significantly across China's provinces. The results motivate targeted transport data collection that better spans variation in city types and attributes; and the validation technique aids transport modelers in designing and validating demand specifications for projection and assessment.
9

Divak, Martin. "Simulated SAR with GIS data and pose estimation using affine projection." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-66303.

Abstract:
Pilots or autonomous aircraft need to know where they are in relation to the environment. On board aircraft there are inertial sensors that are prone to drift, which requires corrections by referencing against known items, places, or signals. One such method of referencing is with global navigation satellite systems; others, highlighted in this work, are based on using visual sensors. In particular the use of Synthetic Aperture Radar is emerging as a viable alternative. To use radar images in qualitative or quantitative analysis they must be registered with geographical information. Position data on an aircraft or spacecraft is not sufficient to determine with certainty what one is looking at in a radar image, or where, without referencing other images over the same area. It is demonstrated in this thesis that a digital elevation model can be split up and classified into different types of radar scatterers. Having different parts of the terrain yield different types of echoes increases the amount of radar-specific characteristics in simulated reference images. This work also presents an interpretation of the imaging geometry of SAR such that existing methods in Computer Vision may be used to estimate the position from which a radar image has been taken. This is direct image matching that does not require the registration needed by other proposed SAR-based navigation solutions. By determining position continuously from radar images, aircraft could navigate independently of daylight, weather, and satellite data.
10

Gentle, David John. "Tomographic image reconstruction from incomplete projection data with application to industry." Thesis, University of Surrey, 1990. http://epubs.surrey.ac.uk/842931/.

Abstract:
The major objective of this work has been to investigate methods of reconstructing tomographic images from incomplete projection data. Furthermore, the practical application of such techniques to industrial non-destructive testing has been considered with particular regard to the nuclear industry. Two distinct situations are considered, region of interest (ROI) tomography and limited angle of view (LV) tomography. ROI tomography relates to situations where data is limited in linear extent and can be used for high spatial resolution imaging of particular areas of interest within larger structures. Data collection times are reduced by concentrating on the ROI and the imaging of structures which cannot fit in the field of view of the scanner can be made possible. It has been shown that corrected ROI images can be of equal quality to those reconstructed from complete data. The situation where data is limited in angular range is known as LV tomography. Practical applications of such situations can include in situ imaging of objects which cannot be accessed at all required angles, and the imaging of time varying objects where limitations on the data collection times restrict the angular range of measurements. The use of the Gerchberg-Papoulis algorithm has been shown to significantly reduce the resulting artifacts. The initial work involved investigation into the minimum data requirements for tomographic imaging of objects without compromising image quality. The relative performance of filtered backprojection and ART iterative reconstruction algorithms was investigated and the superiority of ART in situations of limited data was demonstrated. The most important causes of SPECT image degradation are scattering and attenuation of photons. For scatter correction the dual energy window and Wiener deconvolution correction methods have been investigated and the results compared. A number of attenuation correction algorithms have also been investigated and their comparative performance evaluated.
11

Badcock, Julie. "Projection methods for use in the analysis of multivariate process data." Thesis, University of Exeter, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.272980.

12

Weingessel, Andreas, Martin Natter, and Kurt Hornik. "Using independent component analysis for feature extraction and multivariate data projection." SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, 1998. http://epub.wu.ac.at/1424/1/document.pdf.

Abstract:
Deriving low-dimensional perceptual spaces from data consisting of many variables is of crucial interest in strategic market planning. A frequently used method in this context is Principal Components Analysis, which finds uncorrelated directions in the data. This methodology which supports the identification of competitive structures can gainfully be utilized for product (re)positioning or optimal product (re)design. In our paper, we investigate the usefulness of a novel technique, Independent Component Analysis, to discover market structures. Independent Component Analysis is an extension of Principal Components Analysis in the sense that it looks for directions in the data that are not only uncorrelated but also independent. Comparing the two approaches on the basis of an empirical data set, we find that Independent Component Analysis leads to clearer and sharper structures than Principal Components Analysis. Furthermore, the results of Independent Component Analysis have a reasonable marketing interpretation.
Series: Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
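As a rough illustration of the comparison described above (PCA finds uncorrelated directions, ICA additionally seeks statistically independent ones), both decompositions are available in scikit-learn; the synthetic mixing setup below is an assumption, not the paper's market research data:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(1)
# two independent non-Gaussian sources mixed into five observed variables
S = rng.laplace(size=(500, 2))
A = rng.normal(size=(2, 5))
X = S @ A + 0.01 * rng.normal(size=(500, 5))

pca_scores = PCA(n_components=2).fit_transform(X)                 # uncorrelated directions
ica_scores = FastICA(n_components=2, random_state=1,
                     max_iter=1000).fit_transform(X)               # independent directions

# each ICA component should correlate strongly with one original source
corr = np.abs(np.corrcoef(ica_scores.T, S.T))[0:2, 2:4]
print(corr.round(2))
```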
13

Maguire, Ralph Paul. "Application of pharmacokinetic models to projection data in positron emission tomography." Thesis, University of Surrey, 1999. http://epubs.surrey.ac.uk/844467/.

Abstract:
In positron emission tomography (PET), coincidence detection of annihilation photons enables the measurement of Radon transforms of the instantaneous activity concentration of labelled tracers in the human body. Using reconstruction algorithms, spatial maps of the activity distribution can be created and analysed to reveal the pharmacokinetics of the labelled tracer. This thesis considers the possibility of applying pharmacokinetic modelling to the count rate data measured by the detectors, rather than reconstructed images. A new concept is proposed - parameter projections - Radon transforms of the spatial distribution of the parameters of the model, which simplifies the problem considerably. Using this idea, a general linear least squares (GLLS) framework is developed and applied to the one and two tissue-compartment models for [O-15]water and [F-18]FDG. Simulation models are developed from first principles to demonstrate the accuracy of the GLLS approach to parameter estimation. This requires the validation of the whole body distribution of each of the tracers, using pharmacokinetic techniques, leading to novel compartment based whole body models for [O-15]water and [F-18]FDG. A simplified Monte-Carlo framework for error estimation of the tissue models is developed, based on system parameters. It is also shown that the variances of maps of the spatial variance of the parameters of the model - parametric images - can be calculated in projection space. It is clearly demonstrated that the precision of the variance estimates is higher than that obtained from estimates based on reconstructed images. Using the methods, it is shown how statistical parametric maps of the difference between two neuronal activation conditions can be calculated from projection data. The methods developed allow faster results analysis, avoiding lengthy reconstruction of large data sets, and allow access to robust statistical techniques for activation analysis through use of the known Poisson-distributed nature of the measured projection data.
14

Landgraf, Andrew J. "Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1437610558.

15

Vamulapalli, Harika Rao. "On Dimensionality Reduction of Data." ScholarWorks@UNO, 2010. http://scholarworks.uno.edu/td/1211.

Abstract:
The random projection method is one of the important tools for the dimensionality reduction of data, and it can be made efficient with strong error guarantees. In this thesis, we focus on linear transforms of high dimensional data to the low dimensional space satisfying the Johnson-Lindenstrauss lemma. In addition, we prove some theoretical results relating to the projections that are of interest when applying them in practical applications. We show how the technique can be applied to synthetic data with a probabilistic guarantee on the pairwise distances. The connection between dimensionality reduction and compressed sensing is also discussed.
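A quick numerical check of the Johnson-Lindenstrauss property the abstract builds on, namely that pairwise distances are approximately preserved under a random linear map; the dimensions below are illustrative only:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, p, k = 200, 5_000, 500                 # points, original dim, projected dim
X = rng.normal(size=(n, p))

R = rng.normal(size=(p, k)) / np.sqrt(k)  # random Gaussian projection
Y = X @ R

ratio = pdist(Y) / pdist(X)               # per-pair distance distortion
print(ratio.min().round(3), ratio.max().round(3))  # both stay close to 1
```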
16

Chen, Mingqing. "Development of a diaphragm tracking algorithm for megavoltage cone beam CT projection data." Thesis, University of Iowa, 2009. https://ir.uiowa.edu/etd/228.

Abstract:
In this work several algorithms for diaphragm detection in 2D views of cone-beam computed tomography (CBCT) raw data are developed. These algorithms are tested on 21 Siemens megavoltage CBCT scans of lungs and the result is compared against the diaphragm apex identified by human experts. Among these algorithms dynamic Hough transform is sufficiently quick and accurate for motion determination prior to radiation therapy. The diaphragm was successfully detected in all 21 data sets, even for views with poor image quality and confounding objects. Each CBCT scan analysis (200 frames) took about 38 seconds on a 2.66 GHz Intel quad-core 2 CPU. The average cranio-caudal position error was 1.707 ± 1.117 mm. Other directions were not assessed due to uncertainties in expert identification.
17

Malla, Noor. "Partitioning XML data, towards distributed and parallel management." Thesis, Paris 11, 2012. http://www.theses.fr/2012PA112154/document.

Abstract:
With the widespread diffusion of XML as a format for representing data generated and exchanged over the Web, many query and update engines have been designed and implemented in the last decade. One kind of engine that plays a crucial role in many applications is the « main-memory » system, distinguished by the fact that it is easy to manage and to integrate in a programming environment. On the other hand, main-memory systems have scalability issues, as they load the entire document in main memory before processing. This thesis presents an XML partitioning technique that allows main-memory engines to process a class of XQuery expressions (queries and updates), that we dub « iterative », on arbitrarily large input documents. We provide a static analysis technique to recognize these expressions. The static analysis is based on paths extracted from the expression and does not need additional schema information. We provide algorithms using path information for partitioning the input documents, so that the query or update can be separately evaluated on each part in order to compute the final result. These algorithms admit a streaming implementation, whose effectiveness is experimentally validated. Besides enabling scalability, our approach is also characterized by the fact that it is easily implementable into a MapReduce framework, thus enabling parallel query/update evaluation on the partitioned data.
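The central idea, splitting a large document into independently evaluable parts and concatenating the per-part results, can be sketched with Python's streaming XML parser; the element names and the per-part query are hypothetical and are not the thesis's path-based partitioning algorithms:

```python
import xml.etree.ElementTree as ET

def evaluate(part):
    # hypothetical iterative query: select the text of <name> children
    return [e.findtext("name") for e in part]

def query_in_parts(path, part_size=10_000):
    """Stream over <item> elements, evaluate a simple per-item query on
    bounded-size partitions, and concatenate the partial results."""
    results, part = [], []
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "item":
            part.append(elem)
            if len(part) >= part_size:
                results.extend(evaluate(part))   # query one partition
                for e in part:
                    e.clear()                    # free memory before the next part
                part = []
    if part:
        results.extend(evaluate(part))
    return results
```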
18

Schäfer, Matthias Jörg [Verfasser]. "Visual Analytics for Improving Exploration and Projection of Multi-Dimensional Data / Matthias Jörg Schäfer." Konstanz : Bibliothek der Universität Konstanz, 2015. http://d-nb.info/1079391789/34.

19

Zeng, Xubin, and Kerrie Geil. "Global warming projection in the 21st century based on an observational data-driven model." AMER GEOPHYSICAL UNION, 2016. http://hdl.handle.net/10150/622341.

Abstract:
Global warming has been projected primarily by Earth system models (ESMs). Complementary to this approach, here we provide the decadal and long-term global warming projections based on an observational data-driven model. This model combines natural multidecadal variability with anthropogenic warming that depends on the history of annual emissions. It shows good skill in decadal hindcasts with the recent warming slowdown well captured. While our ensemble mean temperature projections at the end of 21st century are consistent with those from ESMs, our decadal warming projection of 0.35 (0.30-0.43)K from 1986-2005 to 2016-2035 is within their projection range and only two-thirds of the ensemble mean from ESMs. Our predicted warming rate in the next few years is slower than in the 1980s and 1990s, followed by a greater warming rate. Our projection uncertainty range is just one-third of that from ESMs, and its implication is also discussed.
20

Mueller, Klaus. "Fast and accurate three-dimensional reconstruction from cone-beam projection data using algebraic methods /." The Ohio State University, 1998. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487950658545496.

21

Coimbra, Danilo Barbosa. "Multidimensional projections for the visual exploration of multimedia data." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-11112016-184130/.

Abstract:
The continual advent of new technologies has made a rich and growing range of information sources available for analysis and investigation. In this context, multidimensional data analysis is considerably important when dealing with such large and complex datasets. Among the possibilities when analyzing such data, applying visualization techniques can help the user find and understand patterns and trends and establish new goals. Some application examples of visualization for multidimensional data analysis range from image classification, semantic word clouds, and cluster analysis of document collections to exploration of multimedia content. This thesis presents several visualization methods to interactively explore multidimensional datasets, aimed at users from specialized to casual, by making use of both static and dynamic representations created by multidimensional projections. Firstly, we present a multidimensional projection technique which faithfully preserves distances and can handle any type of high-dimensional data, demonstrating application scenarios in both multimedia and text document collections. Next, we address the task of interpreting projections in 2D by calculating neighborhood errors. Thereafter, we present a set of interactive visualizations that aim to help users with these tasks by revealing the quality of a projection in 3D, applied in different high-dimensional scenarios. In the final part, we address two different approaches to get insight into multimedia data, in particular soccer videos. While the first makes use of multidimensional projections, the second uses an efficient visual metaphor to help non-specialist users in browsing and getting insights into soccer matches.
22

Böckmann, Christine, and Janos Sarközi. "The ill-posed inversion of multiwavelength lidar data by a hybrid method of variable projection." Universität Potsdam, 1999. http://opus.kobv.de/ubp/volltexte/2007/1484/.

Abstract:
The ill-posed problem of aerosol distribution determination from a small number of backscatter and extinction lidar measurements was solved successfully via a hybrid method using a variable dimension of projection with B-Splines. Numerical simulation results with noisy data in different measurement situations show that it is possible to derive a reconstruction of the aerosol distribution with only 4 measurements.
23

Chavez, Daniel. "Parallelizing Map Projection of Raster Data on Multi-core CPU and GPU Parallel Programming Frameworks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-190883.

Abstract:
Map projections lie at the core of geographic information systems and numerous projections are used today. Reprojection between different map projections recurs in a geographic information system, and it can be parallelized with multi-core CPUs and GPUs. This thesis implements a parallel analytic reprojection algorithm for raster data in C/C++ with the parallel programming frameworks Pthreads, C++11 STL threads, OpenMP, Intel TBB, CUDA and OpenCL. The thesis compares the execution times from the different implementations on small, medium and large raster data sets, where OpenMP had the best speedup of 6, 6.2 and 5.5, respectively. Meanwhile, the GPU implementations were 293 % faster than the fastest CPU implementations, where profiling shows that the CPU implementations spend most of their time on trigonometry functions. The results show that the reprojection algorithm is well suited for the GPU, while OpenMP and Intel TBB are the fastest of the CPU frameworks.
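Reprojection is an embarrassingly parallel per-pixel computation, which is why it maps well onto the frameworks compared above; a serial NumPy sketch of one analytic forward projection (lon/lat to spherical Web Mercator) shows the trigonometric work each pixel requires. The projection pair and constants are illustrative, not the thesis code:

```python
import numpy as np

R_EARTH = 6378137.0                        # sphere radius in metres (assumed)

def to_web_mercator(lon_deg, lat_deg):
    """Analytic forward projection of lon/lat grids (degrees) to metres."""
    lon = np.radians(lon_deg)
    lat = np.radians(np.clip(lat_deg, -85.05, 85.05))   # Mercator pole cutoff
    x = R_EARTH * lon
    y = R_EARTH * np.log(np.tan(np.pi / 4 + lat / 2))
    return x, y

# coordinate grid of a small raster; each pixel is independent,
# so the loop-free NumPy version maps directly onto OpenMP/CUDA kernels
lon, lat = np.meshgrid(np.linspace(-180, 180, 1024),
                       np.linspace(-85, 85, 512))
x, y = to_web_mercator(lon, lat)
print(x.shape, y.shape)
```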
24

Dudziak, William James. "PRESENTATION AND ANALYSIS OF A MULTI-DIMENSIONAL INTERPOLATION FUNCTION FOR NON-UNIFORM DATA: MICROSPHERE PROJECTION." University of Akron / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=akron1183403994.

25

Salter, James Martin. "Uncertainty quantification for spatial field data using expensive computer models : refocussed Bayesian calibration with optimal projection." Thesis, University of Exeter, 2017. http://hdl.handle.net/10871/30114.

Abstract:
In this thesis, we present novel methodology for emulating and calibrating computer models with high-dimensional output. Computer models for complex physical systems, such as climate, are typically expensive and time-consuming to run. Due to this inability to run computer models efficiently, statistical models ('emulators') are used as fast approximations of the computer model, fitted based on a small number of runs of the expensive model, allowing more of the input parameter space to be explored. Common choices for emulators are regressions and Gaussian processes. The input parameters of the computer model that lead to output most consistent with the observations of the real-world system are generally unknown, hence computer models require careful tuning. Bayesian calibration and history matching are two methods that can be combined with emulators to search for the best input parameter setting of the computer model (calibration), or remove regions of parameter space unlikely to give output consistent with the observations, if the computer model were to be run at these settings (history matching). When calibrating computer models, it has been argued that fitting regression emulators is sufficient, due to the large, sparsely-sampled input space. We examine this for a range of examples with different features and input dimensions, and find that fitting a correlated residual term in the emulator is beneficial, in terms of more accurately removing regions of the input space, and identifying parameter settings that give output consistent with the observations. We demonstrate and advocate for multi-wave history matching followed by calibration for tuning. In order to emulate computer models with large spatial output, projection onto a low-dimensional basis is commonly used. The standard accepted method for selecting a basis is to use n runs of the computer model to compute principal components via the singular value decomposition (the SVD basis), with the coefficients given by this projection emulated. We show that when the n runs used to define the basis do not contain important patterns found in the real-world observations of the spatial field, linear combinations of the SVD basis vectors will not generally be able to represent these observations. Therefore, the results of a calibration exercise are meaningless, as we converge to incorrect parameter settings, likely assigning zero posterior probability to the correct region of input space. We show that the inadequacy of the SVD basis is very common and present in every climate model field we looked at. We develop a method for combining important patterns from the observations with signal from the model runs, developing a calibration-optimal rotation of the SVD basis that allows a search of the output space for fields consistent with the observations. We illustrate this method by performing two iterations of history matching on a climate model, CanAM4. We develop a method for beginning to assess model discrepancy for climate models, where modellers would first like to see whether the model can achieve certain accuracy, before allowing specific model structural errors to be accounted for. We show that calibrating using the basis coefficients often leads to poor results, with fields consistent with the observations ruled out in history matching. 
We develop a method for adjusting for basis projection when history matching, so that an efficient and more accurate implausibility bound can be derived that is consistent with history matching using the computationally prohibitive spatial field.
26

Llerena, Soledad Espezua. "Redução dimensional de dados de alta dimensão e poucas amostras usando Projection Pursuit." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/18/18153/tde-10102013-150240/.

Abstract:
Reducing the dimension of datasets is an important step in pattern recognition and machine learning processes. PP has emerged as a relevant technique for that purpose. PP aims to find projections of the data in low-dimensional spaces where interesting structures are revealed. Despite the success of PP in many dimension reduction problems, the literature shows a limited application of it to datasets with large numbers of features and few samples, such as those obtained in molecular biology. In this work we study ways to take advantage of the potential of PP in order to deal with problems of large dimensionality and few samples. Among the main contributions of this work are: i) SPPM, an improved method for searching projections, based on a genetic algorithm and specialized crossover operators; and ii) Block-SPPM and W-SPPM, two strategies for applying SPPM in problems with more attributes than samples. The first strategy is based on partitioning the attribute space while the latter is based on a pre-compaction of the data followed by a projection search. Experimental evaluations over public gene-expression datasets showed the efficacy of the proposals in improving the accuracy of popular classifiers with respect to several representative dimension reduction methods, with W-SPPM offering the best compromise between accuracy and computational cost.
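A stripped-down projection pursuit step helps fix the terminology: search for a unit direction whose 1-D projection of the data maximises an "interestingness" index. The random search and kurtosis-based index below are simplifications of the idea; the thesis instead uses a genetic algorithm with specialised crossover operators:

```python
import numpy as np

def kurtosis_index(z):
    """Deviation from Gaussian excess kurtosis, used as an interestingness index."""
    z = (z - z.mean()) / z.std()
    return abs(np.mean(z ** 4) - 3.0)

def projection_pursuit_1d(X, n_candidates=5000, seed=0):
    """Random-search projection pursuit: return the unit direction whose
    1-D projection of X scores highest on the index."""
    rng = np.random.default_rng(seed)
    best_dir, best_score = None, -np.inf
    for _ in range(n_candidates):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)
        score = kurtosis_index(X @ w)
        if score > best_score:
            best_dir, best_score = w, score
    return best_dir, best_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(200, 4)), rng.normal(loc=4.0, size=(200, 4))])
w, score = projection_pursuit_1d(X)
print(w.round(2), score.round(2))   # direction found and its index value
```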
27

witt, micah. "Proton Computed Tomography: Matrix Data Generation Through General Purpose Graphics Processing Unit Reconstruction." CSUSB ScholarWorks, 2014. https://scholarworks.lib.csusb.edu/etd/2.

Abstract:
Proton computed tomography (pCT) is an image modality that will improve treatment planning for patients receiving proton radiation therapy compared with the current techniques, which are based on X-ray CT. Images are reconstructed in pCT by solving a large and sparse system of linear equations. The size of the system necessitates matrix-partitioning and parallel reconstruction algorithms to be implemented across some sort of cluster computing architecture. The prototypical algorithm to solve the pCT system is the algebraic reconstruction technique (ART) that has been modified into parallel versions called block-iterative-projection (BIP) methods and string-averaging-projection (SAP) methods. General purpose graphics processing units (GPGPUs) have hundreds of stream processors for massively parallel calculations. A GPGPU cluster is a set of nodes, with each node containing a set of GPGPUs. This thesis describes a proton simulator that was developed to generate realistic pCT data sets. Simulated data sets were used to compare the performance of a BIP implementation against a SAP implementation on a single GPGPU with the data stored in a sparse matrix structure called the compressed sparse row (CSR) format. Both BIP and SAP algorithms allow for parallel computation by creating row partitions of the pCT linear system. The difference between these two general classes of algorithms is that BIP permits parallel computations within the row partitions yet sequential computations between the row partitions, whereas SAP permits parallel computations between the row partitions yet sequential computations within the row partitions. This thesis also introduces a general partitioning scheme to be applied to a GPGPU cluster to achieve a pure parallel ART algorithm while providing a framework for column partitioning to the pCT system, as well as show sparse visualization patterns that can be found via specified ordering of the equations within the matrix.
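The ART update that BIP and SAP parallelise is a Kaczmarz sweep over the rows of the sparse system Ax = b; a dense toy version (no GPU, no row partitioning, invented dimensions) is enough to show the per-row projection step:

```python
import numpy as np

def art(A, b, n_sweeps=50, relax=1.0):
    """Algebraic reconstruction technique: cyclic Kaczmarz projections
    of the current estimate onto each row's hyperplane."""
    x = np.zeros(A.shape[1])
    row_norms = np.einsum("ij,ij->i", A, A)
    for _ in range(n_sweeps):
        for i in range(A.shape[0]):
            if row_norms[i] == 0.0:
                continue
            residual = b[i] - A[i] @ x
            x += relax * residual / row_norms[i] * A[i]
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(300, 100))            # stand-in for the pCT system matrix
x_true = rng.normal(size=100)
x_est = art(A, A @ x_true)
print(np.linalg.norm(x_est - x_true))      # near zero: consistent data is recovered
```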
28

Edberg, Alexandra. "Monitoring Kraft Recovery Boiler Fouling by Multivariate Data Analysis." Thesis, KTH, Skolan för kemi, bioteknologi och hälsa (CBH), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230906.

Abstract:
This work deals with fouling in the recovery boiler at Montes del Plata, Uruguay. Multivariate data analysis has been used to analyze the large amount of data that was available in order to investigate how different parameters affect the fouling problems. Principal Component Analysis (PCA) and Partial Least Square Projection (PLS) have in this work been used. PCA has been used to compare average values between time periods with high and low fouling problems while PLS has been used to study the correlation structures between the variables and consequently give an indication of which parameters that might be changed to improve the availability of the boiler. The results show that this recovery boiler tends to have problems with fouling that might depend on the distribution of air, the black liquor pressure or the dry solid content of the black liquor. The results also show that multivariate data analysis is a powerful tool for analyzing these types of fouling problems.
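A minimal sketch of the PLS side of such an analysis: project process variables X and a fouling indicator y onto a few latent components and inspect the coefficients to see which variables co-vary with fouling. The variable set and synthetic data are assumptions, not the mill's process data:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))            # e.g. air flows, liquor pressure, dry solids
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(scale=0.5, size=n)   # fouling proxy

pls = PLSRegression(n_components=2).fit(X, y)
print(pls.coef_.ravel().round(2))       # large coefficients flag influential variables
print(round(pls.score(X, y), 2))        # fraction of variance explained
```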
29

Fiterau, Madalina. "Discovering Compact and Informative Structures through Data Partitioning." Research Showcase @ CMU, 2015. http://repository.cmu.edu/dissertations/792.

Abstract:
In many practical scenarios, prediction for high-dimensional observations can be accurately performed using only a fraction of the existing features. However, the set of relevant predictive features, known as the sparsity pattern, varies across data. For instance, features that are informative for a subset of observations might be useless for the rest. In fact, in such cases, the dataset can be seen as an aggregation of samples belonging to several low-dimensional sub-models, potentially due to different generative processes. My thesis introduces several techniques for identifying sparse predictive structures and the areas of the feature space where these structures are effective. This information allows the training of models which perform better than those obtained through traditional feature selection. We formalize Informative Projection Recovery, the problem of extracting a set of low-dimensional projections of data which jointly form an accurate solution to a given learning task. Our solution to this problem is a regression-based algorithm that identifies informative projections by optimizing over a matrix of point-wise loss estimators. It generalizes to a number of machine learning problems, offering solutions to classification, clustering and regression tasks. Experiments show that our method can discover and leverage low-dimensional structure, yielding accurate and compact models. Our method is particularly useful in applications involving multivariate numeric data in which expert assessment of the results is of the essence. Additionally, we developed an active learning framework which works with the obtained compact models in finding unlabeled data deemed to be worth expert evaluation. For this purpose, we enhance standard active selection criteria using the information encapsulated by the trained model. The advantage of our approach is that the labeling effort is expended mainly on samples which benefit models from the hypothesis class we are considering. Additionally, the domain experts benefit from the availability of informative axis aligned projections at the time of labeling. Experiments show that this results in an improved learning rate over standard selection criteria, both for synthetic data and real-world data from the clinical domain, while the comprehensible view of the data supports the labeling process and helps preempt labeling errors.
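As a much-simplified stand-in for informative projection recovery (the thesis optimises over a matrix of point-wise loss estimators; here we merely rank axis-aligned 2-D projections by the cross-validated accuracy of a small classifier):

```python
import numpy as np
from itertools import combinations
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

def best_projection(X, y, cv=5):
    """Rank every axis-aligned 2-D projection by cross-validated accuracy
    and return the most informative feature pair."""
    best_pair, best_acc = None, -np.inf
    for i, j in combinations(range(X.shape[1]), 2):
        acc = cross_val_score(KNeighborsClassifier(), X[:, [i, j]], y, cv=cv).mean()
        if acc > best_acc:
            best_pair, best_acc = (i, j), acc
    return best_pair, best_acc

X, y = load_iris(return_X_y=True)
print(best_projection(X, y))   # e.g. a petal-feature pair scores highest
```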
30

Green, Patrick Corey. "Decision Support for Operational Plantation Forest Inventories through Auxiliary Information and Simulation." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/103054.

Abstract:
Informed forest management requires accurate, up-to-date information. Ground-based forest inventory is commonly conducted to generate estimates of forest characteristics with a predetermined level of statistical confidence. As the importance of monitoring forest resources has increased, budgetary and logistical constraints often limit the resources needed for precise estimates. In this research, the incorporation of ancillary information in planted loblolly pine (Pinus taeda L.) forest inventory was investigated. Additionally, a simulation study using synthetic populations provided the basis for investigating the effects of plot and stand-level inventory aggregations on predictions and projections of future forest conditions. Forest regeneration surveys are important for assessing conditions immediately after plantation establishment. An unmanned aircraft system was evaluated for its ability to capture imagery that could be used to automate seedling counting using two computer vision approaches. The imagery was found to be unreliable for consistent detection in the conditions evaluated. Following establishment, conditions are assessed throughout the lifespan of forest plantations. Using small area estimation (SAE) methods, the incorporation of light detection and ranging (lidar) and thinning status improved the precision of inventory estimates compared with ground data alone. Further investigation found that reduced density lidar point clouds and lower resolution elevation models could be used to generate estimates with similar increases in precision. Individual tree detection estimates of stand density were found to provide minimal improvements in estimation precision when incorporated into the SAE models. Plot and stand level inventory aggregations were found to provide similar estimates of future conditions in simulated stands without high levels of spatial heterogeneity. Significant differences were noted when spatial heterogeneity was high. Model form was found to have a more significant effect on the observed differences than plot size or thinning status. The results of this research are of interest to forest managers who regularly conduct forest inventories and generate estimates of future stand conditions. The incorporation of auxiliary data in mid-rotation stands using SAE techniques improved estimate precision in most cases. Further, guidance on strategies for using this information for predicting future conditions is provided.
Doctor of Philosophy
Informed forest management requires accurate, up-to-date information. Groundbased sampling (inventory) is commonly used to generate estimates of forest characteristics such as total wood volume, stem density per unit area, heights, and regeneration survival. As the importance of assessing forest resources has increased, resources are often not available to conduct proper assessments. In this research, the incorporation of ancillary information in planted loblolly pine (Pinus taeda L.) forest inventory was investigated. Additionally, a simulation study investigated the effects of two forest inventory data aggregation methods on predictions and projections of future forest conditions. Forest regeneration surveys are important for assessing conditions immediately after tree planting. An unmanned aircraft system was evaluated for its ability to capture imagery that could be used to automate seedling counting. The imagery was found to be unreliable for use in accurately detecting seedlings in the conditions evaluated. Following establishment, forest conditions are assessed at additional points in forest development. Using a class of statistical estimators known as small-area estimation, a combination of ground and light detection and ranging data generated more confident estimates of forest conditions. Further investigation found that more coarse ancillary information can be used with similar confidence in the conditions evaluated. Forest inventory data are used to generate estimates of future conditions needed for management decisions. The final component of this research found that there are significant differences between two inventory data aggregation strategies when forest conditions are highly spatially variable. The results of this research are of interest to forest managers who regularly assess forest resources with inventories and models. The incorporation of ancillary information has potential to enhance forest resource assessments. Further, managers have guidance on strategies for using this information for estimating future conditions.
31

Needham, Jessica. "Harnessing demographic data for cross-scale analysis of forest dynamics." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:156850fa-3148-45a6-b2f8-ada9dd3f6a7f.

Abstract:
Forests are a critical biome but are under threat from unprecedented global change. The need to understand forest dynamics across spatial, temporal and biological scales has never been greater. Critical to this will be understanding how the demographic rates of individuals translate into patterns of species diversity, biomass and carbon turnover at much larger scales. In this thesis, I present a modelling framework focussed on demography. In Chapter 2, I introduce methods for translating forest inventory data into population models that account for the size-dependency of vital rates and persistent differences in individual performance. Outbreaks of forest pest and pathogens are increasing in frequency and severity, with consequences for biodiversity and forest structure. In Chapter 3, I explore the impact of ash dieback on the community dynamics of a British woodland, describing a spatially explicit individual based model that captures the effect of an opening of the canopy on local competitive interactions. Chapter 4 introduces methods to infer the impact of historical deer herbivory on the juvenile survival of forest trees. The approach is generalisable and could be applied to any forest in which patterns of regeneration and community structure have been impacted by periodic disturbance (e.g. forest fires). Finding meaningful ways of incorporating species diversity into global vegetation models is increasingly recognised as a research priority. In Chapter 5, I explore the diversity of demographic rates in a tropical forest community and identify groups of species with similar life history strategies. I discuss the potential of integrating demographic and physiological traits as a way to aggregate species for inclusion in global models. In summary, translating measurements of individuals into population dynamics provides opportunities to both explore small-scale community responses to disturbance events, and to feed into much larger scale vegetation models.
Стилі APA, Harvard, Vancouver, ISO та ін.
32

Swinson, Michael D. "Statistical Modeling of High-Dimensional Nonlinear Systems: A Projection Pursuit Solution." Diss., Available online, Georgia Institute of Technology, 2005. http://etd.gatech.edu/theses/available/etd-11232005-204333/.

Повний текст джерела
Анотація:
Thesis (Ph. D.)--Mechanical Engineering, Georgia Institute of Technology, 2006.
Shapiro, Alexander, Committee Member ; Vidakovic, Brani, Committee Member ; Ume, Charles, Committee Member ; Sadegh, Nader, Committee Chair ; Liang, Steven, Committee Member. Vita.
Стилі APA, Harvard, Vancouver, ISO та ін.
33

Spreyer, Kathrin. "Does it have to be trees? : Data-driven dependency parsing with incomplete and noisy training data." Phd thesis, Universität Potsdam, 2011. http://opus.kobv.de/ubp/volltexte/2012/5749/.

Повний текст джерела
Анотація:
We present a novel approach to training data-driven dependency parsers on incomplete annotations. Our parsers are simple modifications of two well-known dependency parsers, the transition-based Malt parser and the graph-based MST parser. While previous work on parsing with incomplete data has typically couched the task in frameworks of unsupervised or semi-supervised machine learning, we essentially treat it as a supervised problem. In particular, we propose what we call agnostic parsers which hide all fragmentation in the training data from their supervised components. We present experimental results with training data that was obtained by means of annotation projection. Annotation projection is a resource-lean technique which allows us to transfer annotations from one language to another within a parallel corpus. However, the output tends to be noisy and incomplete due to cross-lingual non-parallelism and error-prone word alignments. This makes the projected annotations a suitable test bed for our fragment parsers. Our results show that (i) dependency parsers trained on large amounts of projected annotations achieve higher accuracy than the direct projections, and that (ii) our agnostic fragment parsers perform roughly on a par with the original parsers which are trained only on strictly filtered, complete trees. Finally, (iii) when our fragment parsers are trained on artificially fragmented but otherwise gold standard dependencies, the performance loss is moderate even with up to 50% of all edges removed.
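For readers unfamiliar with annotation projection, a minimal Python sketch of the core idea (not the author's implementation; the 1-to-1 alignment format and token indexing are simplifying assumptions) shows how dependency edges are transferred through word alignments and why the result is naturally fragmented:

    def project_dependencies(src_heads, alignment):
        """Project a source-side dependency tree onto the target side.

        src_heads: dict mapping each source token index to its head index (0 = root).
        alignment: dict mapping source token indices to target token indices
                   (only 1-to-1 links, a simplifying assumption).
        Returns a possibly fragmented target-side head map: edges whose endpoints
        are unaligned are simply dropped, mirroring the noisy, incomplete
        annotations the fragment parsers are trained on.
        """
        tgt_heads = {}
        for dep, head in src_heads.items():
            if dep not in alignment:
                continue                       # dependent has no target counterpart
            if head == 0:
                tgt_heads[alignment[dep]] = 0  # root edge survives projection
            elif head in alignment:
                tgt_heads[alignment[dep]] = alignment[head]
            # else: edge is dropped -> fragmentation
        return tgt_heads

    # Example: a 3-token source sentence with heads {1: 2, 2: 0, 3: 2} and an
    # alignment that misses token 3 yields a target-side fragment.
    print(project_dependencies({1: 2, 2: 0, 3: 2}, {1: 1, 2: 2}))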
Стилі APA, Harvard, Vancouver, ISO та ін.
34

Niskanen, M. (Matti). "A visual training based approach to surface inspection." Doctoral thesis, University of Oulu, 2003. http://urn.fi/urn:isbn:9514270673.

Повний текст джерела
Анотація:
Abstract Training a visual inspection device is not straightforward but suffers from the high variation in material to be inspected. This variation causes major difficulties for a human, and this is directly reflected in classifier training. Many inspection devices utilize rule-based classifiers, the building and training of which rely mainly on human expertise. While designing such a classifier, a human tries to find the questions that would provide proper categorization. In training, an operator tunes the classifier parameters, aiming to achieve as good classification accuracy as possible. Such classifiers require a lot of time and expertise before they can be fully utilized. Supervised classifiers form another common category. These learn automatically from training material, but rely on labels that a human has set for them. However, these labels tend to be inconsistent and thus reduce the classification accuracy achieved. Furthermore, as class boundaries are learnt from training samples, they cannot in practice be adjusted later if needed. In this thesis, a visual-based training method is presented. It avoids the problems related to traditional training methods by combining a classifier and a user interface. The method relies on unsupervised projection and provides an intuitive way to directly set and tune the class boundaries of high-dimensional data. As the method groups the data only by the similarities of its features, it is not affected by erroneous and inconsistent labelling made for training samples. Furthermore, it does not require knowledge of the internal structure of the classifier or iterative parameter tuning, where a combination of parameter values leading to the desired class boundaries is sought. Instead, the class boundaries can be set directly, which changes the classification parameters. The time needed to take such a classifier into use is small, and tuning the class boundaries can happen even on-line, if needed. The proposed method is tested with various experiments in this thesis. Different projection methods are evaluated from the point of view of visual-based training. The method is further evaluated using a self-organizing map (SOM) as the projection method and wood as the test material. Parameters such as accuracy, map size, and speed are measured and discussed, and overall the method is found to be an advantageous training and classification scheme.
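To make the unsupervised projection step concrete, the following is a minimal self-organizing map sketch in Python (a generic SOM, not the thesis implementation; grid size, decay schedules and the surface-texture features are assumptions). Each sample is mapped to the grid position of its best-matching unit, and class boundaries could then be drawn directly on that 2D map:

    import numpy as np

    def train_som(data, grid=(10, 10), iters=2000, lr0=0.5, sigma0=3.0, seed=0):
        """Train a tiny self-organizing map; a stand-in for the unsupervised
        projection step (the wood-inspection features themselves are not shown)."""
        rng = np.random.default_rng(seed)
        h, w = grid
        dim = data.shape[1]
        weights = rng.random((h, w, dim))
        yy, xx = np.mgrid[0:h, 0:w]                     # grid coordinates of each unit
        coords = np.stack([yy, xx], axis=-1).astype(float)
        for t in range(iters):
            x = data[rng.integers(len(data))]
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)   # best-matching unit
            lr = lr0 * np.exp(-t / iters)               # decaying learning rate
            sigma = sigma0 * np.exp(-t / iters)         # decaying neighbourhood radius
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=2) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
        return weights

    def project(weights, x):
        """Map a feature vector to its BMU's 2D grid position, where a user could
        then set class boundaries directly on the map."""
        dists = np.linalg.norm(weights - x, axis=2)
        return np.unravel_index(np.argmin(dists), dists.shape)

    features = np.random.rand(500, 16)                  # placeholder surface features
    som = train_som(features)
    print(project(som, features[0]))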
Стилі APA, Harvard, Vancouver, ISO та ін.
35

Patel, Rahul. "Maximum Likelihood – Expectation Maximum Reconstruction with Limited Dataset for Emission Tomography." Akron, OH : University of Akron, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=akron1175781554.

Повний текст джерела
Анотація:
Thesis (M.S.)--University of Akron, Dept. of Biomedical Engineering, 2007.
"May, 2007." Title from electronic thesis title page (viewed 04/26/2009) Advisor, Dale Mugler; Co-Advisor, Anthony Passalaqua; Committee member, Daniel Sheffer; Department Chair, Daniel Sheffer; Dean of the College, George K. Haritos; Dean of the Graduate School, George R. Newkome. Includes bibliographical references.
Стилі APA, Harvard, Vancouver, ISO та ін.
36

Bergfors, Linus. "Explorative Multivariate Data Analysis of the Klinthagen Limestone Quarry Data." Thesis, Uppsala University, Department of Information Technology, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-122575.

Повний текст джерела
Анотація:
Quarry planning at Klinthagen is currently rough, which provides an opportunity to introduce new methods to improve quarry gain and efficiency. Nordkalk AB, active at Klinthagen, wishes to start a new quarry at a nearby location. To exploit future quarries in an efficient manner and ensure production quality, multivariate statistics may help gather important information.

In this thesis the possibilities of the multivariate statistical approaches of Principal Component Analysis (PCA) and Partial Least Squares (PLS) regression were evaluated on the Klinthagen bore data. PCA data were spatially interpolated by Kriging, which also was evaluated and compared to IDW interpolation.

Principal component analysis supplied an overview of the relations between the variables, but also visualised the problems involved in linking geophysical data to geochemical data and the inaccuracy introduced by poor data quality.

The PLS regression further emphasised the geochemical-geophysical problems, but also showed good precision when applied to strictly geochemical data.

Spatial interpolation by Kriging did not result in significantly better approximations than the less complex control interpolation by IDW.

In order to improve the information content of the data when modelled by PCA, a more discrete sampling method would be advisable. The data quality may cause trouble, although with today's sampling technique it was considered to be of minor consequence.

When a single geophysical component is to be predicted from chemical variables, further geophysical data are needed to complement the existing data and achieve satisfactory PLS models.

The stratified rock composition caused trouble when spatially interpolated. Further investigations should be performed to develop more suitable interpolation techniques.
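The Kriging-versus-IDW comparison above can be made concrete with a minimal inverse-distance-weighting sketch in Python (a generic illustration, not the thesis code; the borehole coordinates and the interpolated quantity are assumptions):

    import numpy as np

    def idw(xy_known, values, xy_query, power=2.0, eps=1e-12):
        """Inverse-distance-weighted interpolation of borehole measurements.

        xy_known: (n, 2) sample coordinates, values: (n,) measured quantity,
        xy_query: (m, 2) locations to estimate. The quantity being interpolated
        (e.g. a chemical grade) is an illustrative assumption."""
        d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
        w = 1.0 / (d + eps) ** power        # closer samples get larger weights
        return (w @ values) / w.sum(axis=1)

    # Toy example: three boreholes, one prediction point.
    xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    cao = np.array([52.0, 48.0, 50.0])      # e.g. CaO content in percent (assumption)
    print(idw(xy, cao, np.array([[2.0, 2.0]])))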

Стилі APA, Harvard, Vancouver, ISO та ін.
37

Cho, Jang Ik. "Partial EM Procedure for Big-Data Linear Mixed Effects Model, and Generalized PPE for High-Dimensional Data in Julia." Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case152845439167999.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
38

Carraher, Lee A. "Approximate Clustering Algorithms for High Dimensional Streaming and Distributed Data." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1511860805777818.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
39

Lin, Christie. "Linear regression analysis of 2D projection image data of 6 degrees-of-freedom transformed 3D image sets for stereotactic radiation therapy." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/76969.

Повний текст джерела
Анотація:
Thesis (S.M. and S.B.)--Massachusetts Institute of Technology, Dept. of Nuclear Science and Engineering, 2012.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 104-106).
Patient positioning is crucial to accurate dose delivery during radiation therapy to ensure the proper localization of dose to the target tumor volume. In patient positioning for stereotactic radiation therapy treatment, classical image registration methods are computationally costly and imprecise. We developed an automatic, fast, and robust 2D-3D registration method to improve the accuracy and speed of identifying 6 degrees-of-freedom (DoF) transformations during patient positioning for stereotactic radiotherapy by creating a model of characteristic shape distributions to determine the linear relationship between two real-time orthogonal 2D projection images and the 3D volume image. We defined a preprocessed sparse base set of shape distributions that characterize 2D digitally reconstructed radiograph (DRR) images from a range of independent transformations of the volume. The algorithm calculates the 6-DoF transformation of the patient based upon two orthogonal real-time 2D images by correlating the images against the base set. The algorithm has positioning accuracy to at least 1 pixel, equivalent to 0.5098 mm accuracy given this image resolution. The shape distribution of each 2D image is created in MATLAB in an average of 0.017 s. The online algorithm allows for rapid and accurate position matching of the images, providing the transformation needed to align the patient on average in 0.5276 s. The shape distribution algorithm affords speed, robustness, and accuracy of patient positioning during stereotactic radiotherapy treatment for small-order 6-DoF transformations as compared with existing techniques for the quantification of patient setup where both linear and rotational deviations occur. This algorithm also indicates the potential for rapid, high-precision patient positioning from the interpolation and extrapolation of the linear relationships based upon shape distributions. Key words: shape distribution, image registration, patient positioning, radiation therapy
by Christie Lin.
S.M. and S.B.
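A hedged sketch of the shape-distribution idea (in the spirit of histograms over pairwise distances, not the author's MATLAB implementation; the foreground mask, base-set format and matching score are assumptions) might look as follows in Python:

    import numpy as np

    def shape_distribution(image, n_samples=2000, bins=64, seed=0):
        """Histogram of pairwise distances between randomly sampled bright pixels,
        used here as a stand-in for the shape distribution of a projection/DRR image."""
        rng = np.random.default_rng(seed)
        ys, xs = np.nonzero(image > image.mean())   # crude foreground mask (assumption)
        idx = rng.integers(0, len(ys), size=(n_samples, 2))
        p = np.stack([ys, xs], axis=1).astype(float)
        d = np.linalg.norm(p[idx[:, 0]] - p[idx[:, 1]], axis=1)
        hist, _ = np.histogram(d, bins=bins, range=(0, np.hypot(*image.shape)))
        return hist / hist.sum()

    def match_transformation(live_image, base_set):
        """Return the key of the base-set entry whose shape distribution correlates
        best with the live image; base_set maps a transformation label to a
        precomputed distribution (both names are assumptions)."""
        live = shape_distribution(live_image)
        return max(base_set, key=lambda k: np.corrcoef(live, base_set[k])[0, 1])

    img = np.zeros((100, 100))
    img[40:60, 30:70] = 1.0                          # toy "projection" image
    base = {"no_shift": shape_distribution(img),
            "shift_x": shape_distribution(np.roll(img, 5, axis=1))}
    print(match_transformation(img, base))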
Стилі APA, Harvard, Vancouver, ISO та ін.
40

RADEMACHER, ERIC W. "THE PATH TO ACCURATE PRE-ELECTION FORECASTS: AN ANALYSIS OF THE IMPACT OF DATA ADJUSTMENT TECHNIQUES ON PRE-ELECTION PROJECTION ESTIMATES." University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1021921989.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
41

Rademacher, Eric W. "The path to accurate pre-election forecasts an analysis of the impact of data adjustment techniques on pre-election projection estimates /." Cincinnati, Ohio : University of Cincinnati, 2002. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=ucin1021921989.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
42

Blanchard, Pierre. "Fast hierarchical algorithms for the low-rank approximation of matrices, with applications to materials physics, geostatistics and data analysis." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0016/document.

Повний текст джерела
Анотація:
Advanced techniques for the low-rank approximation of matrices are crucial dimension reduction tools in many domains of modern scientific computing. Hierarchical approaches like H2-matrices, in particular the Fast Multipole Method (FMM), benefit from the block low-rank structure of certain matrices to reduce the cost of computing n-body problems to O(n) operations instead of O(n2). In order to better deal with kernels of various kinds, kernel independent FMM formulations have recently arisen such as polynomial interpolation based FMM. However, they are hardly tractable to high dimensional tensorial kernels, therefore we designed a new highly efficient interpolation based FMM, called the Uniform FMM, and implemented it in the parallel library ScalFMM. The method relies on an equispaced interpolation grid and the Fast Fourier Transform (FFT). Performance and accuracy were compared with the Chebyshev interpolation based FMM. Numerical experiments on artificial benchmarks showed that the loss of accuracy induced by the interpolation scheme was largely compensated by the FFT optimization. First of all, we extended both interpolation based FMM to the computation of the isotropic elastic fields involved in Dislocation Dynamics (DD) simulations. Second of all, we used our new FMM algorithm to accelerate a rank-r Randomized SVD and thus efficiently generate multivariate Gaussian random variables on large heterogeneous grids in O(n) operations. Finally, we designed a new efficient dimensionality reduction algorithm based on dense random projection in order to investigate new ways of characterizing the biodiversity, namely from a geometric point of view
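The rank-r randomized SVD mentioned above follows the standard random-projection recipe; a minimal dense NumPy sketch (without the FMM acceleration that is the thesis contribution) is:

    import numpy as np

    def randomized_svd(A, rank, oversample=10, seed=0):
        """Rank-r SVD by dense Gaussian random projection (the generic scheme;
        in the thesis the large matrix-vector products are what the FMM speeds up)."""
        rng = np.random.default_rng(seed)
        omega = rng.standard_normal((A.shape[1], rank + oversample))
        Q, _ = np.linalg.qr(A @ omega)      # orthonormal basis of the sampled range
        B = Q.T @ A                         # small projected matrix
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

    A = np.random.rand(500, 300)
    U, s, Vt = randomized_svd(A, rank=20)
    print(np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A))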
Стилі APA, Harvard, Vancouver, ISO та ін.
43

Carvalho, Edigleison Francelino. "Probabilistic incremental learning for image recognition : modelling the density of high-dimensional data." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/90429.

Повний текст джерела
Анотація:
Nowadays several sensory systems provide data in flows and these measured observations are frequently high-dimensional, i.e., the number of measured variables is large, and the observations are arriving in a sequence. This is in particular the case of robot vision systems. Unsupervised and supervised learning with such data streams is challenging, because the algorithm should be capable of learning from each observation and then discard it before considering the next one, but several methods require the whole dataset in order to estimate their parameters and, therefore, are not suitable for online learning. Furthermore, many approaches suffer from the so-called curse of dimensionality (BELLMAN, 1961) and cannot handle high-dimensional input data. To overcome the problems described above, this work proposes a new probabilistic and incremental neural network model, called Local Projection Incremental Gaussian Mixture Network (LP-IGMN), which is capable of performing life-long learning with high-dimensional data, i.e., it can continuously learn considering the stability of the current model's parameters and automatically adjust its topology taking into account the subspace's boundary found by each hidden neuron. The proposed method can find the intrinsic subspace where the data lie, which is called the principal subspace. Orthogonal to the principal subspace, there are the dimensions that are noisy or carry little information, i.e., with small variance, and they are described by a single estimated parameter. Therefore, LP-IGMN is robust to different sources of data and can deal with a large number of noisy and/or irrelevant variables in the measured data. To evaluate LP-IGMN we conducted several experiments using simulated and real datasets. We also demonstrated several applications of our method in image recognition tasks. The results have shown that the LP-IGMN performance is competitive with, and usually superior to, other state-of-the-art approaches, and it can be successfully used in applications that require life-long learning in high-dimensional spaces.
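As a generic building block behind incremental Gaussian mixture models, the following Python sketch shows an online update of a single component's mean and covariance (a Welford-style update, not the LP-IGMN rule itself); each observation is used once and then discarded, as stream learning requires:

    import numpy as np

    class OnlineGaussian:
        """Incremental estimate of one component's mean and covariance; a generic
        building block, not the LP-IGMN update rule itself."""
        def __init__(self, dim):
            self.n = 0
            self.mean = np.zeros(dim)
            self.cov = np.eye(dim)

        def update(self, x):
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n     # running mean
            if self.n > 1:
                # rank-one covariance update equivalent to the batch sample covariance
                self.cov = ((self.n - 2) * self.cov
                            + np.outer(delta, x - self.mean)) / (self.n - 1)

    g = OnlineGaussian(3)
    for x in np.random.rand(1000, 3):
        g.update(x)                         # one observation at a time, then discarded
    print(g.mean, np.diag(g.cov))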
Стилі APA, Harvard, Vancouver, ISO та ін.
44

Etemadpour, Ronak [Verfasser], Lars [Akademischer Betreuer] Linsen, Bettina [Akademischer Betreuer] Olk, Rosane [Akademischer Betreuer] Minghim, and Eric [Akademischer Betreuer] Monson. "Human Perception in Using Projection Methods for Multidimensional Data Visualization / Ronak Etemadpour. Betreuer: Lars Linsen. Gutachter: Lars Linsen ; Bettina Olk ; Rosane Minghim ; Eric Monson." Bremen : IRC-Library, Information Resource Center der Jacobs University Bremen, 2013. http://d-nb.info/1087274915/34.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
45

Hamilton, Lei Hou. "Reduced-data magnetic resonance imaging reconstruction methods: constraints and solutions." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42707.

Повний текст джерела
Анотація:
Imaging speed is very important in magnetic resonance imaging (MRI), especially in dynamic cardiac applications, which involve respiratory motion and heart motion. With the introduction of reduced-data MR imaging methods, increasing acquisition speed has become possible without requiring a higher gradient system. However, these reduced-data imaging methods pay a price for higher imaging speed. This may be a signal-to-noise ratio (SNR) penalty, reduced resolution, or a combination of both. Many methods sacrifice edge information in favor of SNR gain, which is undesirable for applications that require accurate detection of myocardial boundaries. The central goal of this thesis is to develop novel reduced-data imaging methods to improve reconstructed image performance. This thesis presents a novel reduced-data imaging method, PINOT (Parallel Imaging and NOquist in Tandem), to accelerate MR imaging. As illustrated by a variety of computer simulated and real cardiac MRI data experiments, PINOT preserves the edge details, with the flexibility of improving SNR by regularization. Another contribution is to exploit the data redundancy from parallel imaging, rFOV and partial Fourier methods. A Gerchberg Reduced Iterative System (GRIS), implemented with the Gerchberg-Papoulis (GP) iterative algorithm, is introduced. Under the GRIS, which utilizes a temporal band-limitation constraint in the image reconstruction, a variant of Noquist called iNoquist (iterative Noquist) is proposed. Utilizing a different source of prior information, iNoquist is first combined with the partial Fourier technique (phase-constrained iNoquist) and then integrated with parallel imaging methods (PINOT-GRIS) to achieve additional acceleration gains.
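A toy 1D analogue of the Gerchberg-Papoulis iteration used in GRIS (illustrative only, not the MRI reconstruction code; the band mask and sampling pattern are assumptions) alternates between a band-limitation projection and a data-consistency projection:

    import numpy as np

    def gerchberg_papoulis(y, known, band, iters=200):
        """Recover missing samples of a band-limited signal by alternating
        projections: enforce the band-limit in Fourier space, then re-impose
        the known samples ('band' is a boolean mask of retained DFT bins)."""
        x = np.where(known, y, 0.0)
        for _ in range(iters):
            X = np.fft.fft(x)
            X[~band] = 0.0                  # projection onto band-limited signals
            x = np.real(np.fft.ifft(X))
            x[known] = y[known]             # projection onto data-consistent signals
        return x

    n = 128
    t = np.arange(n)
    truth = np.cos(2 * np.pi * 3 * t / n) + 0.5 * np.sin(2 * np.pi * 5 * t / n)
    known = np.random.rand(n) > 0.4         # roughly 60% of samples observed
    band = np.zeros(n, dtype=bool)
    band[:8] = True                         # low-pass band containing the true
    band[-8:] = True                        # frequencies (and their conjugates)
    est = gerchberg_papoulis(np.where(known, truth, 0.0), known, band)
    print(np.max(np.abs(est[~known] - truth[~known])))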
Стилі APA, Harvard, Vancouver, ISO та ін.
46

Benmoussat, Mohammed Seghir. "Hyperspectral imagery algorithms for the processing of multimodal data : application for metal surface inspection in an industrial context by means of multispectral imagery, infrared thermography and stripe projection techniques." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4347/document.

Повний текст джерела
Анотація:
The work presented in this thesis deals with the quality control and inspection of industrial metallic surfaces. The purpose is the generalization and application of hyperspectral imagery methods for multimodal data such as multi-channel optical images and multi-temporal thermographic images. In the first application, data cubes are built from multi-component images to detect surface defects within flat metallic parts. The best performances are obtained with multi-wavelength illuminations in the visible and near infrared ranges, and detection using the spectral angle mapper with the mean spectrum as a reference. The second application concerns the use of thermographic imaging for the inspection of nuclear metal components to detect surface and subsurface defects. A 1D approach is proposed based on using the kurtosis to select 1 principal component (PC) from the first PCs obtained after reducing the original data cube with the principal component analysis (PCA) algorithm. The proposed PCA-1PC method gives good performances with non-noisy and homogeneous data, and SVD with anomaly detection algorithms gives the most consistent results and is quite robust to perturbations such as inhomogeneous background. Finally, an approach based on fringe analysis and structured light techniques in the case of deflectometric recordings is presented for the inspection of free-form metal surfaces. After determining the parameters describing the sinusoidal stripe patterns, the proposed approach consists of projecting a list of phase-shifted patterns and calculating the corresponding phase-images. Defect location is based on detecting and analyzing the stripes within the phase-images.
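The spectral-angle detection step can be sketched generically in Python (not the thesis code; the cube dimensions, mean-spectrum reference and threshold are assumptions):

    import numpy as np

    def spectral_angle_map(cube, reference):
        """Spectral angle (in radians) between every pixel spectrum of a
        multi-wavelength image cube (rows, cols, bands) and a reference spectrum;
        large angles flag candidate surface defects."""
        num = np.tensordot(cube, reference, axes=([2], [0]))
        den = np.linalg.norm(cube, axis=2) * np.linalg.norm(reference) + 1e-12
        return np.arccos(np.clip(num / den, -1.0, 1.0))

    cube = np.random.rand(64, 64, 10)           # toy cube: 10 illumination wavelengths
    mean_spectrum = cube.reshape(-1, 10).mean(axis=0)
    angles = spectral_angle_map(cube, mean_spectrum)
    defect_mask = angles > angles.mean() + 3 * angles.std()   # simple threshold (assumption)
    print(defect_mask.sum(), "pixels flagged")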
Стилі APA, Harvard, Vancouver, ISO та ін.
47

Medvedev, Viktor. "Tiesioginio sklidimo neuroninių tinklų taikymo daugiamačiams duomenims vizualizuoti tyrimai." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2008. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2008~D_20080204_162347-54385.

Повний текст джерела
Анотація:
The research area of this work is the analysis of multidimensional data and the ways of improving apprehension of the data. Data apprehension is a rather complicated problem, especially if the data refer to a complex object or phenomenon described by many parameters. The research object of the dissertation is artificial neural networks for multidimensional data projection. General topics related to this object: multidimensional data visualization; dimensionality reduction algorithms; errors of projecting data; the projection of new data; strategies for retraining the neural network that visualizes multidimensional data; optimization of control parameters of the neural network for multidimensional data projection; parallel computing. The key aim of the work is to develop and improve methods to efficiently minimize visualization errors of multidimensional data by using artificial neural networks. The results of the research are applied in solving some problems in practice. Human physiological data that describe the human functional state have been investigated.
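One common way to quantify the projection errors discussed above is Sammon's stress between the original and projected pairwise distances; a small generic Python sketch (not the dissertation's code) is:

    import numpy as np
    from scipy.spatial.distance import pdist

    def sammon_stress(X, Y):
        """Sammon's stress between multidimensional data X and its low-dimensional
        projection Y: one standard measure of projection error (a generic formula,
        not tied to the neural-network methods of the thesis)."""
        d_high = pdist(X)
        d_low = pdist(Y)
        mask = d_high > 0
        return np.sum((d_high[mask] - d_low[mask]) ** 2 / d_high[mask]) / np.sum(d_high[mask])

    X = np.random.rand(200, 12)             # toy high-dimensional data
    Y = X[:, :2]                            # naive projection onto two coordinates
    print(sammon_stress(X, Y))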
Стилі APA, Harvard, Vancouver, ISO та ін.
48

Goto, Daniela Bento Fonsechi. "Estimação de maxima verossimilhança para processo de nascimento puro espaço-temporal com dados parcialmente observados." [s.n.], 2008. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306192.

Повний текст джерела
Анотація:
Advisor: Nancy Lopes Garcia
Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica
Abstract: The goal of this work is to study the maximum likelihood estimation of a spatial pure birth process under two different sampling schemes: a) permanent observation in a fixed time interval [0, T]; b) observation of the process only after a fixed time T. Under scheme b) we do not know the birth times; we have a missing-data problem. We can write the likelihood function for the nonhomogeneous pure birth process on a compact set through the method of projection described by Garcia and Kurtz (2008), as the projection of the likelihood function. The fact that the projected likelihood can be interpreted as an expectation suggests that Monte Carlo methods can be used to compute estimators. Almost-sure and in-distribution convergence results are obtained for the approximants to the maximum likelihood estimator. Simulation studies show that the approximants are appropriate.
Master's degree
Inference in Stochastic Processes
Master in Statistics
Стилі APA, Harvard, Vancouver, ISO та ін.
49

Pagliosa, Lucas de Carvalho. "Visualização e exploração de dados multidimensionais na web." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-08042016-103144/.

Повний текст джерела
Анотація:
With the growing number and types of data, the need to analyze and understand what they represent and how they are related has become crucial. Visualization techniques based on multidimensional projections have gained space and interest as one of the possible tools to aid this problem, providing a simple and quick way to identify patterns, recognize trends and extract features previously not obvious in the original set. However, projecting the data set into a lower-dimensional space may not be sufficient in some cases to answer or clarify certain questions asked by the user, making post-projection analysis crucial for the exploration and understanding of the data. Thus, interactivity in the visualization, applied to the user's needs, is an essential factor for analysis. In this context, the main objective of this master's project is to create visual metaphors based on attributes, through statistical measures and artifacts for detecting noise and similar groups, to assist the exploration and analysis of projected data. In addition, it is proposed to make available, in Web browsers, the multidimensional data visualization techniques developed by the Group of Visual and Geometric Processing at ICMC-USP. The development of the project as a Web platform was inspired by the difficulty of installing and running certain visualization projects, mainly due to different versions of IDEs, compilers and operating systems. In addition, making the project available for online execution aims to facilitate access to, and dissemination of, the proposed techniques for the general public.
Стилі APA, Harvard, Vancouver, ISO та ін.
50

Vitale, Raffaele. "Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/90442.

Повний текст джерела
Анотація:
The present Ph.D. thesis, primarily conceived to support and reinforce the relation between academic and industrial worlds, was developed in collaboration with Shell Global Solutions (Amsterdam, The Netherlands) in the endeavour of applying and possibly extending well-established latent variable-based approaches (i.e. Principal Component Analysis - PCA - Partial Least Squares regression - PLS - or Partial Least Squares Discriminant Analysis - PLSDA) for complex problem solving not only in the fields of manufacturing troubleshooting and optimisation, but also in the wider environment of multivariate data analysis. To this end, novel efficient algorithmic solutions are proposed throughout all chapters to address very disparate tasks, from calibration transfer in spectroscopy to real-time modelling of streaming flows of data. The manuscript is divided into the following six parts, focused on various topics of interest: Part I - Preface, where an overview of this research work, its main aims and justification is given together with a brief introduction on PCA, PLS and PLSDA; Part II - On kernel-based extensions of PCA, PLS and PLSDA, where the potential of kernel techniques, possibly coupled to specific variants of the recently rediscovered pseudo-sample projection, formulated by the English statistician John C. Gower, is explored and their performance compared to that of more classical methodologies in four different applications scenarios: segmentation of Red-Green-Blue (RGB) images, discrimination of on-/off-specification batch runs, monitoring of batch processes and analysis of mixture designs of experiments; Part III - On the selection of the number of factors in PCA by permutation testing, where an extensive guideline on how to accomplish the selection of PCA components by permutation testing is provided through the comprehensive illustration of an original algorithmic procedure implemented for such a purpose; Part IV - On modelling common and distinctive sources of variability in multi-set data analysis, where several practical aspects of two-block common and distinctive component analysis (carried out by methods like Simultaneous Component Analysis - SCA - DIStinctive and COmmon Simultaneous Component Analysis - DISCO-SCA - Adapted Generalised Singular Value Decomposition - Adapted GSVD - ECO-POWER, Canonical Correlation Analysis - CCA - and 2-block Orthogonal Projections to Latent Structures - O2PLS) are discussed, a new computational strategy for determining the number of common factors underlying two data matrices sharing the same row- or column-dimension is described, and two innovative approaches for calibration transfer between near-infrared spectrometers are presented; Part V - On the on-the-fly processing and modelling of continuous high-dimensional data streams, where a novel software system for rational handling of multi-channel measurements recorded in real time, the On-The-Fly Processing (OTFP) tool, is designed; Part VI - Epilogue, where final conclusions are drawn, future perspectives are delineated, and annexes are included.
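For the component-selection problem addressed in Part III, a common permutation-based recipe for choosing the number of PCA components can be sketched in Python (one widespread variant, offered for illustration only and not claimed to match the thesis procedure exactly):

    import numpy as np

    def n_components_by_permutation(X, n_perm=200, alpha=0.05, seed=0):
        """Choose the number of PCA components by comparing each eigenvalue of the
        column-centred data with its null distribution obtained by permuting every
        column independently; keep components whose eigenvalue exceeds the
        (1 - alpha) quantile of the permuted eigenvalues."""
        rng = np.random.default_rng(seed)
        Xc = X - X.mean(axis=0)
        eig = np.linalg.svd(Xc, compute_uv=False) ** 2
        null = np.empty((n_perm, len(eig)))
        for b in range(n_perm):
            Xp = np.column_stack([rng.permutation(Xc[:, j]) for j in range(Xc.shape[1])])
            null[b] = np.linalg.svd(Xp, compute_uv=False) ** 2
        thresh = np.quantile(null, 1 - alpha, axis=0)
        keep = eig > thresh
        # stop at the first non-significant component
        return int(np.argmin(keep)) if not keep.all() else len(eig)

    X = np.random.rand(100, 8) @ np.random.rand(8, 8)   # toy correlated data
    print(n_components_by_permutation(X))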
Vitale, R. (2017). Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90442
TESIS
Стилі APA, Harvard, Vancouver, ISO та ін.