Dissertations / Theses on the topic 'Dimensionality reduction'

Consult the top 50 dissertations / theses for your research on the topic 'Dimensionality reduction.'

You can also download the full text of each publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Ariu, Kaito. "Online Dimensionality Reduction." Licentiate thesis, KTH, Reglerteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290791.

Full text
Abstract:
In this thesis, we investigate online dimensionality reduction methods, where the algorithms learn by sequentially acquiring data. We focus on two specific algorithm design problems in (i) recommender systems and (ii) heterogeneous clustering from binary user feedback. (i) For recommender systems, we consider a system consisting of m users and n items. In each round, a user, selected uniformly at random, arrives to the system and requests a recommendation. The algorithm observes the user id and recommends an item from the item set. A notable restriction here is that the same item cannot be recommended to the same user more than once, a constraint referred to as a no-repetition constraint. We study this problem as a variant of the multi-armed bandit problem and analyze regret under various structures pertaining to items and users. We derive fundamental limits of regret and devise algorithms that achieve these limits order-wise. The analysis explicitly highlights the importance of each component of regret: for example, we can distinguish the regret due to the no-repetition constraint, the regret incurred to learn the statistics of a user's preference for an item, and the regret incurred to learn the low-dimensional space of the users and items. (ii) In the clustering with binary feedback problem, the objective is to classify items solely based on limited user feedback. More precisely, users are simply asked questions with binary answers. A notable difficulty stems from the heterogeneity in the difficulty of classifying the various items (some items require more feedback to be classified than others). For this problem, we derive fundamental limits on the cluster recovery rates for both offline and online algorithms. For the offline setting, we devise a simple algorithm that achieves the limit order-wise. For the online setting, we propose an algorithm inspired by the lower bound. For both problems, we evaluate the proposed algorithms by inspecting their theoretical guarantees and through numerical experiments on synthetic and non-synthetic datasets.
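For illustration (this is not the thesis's algorithm), the recommendation setting above can be simulated in a few lines of Python with a greedy empirical-mean policy under the no-repetition constraint; all sizes and names are made up.

import numpy as np

rng = np.random.default_rng(0)
m, n, rounds = 5, 8, 30                  # users, items, rounds (toy sizes)
p = rng.uniform(size=(m, n))             # unknown Bernoulli reward means
counts = np.zeros((m, n))                # observations per (user, item)
sums = np.zeros((m, n))                  # accumulated rewards
seen = [set() for _ in range(m)]         # enforces the no-repetition constraint

for _ in range(rounds):
    u = rng.integers(m)                  # a uniformly random user arrives
    allowed = [i for i in range(n) if i not in seen[u]]
    if not allowed:
        continue                         # this user has been shown every item
    est = sums[u] / np.maximum(counts[u], 1)    # empirical means
    i = max(allowed, key=lambda j: est[j])      # greedy among unseen items
    r = float(rng.random() < p[u, i])           # binary feedback
    counts[u, i] += 1
    sums[u, i] += r
    seen[u].add(i)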
2

Legramanti, Sirio. "Bayesian dimensionality reduction." Doctoral thesis, Università Bocconi, 2021. http://hdl.handle.net/11565/4035711.

Full text
Abstract:
We are currently witnessing an explosion in the amount of available data. Such growth involves not only the number of data points but also their dimensionality. This poses new challenges to statistical modeling and computations, thus making dimensionality reduction more central than ever. In the present thesis, we provide methodological, computational and theoretical advancements in Bayesian dimensionality reduction via novel structured priors. Namely, we develop a new increasing shrinkage prior and illustrate how it can be employed to discard redundant dimensions in Gaussian factor models. In order to make it usable for larger datasets, we also investigate variational methods for posterior inference under this proposed prior. Beyond traditional models and parameter spaces, we also provide a different take on dimensionality reduction, focusing on community detection in networks. For this purpose, we define a general class of Bayesian nonparametric priors that encompasses existing stochastic block models as special cases and includes promising unexplored options. Our Bayesian approach allows for a natural incorporation of node attributes and facilitates uncertainty quantification as well as model selection.
3

Baldiwala, Aliakbar. "Dimensionality Reduction for Commercial Vehicle Fleet Monitoring." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/38330.

Full text
Abstract:
A variety of new features have been added to present-day vehicles, such as pre-crash warning, vehicle-to-vehicle communication, semi-autonomous driving systems, telematics, and drive-by-wire. These features demand very high bandwidth from in-vehicle networks. The various electronic control units inside a vehicle transmit useful information via automotive multiplexing, which allows information to be shared among the intelligent modules of an automotive electronic system. Optimum functionality is achieved by transmitting this data in real time. The high-bandwidth, high-speed requirement can be met either by using multiple buses or by implementing a higher-bandwidth bus, but doing so increases the cost of the network and the complexity of the vehicle wiring. Another option is to implement a higher-layer protocol that reduces the amount of data transferred using data reduction (DR) techniques, thus reducing bandwidth usage. The implementation cost is minimal, as changes are required only in software, not in hardware. In our work, we present a new data reduction algorithm termed the "Comprehensive Data Reduction (CDR)" algorithm. The proposed algorithm minimizes the bus utilization of the CAN bus for a future vehicle. The reduction in bus load is achieved by compressing the parameters, so that more messages, including lower-priority messages, can be sent efficiently on the CAN bus. The proposed work also presents a performance analysis of the proposed algorithm against the boundary-of-fifteen compression algorithm and compression-area-selection algorithms (existing data reduction algorithms). The results of the analysis show that the proposed CDR algorithm provides better data reduction than the earlier algorithms, with promising results in terms of reduced bus utilization, compression efficiency, and percent peak load of the CAN bus. This reduction in bus utilization permits a larger number of network nodes (ECUs) in the existing system without increasing its overall cost. The proposed algorithm was developed for the automotive environment, but it can also be used in any application where extensive information is transmitted among control units over a multiplexing bus.
4

Bolelli, Maria Virginia. "Diffusion Maps for Dimensionality Reduction." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/18246/.

Full text
Abstract:
In this thesis we present diffusion maps, a framework based on diffusion processes for finding meaningful geometric descriptions of data sets. A diffusion process can be described via an iterative application of the heat kernel, which has two main characteristics: it satisfies a Markov semigroup property, and its level sets encode all geometric features of the space. This process, well known on regular manifolds, has been extended to general data sets by Coifman and Lafon. They define a diffusion kernel starting from the geometric properties of the data and their density properties. This kernel is a compact operator, and the projection on its eigenvectors at different instants of time provides a family of embeddings of a data set into a suitable Euclidean space. The projection on the first eigenvectors naturally leads to a dimensionality reduction algorithm. A numerical implementation is provided on different data sets.
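To make the construction concrete, here is a minimal Python sketch of the steps just described (Gaussian heat kernel, Markov normalization, spectral embedding); the bandwidth eps, the diffusion time t and the toy data are arbitrary choices.

import numpy as np
from scipy.spatial.distance import cdist

def diffusion_map(X, eps=1.0, dim=2, t=1):
    K = np.exp(-cdist(X, X, 'sqeuclidean') / eps)   # Gaussian (heat) kernel
    P = K / K.sum(axis=1, keepdims=True)            # row-normalized Markov matrix
    vals, vecs = np.linalg.eig(P)                   # spectrum is real for this P
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # drop the trivial constant eigenvector; scale coordinates by eigenvalue^t
    return vecs[:, 1:dim + 1] * vals[1:dim + 1] ** t

X = np.random.default_rng(1).normal(size=(100, 3))  # toy data set
Y = diffusion_map(X)                                # 2-D diffusion coordinates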
5

Khosla, Nitin. "Dimensionality Reduction Using Factor Analysis." Griffith University. School of Engineering, 2006. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20061010.151217.

Full text
Abstract:
In many pattern recognition applications, a large number of features are extracted in order to ensure an accurate classification of unknown classes. One way to solve the problems of high dimensions is to first reduce the dimensionality of the data to a manageable size, keeping as much of the original information as possible, and then feed the reduced-dimensional data into a pattern recognition system. In this situation, dimensionality reduction becomes the pre-processing stage of the pattern recognition system. In addition, probability density estimation with fewer variables is a simpler approach to dimensionality reduction. Dimensionality reduction is useful in speech recognition, data compression, visualization and exploratory data analysis. Some of the techniques which can be used for dimensionality reduction are Factor Analysis (FA), Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA). Factor Analysis can be considered an extension of Principal Component Analysis. The EM (expectation-maximization) algorithm is ideally suited to problems of this sort, in that it produces maximum-likelihood (ML) estimates of parameters when there is a many-to-one mapping from an underlying distribution to the distribution governing the observations: the expectation step computes the expected values of the hidden variables conditioned upon the observations, and the maximization step then provides a new estimate of the parameters. This research work compares Factor Analysis (based on the expectation-maximization algorithm), Principal Component Analysis and Linear Discriminant Analysis for dimensionality reduction, and investigates Local Factor Analysis (EM-based) and Local Principal Component Analysis using Vector Quantization.
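Both families of models compared in this thesis have standard implementations; a minimal illustrative sketch follows (the thesis's own EM code and its local variants are not reproduced here).

import numpy as np
from sklearn.decomposition import FactorAnalysis, PCA

X = np.random.default_rng(2).normal(size=(200, 10))     # toy feature matrix
Z_fa = FactorAnalysis(n_components=3).fit_transform(X)  # ML factor analysis
Z_pca = PCA(n_components=3).fit_transform(X)            # principal components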
6

Vamulapalli, Harika Rao. "On Dimensionality Reduction of Data." ScholarWorks@UNO, 2010. http://scholarworks.uno.edu/td/1211.

Full text
Abstract:
The random projection method is an important tool for the dimensionality reduction of data, and it can be made efficient with strong error guarantees. In this thesis, we focus on linear transforms of high-dimensional data to a low-dimensional space satisfying the Johnson-Lindenstrauss lemma. In addition, we prove some theoretical results about the projections that are of interest in practical applications. We show how the technique can be applied to synthetic data with probabilistic guarantees on the pairwise distances. The connection between dimensionality reduction and compressed sensing is also discussed.
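For illustration, the Johnson-Lindenstrauss behaviour referred to above can be checked numerically with a scaled Gaussian random projection; the sizes here are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
n, d, k = 50, 10000, 500                  # points, original dim, reduced dim
X = rng.normal(size=(n, d))
R = rng.normal(size=(d, k)) / np.sqrt(k)  # scaled Gaussian projection matrix
Y = X @ R

def pdist2(A):                            # squared pairwise distances
    sq = (A * A).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T

DX, DY = pdist2(X), pdist2(Y)
ratio = DY / np.where(DX > 0, DX, 1.0)
# off-diagonal ratios concentrate near 1, as the JL lemma predicts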
7

Widemann, David P. "Dimensionality reduction for hyperspectral data." College Park, Md.: University of Maryland, 2008. http://hdl.handle.net/1903/8448.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2008.
Thesis research directed by: Dept. of Mathematics. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
8

Khosla, Nitin. "Dimensionality Reduction Using Factor Analysis." Thesis, Griffith University, 2006. http://hdl.handle.net/10072/366058.

Full text
Abstract:
In many pattern recognition applications, a large number of features are extracted in order to ensure an accurate classification of unknown classes. One way to solve the problems of high dimensions is to first reduce the dimensionality of the data to a manageable size, keeping as much of the original information as possible, and then feed the reduced-dimensional data into a pattern recognition system. In this situation, dimensionality reduction becomes the pre-processing stage of the pattern recognition system. In addition, probability density estimation with fewer variables is a simpler approach to dimensionality reduction. Dimensionality reduction is useful in speech recognition, data compression, visualization and exploratory data analysis. Some of the techniques which can be used for dimensionality reduction are Factor Analysis (FA), Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA). Factor Analysis can be considered an extension of Principal Component Analysis. The EM (expectation-maximization) algorithm is ideally suited to problems of this sort, in that it produces maximum-likelihood (ML) estimates of parameters when there is a many-to-one mapping from an underlying distribution to the distribution governing the observations: the expectation step computes the expected values of the hidden variables conditioned upon the observations, and the maximization step then provides a new estimate of the parameters. This research work compares Factor Analysis (based on the expectation-maximization algorithm), Principal Component Analysis and Linear Discriminant Analysis for dimensionality reduction, and investigates Local Factor Analysis (EM-based) and Local Principal Component Analysis using Vector Quantization.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Engineering
Full Text
9

Sætrom, Jon. "Reduction of Dimensionality in Spatiotemporal Models." Doctoral thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for matematiske fag, 2010. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-11247.

Full text
10

Ghodsi, Boushehri Ali. "Nonlinear Dimensionality Reduction with Side Information." Thesis, University of Waterloo, 2006. http://hdl.handle.net/10012/1020.

Full text
Abstract:
In this thesis, I look at three problems with important applications in data processing. Incorporating side information, provided by the user or derived from data, is a main theme of each of these problems.

This thesis makes a number of contributions. The first is a technique for combining different embedding objectives, which is then exploited to incorporate side information expressed in terms of transformation invariants known to hold in the data. It also introduces two different ways of incorporating transformation invariants in order to make new similarity measures. Two algorithms are proposed which learn metrics based on different types of side information. These learned metrics can then be used in subsequent embedding methods. Finally, it introduces a manifold learning algorithm that is useful when applied to sequential decision problems. In this case we are given action labels in addition to data points. Actions in the manifold learned by this algorithm have meaningful representations in that they are represented as simple transformations.
11

Merola, Giovanni Maria. "Dimensionality reduction methods in multivariate prediction." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape15/PQDD_0022/NQ32847.pdf.

Full text
12

Musco, Cameron N. (Cameron Nicholas). "Dimensionality reduction for k-means clustering." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101473.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 123-131).
In this thesis we study dimensionality reduction techniques for approximate k-means clustering. Given a large dataset, we consider how to quickly compress it to a smaller dataset (a sketch), such that solving the k-means clustering problem on the sketch will give an approximately optimal solution on the original dataset. First, we provide an exposition of technical results of [CEM+15], which show that provably accurate dimensionality reduction is possible using common techniques such as principal component analysis, random projection, and random sampling. We next present empirical evaluations of dimensionality reduction techniques to supplement our theoretical results. We show that our dimensionality reduction algorithms, along with heuristics based on these algorithms, indeed perform well in practice. Finally, we discuss possible extensions of our work to neurally plausible algorithms for clustering and dimensionality reduction. This thesis is based on joint work with Michael Cohen, Samuel Elder, Nancy Lynch, Christopher Musco, and Madalina Persu.
by Cameron N. Musco.
S.M.
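For illustration, the PCA-sketching idea surveyed in this thesis (projecting onto roughly k principal components before clustering) can be sketched with scikit-learn; the data and sizes are toy choices, not the thesis's experiments.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.default_rng(4).normal(size=(1000, 50))  # large toy dataset
k = 10
X_sketch = PCA(n_components=k).fit_transform(X)       # compress, then cluster
labels = KMeans(n_clusters=k, n_init=10).fit_predict(X_sketch)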
13

Law, Hiu Chung. "Clustering, dimensionality reduction, and side information." Diss., Michigan State University, 2006.

Find full text
Abstract:
Thesis (Ph. D.)--Michigan State University. Dept. of Computer Science & Engineering, 2006.
Title from PDF t.p. (viewed on June 19, 2009). Includes bibliographical references (p. 296-317). Also issued in print.
14

Vasiloglou, Nikolaos. "Isometry and convexity in dimensionality reduction." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/28120.

Full text
Abstract:
Thesis (M. S.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
Committee Chair: David Anderson; Committee Co-Chair: Alexander Gray; Committee Member: Anthony Yezzi; Committee Member: Hongyuan Zha; Committee Member: Justin Romberg; Committee Member: Ronald Schafer.
15

Gagliardi, Alessandro <1990>. "Dimensionality reduction methods for paleoclimate reconstructions." Master's Degree Thesis, Università Ca' Foscari Venezia, 2017. http://hdl.handle.net/10579/10434.

Full text
Abstract:
Paleoclimatology seeks to understand changes in climate that occurred before the instrumental period through paleoclimate archives. These archives consist of natural materials that keep trace of climate changes at different time scales and resolutions. Tree-ring archives are able to provide a timescale of thousands of years with annual resolution. This thesis discusses the reconstruction of past temperature in the period from 1400 to 1849 on the basis of the information available in a tree-ring dataset consisting of 70 trees located in the United States of America. The temperature data used for calibration and validation come from the HadCRUT4 dataset. The thesis considers past temperature reconstructions based on multiple linear regression models calibrated with instrumental temperature available for the period 1902-1980. Since the number of tree-ring proxies is large compared with the number of observations, standard multiple linear regression is unsuitable, making it necessary to apply dimensionality reduction methods such as principal component regression and partial least squares regression. The methodology developed in the thesis includes corrections to handle residual serial dependence. The results indicate that (i) key events of the climate forcings are well identified in the reconstructions based on both partial least squares and principal component regression, but (ii) partial least squares regression is superior in terms of the precision of the past temperature predictions.
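Both reduction-based regressions compared in the thesis have standard implementations; here is a minimal sketch with synthetic stand-ins for the proxy matrix and the temperature series (the row count mirrors the 1902-1980 calibration window).

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(79, 70))    # 79 calibration years x 70 tree-ring proxies
y = X[:, :5].mean(axis=1) + 0.1 * rng.normal(size=79)  # synthetic temperature

pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)  # PCR
pls = PLSRegression(n_components=5).fit(X, y)                           # PLS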
16

Musco, Christopher Paul. "Dimensionality reduction for sparse and structured matrices." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/99856.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 97-103).
Dimensionality reduction has become a critical tool for quickly solving massive matrix problems. Especially in modern data analysis and machine learning applications, an overabundance of data features or examples can make it impossible to apply standard algorithms efficiently. To address this issue, it is often possible to distill data to a much smaller set of informative features or examples, which can be used to obtain provably accurate approximate solutions to a variety of problems. In this thesis, we focus on the important case of dimensionality reduction for sparse and structured data. In contrast to popular structure-agnostic methods like Johnson-Lindenstrauss projection and PCA, we seek data compression techniques that take advantage of structure to generate smaller or more powerful compressions. Additionally, we aim for methods that can be applied extremely quickly - typically in linear or nearly linear time in the input size. Specifically, we introduce new randomized algorithms for structured dimensionality reduction that are based on importance sampling and sparse-recovery techniques. Our work applies directly to accelerating linear regression and graph sparsification, and we discuss connections and possible extensions to low-rank approximation, k-means clustering, and several other ubiquitous matrix problems.
by Christopher Paul Musco.
S.M.
17

Beach, David J. "Anomaly Detection with Advanced Nonlinear Dimensionality Reduction." Digital WPI, 2020. https://digitalcommons.wpi.edu/etd-theses/1378.

Full text
Abstract:
Dimensionality reduction techniques such as t-SNE and UMAP are useful both for overview of high-dimensional datasets and as part of a machine learning pipeline. These techniques create a non-parametric model of the manifold by fitting a density kernel about each data point using the distances to its k-nearest neighbors. In dense regions, this approach works well, but in sparse regions, it tends to draw unrelated points into the nearest cluster. Our work focuses on a homotopy method which imposes graph-based regularization over the manifold parameters to update the embedding. As the homotopy parameter increases, so does the cost of modeling different scales between adjacent neighborhoods. This gradually imposes a more uniform scale over the manifold, resulting in a more faithful embedding which preserves structure in dense areas while pushing sparse anomalous points outward.
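The thesis's homotopy-regularized method is not part of standard libraries, so as an illustration only the baseline behaviour described above is sketched here; the perplexity parameter plays the role of the neighbourhood size k.

import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(6).normal(size=(300, 20))  # toy high-dimensional data
Y = TSNE(n_components=2, perplexity=30.0, init='pca').fit_transform(X)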
18

Dwivedi, Saurabh. "Dimensionality Reduction for Data Driven Process Modeling." University of Cincinnati / OhioLINK, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1069770129.

Full text
19

Xu, Nuo. "Aggressive Dimensionality Reduction for Data-Driven Modeling." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1178640357.

Full text
20

Welshman, Christopher. "Dimensionality reduction for dynamical systems with parameters." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/dimensionality-reduction-for-dynamical-systems-with-parameters(69dab7de-b1dd-4d74-901f-61e02decf16a).html.

Full text
Abstract:
Dimensionality reduction methods allow for the study of high-dimensional systems by producing low-dimensional descriptions that preserve the relevant structure and features of interest. For dynamical systems, attractors are particularly important examples of such features, as they govern the long-term dynamics of the system, and are typically low-dimensional even if the state space is high- or infinite-dimensional. Methods for reduction need to be able to determine a suitable reduced state space in which to describe the attractor, and to produce a reduced description of the corresponding dynamics. In the presence of a parameter space, a system can possess a family of attractors. Parameters are important quantities that represent aspects of the physical system not directly modelled in the dynamics, and may take different values in different instances of the system. Therefore, including the parameter dependence in the reduced system is desirable, in order to capture the model's full range of behaviour. Existing methods typically involve algebraically manipulating the original differential equation, either by applying a projection, or by making local approximations around a fixed-point. In this work, we take more of a geometric approach, both for the reduction process and for determining the dynamics in the reduced space. For the reduction, we make use of an existing secant-based projection method, which has properties that make it well-suited to the reduction of attractors. We also regard the system to be a manifold and vector field, consider the attractor's normal and tangent spaces, and the derivatives of the vector field, in order to determine the desired properties of the reduced system. We introduce a secant culling procedure that allows for the number of secants to be greatly reduced in the case that the generating set explores a low-dimensional space. This reduces the computational cost of the secant-based method without sacrificing the detail captured in the data set. This makes it feasible to use secant-based methods with larger examples. We investigate a geometric formulation of the problem of dimensionality reduction of attractors, and identify and resolve the complications that arise. The benefit of this approach is that it is compatible with a wider range of examples than conventional approaches, particularly those with angular state variables. In turn this allows for application to non-autonomous systems with periodic time-dependence. We also adapt secant-based projection for use in this more general setting, which provides a concrete method of reduction. We then extend the geometric approach to include a parameter space, resulting in a family of vector fields and a corresponding family of attractors. Both the secant-based projection and the reproduction of dynamics are extended to produce a reduced model that correctly responds to the parameter dependence. The method is compatible with multiple parameters within a given region of parameter space. This is illustrated by a variety of examples.
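As a rough illustration, the secant set underlying secant-based projection can be computed directly; the projection chosen below (top principal directions of the secant cloud) is only a crude stand-in, since secant-based methods instead seek projections that avoid collapsing any secant.

import numpy as np

def unit_secants(X):
    # normalized differences between all point pairs of a sampled trajectory
    diffs = X[:, None, :] - X[None, :, :]
    iu = np.triu_indices(len(X), k=1)
    S = diffs[iu]
    return S / np.linalg.norm(S, axis=1, keepdims=True)

X = np.random.default_rng(7).normal(size=(60, 5))  # toy attractor samples
S = unit_secants(X)
_, _, Vt = np.linalg.svd(S, full_matrices=False)
P = Vt[:2]   # rows span a candidate 2-D reduction subspace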
21

Chang, Kui-yu. "Nonlinear dimensionality reduction using probabilistic principal surfaces." 2000. Digital version accessible at http://wwwlib.umi.com/cr/utexas/main.

Full text
22

Tosi, Alessandra. "Visualization and interpretability in probabilistic dimensionality reduction models." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/285013.

Full text
Abstract:
Over the last few decades, data analysis has swiftly evolved from being a task addressed mainly within the remit of multivariate statistics, to an endeavour in which data heterogeneity, complexity and even sheer size, driven by computational advances, call for alternative strategies, such as those provided by pattern recognition and machine learning. Any data analysis process aims to extract new knowledge from data. Knowledge extraction is not a trivial task and it is not limited to the generation of data models or the recognition of patterns. The use of machine learning techniques for multivariate data analysis should in fact aim to achieve a dual target: interpretability and good performance. At best, both aspects of this target should not conflict with each other. This gap between data modelling and knowledge extraction must be acknowledged, in the sense that we can only extract knowledge from models through a process of interpretation. Exploratory information visualization is becoming a very promising tool for interpretation. When exploring multivariate data through visualization, high data dimensionality can be a big constraint, and the use of dimensionality reduction techniques is often compulsory. The need to find flexible methods for data modelling has led to the development of non-linear dimensionality reduction techniques, and many state-of-the-art approaches of this type fall in the domain of probabilistic modelling. These non-linear techniques can provide a flexible data representation and a more faithful model of the observed data compared to the linear ones, but often at the expense of model interpretability, which has an impact on the model visualization results. In manifold learning non-linear dimensionality reduction methods, when a high-dimensional space is mapped onto a lower-dimensional one, the obtained embedded manifold is subject to local geometrical distortion induced by the non-linear mapping. This kind of distortion can often lead to misinterpretations of the data set structure and of the obtained patterns. It is important to give relevance to the problem of how to quantify and visualize the distortion itself in order to interpret data in a more faithful way. The research reported in this thesis focuses on the development of methods and techniques for explicitly reintroducing the local distortion created by non-linear dimensionality reduction models into the low-dimensional visualization of the data that they produce, as well as on the definition of metrics for probabilistic geometries to address this problem. We provide methods not only for static data, but also for multivariate time series. The reintegration of the quantified non-linear distortion into the visualization space of the analysed non-linear dimensionality reduction methods is a goal by itself, but we go beyond it and consider alternative adequate metrics for probabilistic manifold learning. For that, we study the role of random geometries, that is, distributions of manifolds, in machine learning and data analysis in general. Methods for the estimation of distributions of data-supporting Riemannian manifolds as well as algorithms for computing interpolants over distributions of manifolds are defined. Experimental results show that inference made according to the random Riemannian metric leads to a more faithful generation of unobserved data.
23

Guo, Hong. "Feature generation and dimensionality reduction using genetic programming." Thesis, University of Liverpool, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.511054.

Full text
24

Kalamaras, Ilias. "A novel approach for multimodal graph dimensionality reduction." Thesis, Imperial College London, 2015. http://hdl.handle.net/10044/1/42224.

Full text
Abstract:
This thesis deals with the problem of multimodal dimensionality reduction (DR), which arises when the input objects, to be mapped on a low-dimensional space, consist of multiple vectorial representations, instead of a single one. Herein, the problem is addressed in two alternative manners. One is based on the traditional notion of modality fusion, but using a novel approach to determine the fusion weights. In order to optimally fuse the modalities, the known graph embedding DR framework is extended to multiple modalities by considering a weighted sum of the involved affinity matrices. The weights of the sum are automatically calculated by minimizing an introduced notion of inconsistency of the resulting multimodal affinity matrix. The other manner for dealing with the problem is an approach to consider all modalities simultaneously, without fusing them, which has the advantage of minimal information loss due to fusion. In order to avoid fusion, the problem is viewed as a multi-objective optimization problem. The multiple objective functions are defined based on graph representations of the data, so that their individual minimization leads to dimensionality reduction for each modality separately. The aim is to combine the multiple modalities without the need to assign importance weights to them, or at least postpone such an assignment as a last step. The proposed approaches were experimentally tested in mapping multimedia data on low-dimensional spaces for purposes of visualization, classification and clustering. The no-fusion approach, namely Multi-objective DR, was able to discover mappings revealing the structure of all modalities simultaneously, which cannot be discovered by weight-based fusion methods. However, it results in a set of optimal trade-offs, from which one needs to be selected, which is not trivial. The optimal-fusion approach, namely Multimodal Graph Embedding DR, is able to easily extend unimodal DR methods to multiple modalities, but depends on the limitations of the unimodal DR method used. Both the no-fusion and the optimal-fusion approaches were compared to state-of-the-art multimodal dimensionality reduction methods and the comparison showed performance improvement in visualization, classification and clustering tasks. The proposed approaches were also evaluated for different types of problems and data, in two diverse application fields, a visual-accessibility-enhanced search engine and a visualization tool for mobile network security data. The results verified their applicability in different domains and suggested promising directions for future advancements.
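For illustration, the weighted-sum fusion of affinity matrices can be prototyped directly; here the weights are fixed by hand rather than chosen by the thesis's inconsistency-minimization criterion, and a plain Laplacian eigenmap stands in for the general graph-embedding framework.

import numpy as np
from scipy.linalg import eigh

def spectral_embed(affinities, weights, dim=2):
    W = sum(w * A for w, A in zip(weights, affinities))  # fused affinity matrix
    L = np.diag(W.sum(axis=1)) - W                       # graph Laplacian
    vals, vecs = eigh(L)
    return vecs[:, 1:dim + 1]   # skip the constant eigenvector

rng = np.random.default_rng(8)
A1 = rng.random((50, 50)); A1 = (A1 + A1.T) / 2   # toy affinity, modality 1
A2 = rng.random((50, 50)); A2 = (A2 + A2.T) / 2   # toy affinity, modality 2
Y = spectral_embed([A1, A2], weights=[0.6, 0.4])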
25

Le, Moan Steven. "Dimensionality reduction and saliency for spectral image visualization." Phd thesis, Université de Bourgogne, 2012. http://tel.archives-ouvertes.fr/tel-00825495.

Full text
Abstract:
Nowadays, digital imaging is mostly based on the paradigm that a combination of a small number of so-called primary colors is sufficient to represent any visible color. For instance, most cameras use pixels with three dimensions: Red, Green and Blue (RGB). Such low-dimensional technology suffers from several limitations, such as a sensitivity to metamerism and a bounded range of wavelengths. Spectral imaging technologies offer the possibility to overcome these downsides by dealing more finely with the electromagnetic spectrum. Multi-, hyper- or ultra-spectral images contain a large number of channels, depicting specific ranges of wavelength, thus allowing to better recover either the radiance or the reflectance of the scene. Nevertheless, these large amounts of data require dedicated methods to be properly handled in a variety of applications. This work contributes to defining what useful information must be retained for visualization on a low-dimensional display device. In this context, subjective notions such as appeal and naturalness are to be taken into account, together with objective measures of informative content and dependency. In particular, a novel band selection strategy based on measures derived from Shannon's entropy is presented, and the concept of spectral saliency is introduced.
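An entropy-driven band selection step of the kind outlined above can be sketched as follows; the thesis's actual measures and its saliency model are richer than this toy version.

import numpy as np

def band_entropy(band, bins=256):
    hist, _ = np.histogram(band, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()   # Shannon entropy in bits

cube = np.random.default_rng(9).random((64, 64, 31))  # toy spectral image
scores = [band_entropy(cube[:, :, b]) for b in range(cube.shape[2])]
top3 = np.argsort(scores)[-3:]       # indices of the most informative bands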
26

Bitzer, Sebastian. "Nonlinear dimensionality reduction for motion synthesis and control." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/4869.

Full text
Abstract:
Synthesising motion of human character animations or humanoid robots is vastly complicated by the large number of degrees of freedom in their kinematics. Control spaces become so large, that automated methods designed to adaptively generate movements become computationally infeasible or fail to find acceptable solutions. In this thesis we investigate how demonstrations of previously successful movements can be used to inform the production of new movements that are adapted to new situations. In particular, we evaluate the use of nonlinear dimensionality reduction techniques to find compact representations of demonstrations, and investigate how these can simplify the synthesis of new movements. Our focus lies on the Gaussian Process Latent Variable Model (GPLVM), because it has proven to capture the nonlinearities present in the kinematics of robots and humans. We present an in-depth analysis of the underlying theory which results in an alternative approach to initialise the GPLVM based on Multidimensional Scaling. We show that the new initialisation is better suited than PCA for nonlinear, synthetic data, but have to note that its advantage shrinks on motion data. Subsequently we show that the incorporation of additional structure constraints leads to low-dimensional representations which are sufficiently regular so that once learned dynamic movement primitives can be adapted to new situations without need for relearning. Finally, we demonstrate in a number of experiments where movements are generated for bimanual reaching, that, through the use of nonlinear dimensionality reduction, reinforcement learning can be scaled up to optimise humanoid movements.
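As an illustration, the two initialisations compared in the thesis (PCA versus Multidimensional Scaling) can be produced with scikit-learn; the GPLVM training that would follow is omitted, and the data below is a toy stand-in for joint-angle trajectories.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

X = np.random.default_rng(10).normal(size=(100, 30))
init_pca = PCA(n_components=2).fit_transform(X)
init_mds = MDS(n_components=2).fit_transform(X)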
27

Ross, Ian. "Nonlinear dimensionality reduction methods in climate data analysis." Thesis, University of Bristol, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492479.

Full text
Abstract:
Linear dimensionality reduction techniques, notably principal component analysis, are widely used in climate data analysis as a means to aid in the interpretation of datasets of high dimensionality. These linear methods may not be appropriate for the analysis of data arising from nonlinear processes occurring in the climate system. Numerous techniques for nonlinear dimensionality reduction have been developed recently that may provide a potentially useful tool for the identification of low-dimensional manifolds in climate data sets arising from nonlinear dynamics. In this thesis I apply three such techniques to the study of El Niño/Southern Oscillation variability in tropical Pacific sea surface temperatures and thermocline depth, comparing observational data with simulations from coupled atmosphere-ocean general circulation models from the CMIP3 multi-model ensemble.
28

Bourrier, Anthony. "Compressed sensing and dimensionality reduction for unsupervised learning." PhD thesis, Université Rennes 1, 2014. http://tel.archives-ouvertes.fr/tel-01023030.

Full text
Abstract:
This thesis is motivated by the prospect of bringing signal processing and statistical learning closer together, and more particularly by the exploitation of compressed sensing techniques to reduce the cost of learning tasks. After recalling the basics of compressed sensing and mentioning a few data analysis techniques based on similar ideas, we propose a framework for estimating the parameters of probability density mixtures in which the training data are compressed into a fixed-size representation. We instantiate this framework on an isotropic Gaussian mixture model. This proof of concept suggests the existence of theoretical guarantees for reconstructing a signal under models that go beyond the usual sparse-vector model. We therefore study, in a second step, the generalization of stability results for linear inverse problems to entirely general signal models. We propose conditions under which reconstruction guarantees can be given in a general framework. Finally, we consider an approximate nearest-neighbor search problem in which signatures of the vectors are computed in order to reduce the complexity. In the setting where the distance of interest derives from a Mercer kernel, we propose to combine an explicit embedding of the data with a signature computation, which notably leads to a more accurate approximate search.
29

Shekhar, Karthik. "Dimensionality reduction in immunology : from viruses to cells." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/98339.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Chemical Engineering, February 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 301-318).
Developing successful prophylactic and therapeutic strategies against infections of RNA viruses like HIV requires a combined understanding of the evolutionary constraints of the virus, as well as of the immunologic determinants associated with effective viremic control. Recent technologies enable viral and immune parameters to be measured at an unprecedented scale and resolution across multiple patients, and the resulting data could be harnessed towards these goals. Such datasets typically involve a large number of parameters; the goal of analysis is to infer underlying biological relationships that connect these parameters by examining the data. This dissertation combines principles and techniques from the physical and the computational sciences to "reduce the dimensionality" of such data in order to reveal novel biological relationships of relevance to vaccination and therapeutic strategies. Much of our work is concerned with HIV. 1. How can collective evolutionary constraints be inferred from viral sequences derived from infected patients? Using principles of Random Matrix Theory, we derive a low dimensional representation of HIV proteins based on circulating sequence data and identify independent groups of residues within viral proteins that are coordinately linked. One such group of residues within the polyprotein Gag exhibits statistical signatures indicative of strong constraints that limit the viability of a higher proportion of strains bearing multiple mutations in this group. We validate these predictions from independent experimental data, and based on our results, propose candidate immunogens for the Caucasian American population that target these vulnerabilities. 2. To what extent do mutational patterns observed in circulating viral strains accurately reflect intrinsic fitness constraints of viral proteins? Each strain is the result of evolution against an immune background, which is highly diverse across patients. Spin models constructed to reproduce the prevalence of sequences have tested positively against intrinsic fitness assays (where immune selection is absent). Why "prevalence" should correlate with "replicative fitness" in the case of such complex evolutionary dynamics is conceptually puzzling. We combine computer simulations and analytical theory to show that the prevalence can correctly reflect the fitness rank order of mutant viral strains that are proximal in sequence space. Our analysis suggests that incorporating a "phylogenetic correction" in the parameters might improve the predictive power of these models. 3. Can cellular phenotypes be discovered in an unbiased way from high dimensional protein expression data in single cells? Mass cytometry, where > 40 protein parameters can be quantitated in single cells affords a route, but analyzing such high dimensional data can be challenging. Traditional "gating approaches" are unscalable, and computational methods that account for multivariate relationships among different proteins are needed. High-dimensional clustering and principal component analysis, two approaches that have been explored so far, suffer from important limitations. We propose a computational tool rooted in nonlinear dimensionality reduction which overcomes these limitations, and automatically identifies phenotypes based on a two-dimensional distillation of the cellular data; the latter feature facilitates unbiased visualization of high dimensional relationships. 
Our tool reveals a previously unappreciated phenotypic complexity within murine CD8+ T cells, and identifies a novel phenotype that is conflated by traditional approaches. 4. Antigen-specific immune cells that mediate efficacious antiviral responses in infections like HIV involve complex phenotypes and typically constitute a small fraction of the population. In such circumstances, seeking correlative features in bulk expression levels of key proteins can be misleading. Using the approach introduced in 3., we analyze multiparameter flow cytometry data of CD4+ T-cell samples from 20 patients representing diverse clinical groups, and identify cellular phenotypes whose proportion in patients is strongly correlated with quantitative clinical parameters. Many of these correlations are inconsistent with bulk signals. Furthermore, a number of correlative phenotypes are characterized by the expression of multiple proteins at individually modest levels; such subsets are likely to be missed by conventional gating strategies. Using the in-patient proportions of different phenotypes as predictors, a cross-validated, sparse linear regression model explains 87% of the variance in the viral load across the twenty patients. Our approach is scalable to datasets involving dozens of parameters.
by Karthik Shekhar.
Ph. D.
30

Payne, Terry R. "Dimensionality reduction and representation for nearest neighbour learning." Thesis, University of Aberdeen, 1999. https://eprints.soton.ac.uk/257788/.

Full text
Abstract:
An increasing number of intelligent information agents employ Nearest Neighbour learning algorithms to provide personalised assistance to the user. This assistance may be in the form of recognising or locating documents that the user might find relevant or interesting. To achieve this, documents must be mapped into a representation that can be presented to the learning algorithm. Simple heuristic techniques are generally used to identify relevant terms from the documents. These terms are then used to construct large, sparse training vectors. The work presented here investigates an alternative representation based on sets of terms, called set-valued attributes, and proposes a new family of Nearest Neighbour learning algorithms that utilise this set-based representation. The importance of discarding irrelevant terms from the documents is then addressed, and this is generalised to examine the behaviour of the Nearest Neighbour learning algorithm with high dimensional data sets containing such values. A variety of selection techniques used by other machine learning and information retrieval systems are presented, and empirically evaluated within the context of a Nearest Neighbour framework. The thesis concludes with a discussion of ways in which attribute selection and dimensionality reduction techniques may be used to improve the selection of relevant attributes, and thus increase the reliability and predictive accuracy of the Nearest Neighbour learning algorithm.
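For illustration, nearest-neighbour comparison over set-valued attributes can be sketched with Jaccard similarity, one natural set-based measure (the thesis's own measures may differ); the document names below are made up.

def jaccard(a, b):
    # ratio of shared terms to all terms across the two sets
    return len(a & b) / len(a | b) if a | b else 0.0

docs = {
    "d1": {"learning", "nearest", "neighbour"},
    "d2": {"nearest", "neighbour", "retrieval"},
    "d3": {"protein", "structure"},
}
query = {"nearest", "neighbour", "agent"}
best = max(docs, key=lambda d: jaccard(docs[d], query))  # "d1" (ties go first)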
APA, Harvard, Vancouver, ISO, and other styles
31

Hui, Shirley. "FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/1173.

Full text
Abstract:
A frequently studied problem in structural biology is determining the degree of similarity between two protein structures. The most common solution is to perform a three-dimensional structural alignment on the two structures. Rigid structural alignment algorithms have been developed in the past to accomplish this, but they treat the protein molecules as immutable structures. Since protein structures can bend and flex, rigid algorithms do not yield accurate results, and flexible structural alignment algorithms have consequently been developed. The problem with these algorithms is that the protein structures are represented using thousands of atomic coordinate variables, which imposes a great computational burden due to the large number of degrees of freedom required to account for the flexibility. Past research in dimensionality reduction has shown that a linear technique called Principal Component Analysis (PCA) is well suited to high-dimensionality reduction. This thesis introduces a new flexible structural alignment algorithm called FlexSADRA, which uses PCA to perform flexible structural alignments. Test results show that FlexSADRA determines better alignments than rigid structural alignment algorithms. Unlike existing rigid and flexible algorithms, FlexSADRA addresses the problem in a significantly lower-dimensional space and assesses not only the structural fit but also the structural feasibility of the final alignment.
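A minimal sketch of the kind of PCA step such an approach could rest on: extracting a handful of "deformation modes" from an ensemble of conformations, so that flexibility is searched in a few coordinates rather than thousands. The names and data layout are assumptions, not FlexSADRA's actual interface.

```python
import numpy as np

def deformation_modes(conformations, n_modes=5):
    """conformations: (n_conf, n_atoms * 3) flattened coordinates.
    Returns the mean structure, the top principal deformation axes,
    and the low-dimensional coordinates of each conformation."""
    mean = conformations.mean(axis=0)
    U, S, Vt = np.linalg.svd(conformations - mean, full_matrices=False)
    modes = Vt[:n_modes]                        # principal deformation axes
    coords = (conformations - mean) @ modes.T   # (n_conf, n_modes)
    return mean, modes, coords
```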
APA, Harvard, Vancouver, ISO, and other styles
32

Donnelly, Mark Patrick. "Classification of body surface potential maps through dimensionality reduction." Thesis, University of Ulster, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.516131.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Jayaraman, Gautam 1981. "Applying a randomized nearest neighbors algorithm to dimensionality reduction." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/29665.

Full text
Abstract:
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003.
Includes bibliographical references (p. 95-96).
In this thesis, I implemented a randomized nearest neighbors algorithm in order to optimize an existing dimensionality reduction algorithm. During implementation I resolved details that were not considered in the design stage, and optimized the nearest neighbor system for use by the dimensionality reduction system. By using the new nearest neighbor system as a subroutine, the dimensionality reduction system runs in time O(n log n) with respect to the number of data points. This enables us to examine data sets that were prohibitively large before.
by Gautam Jayaraman.
M.Eng.
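One common way to randomize nearest-neighbour search is a random-projection tree, sketched below; whether this matches the thesis's particular algorithm is an assumption, and the sketch omits the multiple-tree voting usually needed for good recall.

```python
import numpy as np

def build_rp_tree(points, idx=None, leaf_size=16, rng=None):
    """Recursively split points by random hyperplanes. Queries descend to a
    leaf and brute-force there, giving approximate nearest neighbours."""
    rng = np.random.default_rng(0) if rng is None else rng
    idx = np.arange(len(points)) if idx is None else idx
    if len(idx) <= leaf_size:
        return ("leaf", idx)
    d = rng.normal(size=points.shape[1])
    d /= np.linalg.norm(d)                       # random unit direction
    proj = points[idx] @ d
    m = np.median(proj)
    left, right = idx[proj <= m], idx[proj > m]
    if len(left) == 0 or len(right) == 0:        # degenerate split; stop here
        return ("leaf", idx)
    return ("node", d, m,
            build_rp_tree(points, left, leaf_size, rng),
            build_rp_tree(points, right, leaf_size, rng))

def rp_query(tree, points, q):
    """Return the index of the approximate nearest neighbour of q."""
    while tree[0] == "node":
        _, d, m, l, r = tree
        tree = l if q @ d <= m else r
    leaf = tree[1]
    return leaf[np.argmin(np.linalg.norm(points[leaf] - q, axis=1))]
```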
APA, Harvard, Vancouver, ISO, and other styles
34

Ray, Sujan. "Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin161375080072697.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Ha, Sook Shin. "Dimensionality Reduction, Feature Selection and Visualization of Biological Data." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/77169.

Full text
Abstract:
Due to the high dimensionality of most biological data, it is a difficult task to directly analyze, model and visualize the data to gain biological insight. Thus, dimensionality reduction becomes an imperative pre-processing step in analyzing and visualizing high-dimensional biological data. Two major approaches to dimensionality reduction in genomic analysis and biomarker identification studies are: feature extraction, creating new features by combining existing ones based on a mapping technique; and feature selection, choosing an optimal subset of all features based on an objective function. In this dissertation, we show how our innovative reduction schemes effectively reduce the dimensionality of DNA gene expression data to extract biologically interpretable and relevant features, which enhances the biomarker identification process. To construct biologically interpretable features and facilitate Muscular Dystrophy (MD) subtype classification, we extract molecular features from MD microarray data by constructing sub-networks using a novel integrative scheme which utilizes protein-protein interaction (PPI) network structure, functional gene set information and mRNA profiling data. The workflow includes three major steps: first, by combining PPI network structure and gene-gene co-expression relationships into a new distance metric, we apply affinity propagation clustering (APC) to build gene sub-networks; second, we further incorporate functional gene set knowledge to complement the physical interaction information; finally, based on the constructed sub-network and gene set features, we apply a multi-class support vector machine (MSVM) for MD sub-type classification and highlight the biomarkers contributing to the sub-type prediction. The experimental results show that our scheme constructs sub-networks that are more relevant to MD than those constructed by the conventional approach. Furthermore, our integrative strategy substantially improved the prediction accuracy, especially for the 'hard-to-classify' sub-types. Conventionally, pathway-based analysis assumes that genes in a pathway contribute equally to a biological function, thus assigning uniform weights to genes. However, this assumption has been proven incorrect, and applying uniform weights in pathway analysis may not be adequate for tasks like molecular classification of diseases, as genes in a functional group may have different discriminative power. Hence, we propose to use different weights in pathway analysis, which resulted in the development of four weighting schemes. We applied them in two existing pathway analysis methods using both real and simulated gene expression data. Weighting changes pathway scoring and brings up some new significant pathways, leading to the detection of disease-related genes that are missed under uniform weights. To help us understand our MD expression data better and derive scientific insight from it, we explored a suite of visualization tools. In particular, for selected top-performing MD sub-networks, we displayed the network view using Cytoscape; functional annotations using the IPA and DAVID functional analysis tools; expression patterns using heat maps and parallel coordinates plots; and MD-associated pathways using KEGG pathway diagrams.
We also performed weighted MD pathway analysis, and identified overlapping sub-networks across different weighting schemes and different MD subtypes using Venn diagrams, which resulted in the identification of a new sub-network significantly associated with MD. All this graphically displayed data and information helped us understand our MD data and the MD subtypes better, resulting in the identification of several potentially MD-associated biomarker pathways and genes.
Ph. D.
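A toy sketch of the sub-network construction idea: blend co-expression similarity with PPI adjacency into a single affinity matrix and cluster genes with affinity propagation. The simple alpha-blending below is a hypothetical stand-in for the distance metric developed in the dissertation.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def subnetwork_clusters(coexpr_corr, ppi_adj, alpha=0.5):
    """coexpr_corr: (g, g) gene-gene correlation matrix; ppi_adj: (g, g)
    0/1 PPI adjacency. Blend into one similarity matrix (alpha is an
    assumed mixing weight) and cluster with affinity propagation."""
    S = alpha * np.abs(coexpr_corr) + (1 - alpha) * ppi_adj
    ap = AffinityPropagation(affinity="precomputed", random_state=0)
    return ap.fit(S).labels_   # one cluster (sub-network) label per gene
```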
APA, Harvard, Vancouver, ISO, and other styles
36

Sharma, Vikas Manesh. "AN EVALUATION OF DIMENSIONALITY REDUCTION ON CELL FORMATION EFFICACY." Ohio University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1174503824.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Strong, Stephen. "Dimensionality Reduction for the Purposes of Automatic Pattern Classification." Thesis, Griffith University, 2013. http://hdl.handle.net/10072/367333.

Full text
Abstract:
Pattern classification is a common technique used in a variety of applications, from simple tasks such as password acceptance to more complex tasks such as identification by biometrics, speech recognition, and text recognition. As a result, a large number of pattern classification algorithms have emerged, allowing computers to perform these tasks. However, these techniques become less effective when the amount of data describing a given object is excessive in comparison to the number of samples available for training. Consequently, much research has focused on finding efficient methods of reducing the dimensionality of the data while maintaining maximum classification accuracy. Dimensionality reduction aims to maximize the spread between samples of different classes and minimize the spread between samples of the same class. A variety of methods aiming to do this have been reported in the literature. The most common methods of dimensionality reduction are Linear Discriminant Analysis and its variants. These typically focus on the spread of all the data, without regard to how spread out sections of the data already are. A few methods disregard the spread of data that is already well separated, but these are not so widely accepted: while the classification accuracy is often better using these techniques, the computational time is often a large obstacle. This thesis investigates several methods of dimensionality reduction, and then discusses algorithms that improve upon the existing ones. These algorithms utilize techniques that can be implemented on any hardware.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Microelectronic Engineering
Science, Environment, Engineering and Technology
Full Text
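For reference, a minimal scikit-learn sketch of the baseline the thesis discusses: Linear Discriminant Analysis as a reduction step before classification. This illustrates the standard method only, not the improved algorithms proposed in the thesis, and it assumes at least three classes so that two discriminant axes exist.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# LDA projects onto at most (n_classes - 1) axes chosen to maximize
# between-class spread relative to within-class spread.
model = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),
    KNeighborsClassifier(n_neighbors=3),
)
# model.fit(X_train, y_train); model.score(X_test, y_test)
```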
APA, Harvard, Vancouver, ISO, and other styles
38

Moraes, Lailson Bandeira de. "Two-dimensional extensions of semi-supervised dimensionality reduction methods." Universidade Federal de Pernambuco, 2013. https://repositorio.ufpe.br/handle/123456789/12388.

Full text
Abstract:
An important pre-processing step in machine learning systems is dimensionality reduction, which aims to produce compact representations of high-dimensional patterns. In computer vision applications, these patterns are typically images, which are represented by two-dimensional matrices. However, traditional dimensionality reduction techniques were designed to work only with vectors, which makes them a suboptimal choice for processing two-dimensional data. Another problem with traditional approaches to dimensionality reduction is that they operate in either a fully unsupervised or a fully supervised way, which limits their efficiency in scenarios where supervised information is available only for a subset of the data. These situations are increasingly common because in many modern applications it is easy to produce raw data, but it is usually difficult to label it. In this study, we propose three dimensionality reduction methods that can overcome these limitations: Two-dimensional Semi-supervised Dimensionality Reduction (2D-SSDR), Two-dimensional Discriminant Principal Component Analysis (2D-DPCA), and Two-dimensional Semi-supervised Local Fisher Discriminant Analysis (2D-SELF). They work directly with two-dimensional data and can also take advantage of supervised information even if it is available only for a small part of the dataset. In addition, a fully supervised method, the Two-dimensional Local Fisher Discriminant Analysis (2D-LFDA), is proposed as well. The methods are defined in terms of a two-dimensional framework, which was also created in this study. The framework can generally describe scatter-based methods for dimensionality reduction and can be used for deriving other two-dimensional methods in the future. Experimental results showed that, as expected, the novel methods are faster and more stable than the existing ones. Furthermore, 2D-SSDR, 2D-SELF, and 2D-LFDA achieved competitive classification accuracies most of the time when compared to the traditional methods. Therefore, these three techniques can be seen as viable alternatives to existing dimensionality reduction methods.
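For a concrete sense of what "two-dimensional" means here, below is a sketch of 2DPCA (Yang et al., 2004), a classic unsupervised member of the scatter-based family that such a framework generalizes; the semi-supervised methods above extend this kind of matrix-level computation with label information.

```python
import numpy as np

def two_d_pca(images, n_components=8):
    """2DPCA: operate on image matrices directly, no vectorization.
    images: array of shape (n, h, w)."""
    mean = images.mean(axis=0)
    centered = images - mean
    # image covariance matrix G = mean of A_i^T A_i, shape (w, w)
    G = np.einsum('nhw,nhv->wv', centered, centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(G)
    W = eigvecs[:, ::-1][:, :n_components]   # top eigenvectors of G
    features = centered @ W                  # (n, h, n_components) feature matrices
    return mean, W, features
```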
APA, Harvard, Vancouver, ISO, and other styles
39

Najim, S. A. "Faithful visualization and dimensionality reduction on graphics processing unit." Thesis, Bangor University, 2014. https://research.bangor.ac.uk/portal/en/theses/faithful-visualization-and-dimensionality-reduction-on-graphics-processing-unit(527800f6-191c-4257-98d1-7909a1ab9ead).html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Gashler, Michael S. "Advancing the Effectiveness of Non-Linear Dimensionality Reduction Techniques." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3216.

Full text
Abstract:
Data that is represented with high dimensionality presents a computational complexity challenge for many existing algorithms. Limiting dimensionality by discarding attributes is sometimes a poor solution to this problem because significant high-level concepts may be encoded in the data across many or all of the attributes. Non-linear dimensionality reduction (NLDR) techniques have been successful with many problems at minimizing dimensionality while preserving intrinsic high-level concepts that are encoded with varying combinations of attributes. Unfortunately, many challenges remain with existing NLDR techniques, including excessive computational requirements, an inability to benefit from prior knowledge, and an inability to handle certain difficult conditions that occur in data with many real-world problems. Further, certain practical factors have limited advancement in NLDR, such as a lack of clarity regarding suitable applications for NLDR, and a general unavailability of efficient implementations of complex algorithms. This dissertation presents a collection of papers that advance the state of NLDR in each of these areas. Contributions of this dissertation include:
• An NLDR algorithm, called Manifold Sculpting, that optimizes its solution using graduated optimization. This approach enables it to obtain better results than methods that only optimize an approximate problem. Additionally, Manifold Sculpting can benefit from prior knowledge about the problem.
• An intelligent neighbor-finding technique called SAFFRON that improves the breadth of problems that existing NLDR techniques can handle.
• A neighborhood refinement technique called CycleCut that further increases the robustness of existing NLDR techniques, and that can work in conjunction with SAFFRON to solve difficult problems.
• Demonstrations of specific applications for NLDR techniques, including the estimation of state within dynamical systems, training of recurrent neural networks, and imputing missing values in data.
• An open source toolkit containing each of the techniques described in this dissertation, as well as several existing NLDR algorithms and other useful machine learning methods.
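The dissertation's own algorithms ship in its open source toolkit; as a quick, hedged illustration of the NLDR setting they improve upon, here is a standard scikit-learn Isomap run on a synthetic manifold. This is a stand-in baseline, not Manifold Sculpting itself.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, t = make_swiss_roll(n_samples=1500, random_state=0)
emb = Isomap(n_neighbors=12, n_components=2).fit_transform(X)
# emb should "unroll" the manifold; quality degrades when n_neighbors
# shortcuts across folds, the neighborhood failure mode that techniques
# like SAFFRON and CycleCut are designed to address.
```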
APA, Harvard, Vancouver, ISO, and other styles
41

Li, Ye. "MULTIFACTOR DIMENSIONALITY REDUCTION WITH P RISK SCORES PER PERSON." UKnowledge, 2018. https://uknowledge.uky.edu/statistics_etds/34.

Full text
Abstract:
After reviewing Multifactor Dimensionality Reduction (MDR) and its extensions, an approach to obtain P (larger than 1) risk scores is proposed to predict the continuous outcome for each subject. We study the mean square error (MSE) of dimensionality-reduced models fitted with sets of 2 risk scores and investigate the MSE for several special cases of the covariance matrix. A methodology is proposed to select a best set of P risk scores when P is specified a priori. Simulation studies based on true models of different dimensions (larger than 3) demonstrate that the selected set of P (larger than 1) risk scores outperforms the single aggregated risk score generated in AQMDR, and illustrate that our methodology can determine a best set of P risk scores effectively. With different assumptions on the dimension of the true model, we consider the preferable set of risk scores between the best set of two risk scores and the best set of three risk scores. Further, we present a methodology to select a set of P risk scores when P is not given a priori. Expressions for the asymptotic estimated mean square error of prediction (MSPE) are derived for a 1-dimensional model and a 2-dimensional model. In the last main chapter, we apply the methodology of selecting a best set of risk scores, with P specified a priori, to Alzheimer's disease data, and obtain a set of 2 risk scores and a set of 3 risk scores for each subject to predict measurements on biomarkers that are crucially involved in Alzheimer's disease.
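A minimal sketch of the core MDR reduction for a continuous outcome (a QMDR-style rule): each multi-locus genotype cell is labelled high- or low-risk by comparing its mean outcome to the overall mean, collapsing several genotype columns into one binary score. How the thesis constructs and combines P such scores is its own contribution; this shows only the basic reduction.

```python
import numpy as np

def mdr_risk_score(genotypes, y):
    """genotypes: (n, d) integer array for one candidate locus subset;
    y: continuous outcome per subject. Returns a 0/1 risk score."""
    overall = y.mean()
    cells = {}
    for g, yi in zip(map(tuple, genotypes), y):
        cells.setdefault(g, []).append(yi)
    # a cell is "high risk" if its mean outcome exceeds the overall mean
    high = {g for g, ys in cells.items() if np.mean(ys) > overall}
    return np.array([int(tuple(g) in high) for g in genotypes])
```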
APA, Harvard, Vancouver, ISO, and other styles
42

Di, Ciaccio Lucio. "Feature selection and dimensionality reduction for supervised data analysis." Thesis, Massachusetts Institute of Technology, 2016. https://hdl.handle.net/1721.1/122827.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2016
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 103-106).
by Lucio Di Ciaccio.
S.M.
S.M. Massachusetts Institute of Technology, Department of Aeronautics and Astronautics
APA, Harvard, Vancouver, ISO, and other styles
43

Atkison, Travis Levestis. "Using random projections for dimensionality reduction in identifying rogue applications." Diss., Mississippi State : Mississippi State University, 2009. http://library.msstate.edu/etd/show.asp?etd=etd-04032009-133701.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Gámez, López Antonio Juan. "Application of nonlinear dimensionality reduction to climate data for prediction." [S.l.] : [s.n.], 2006. http://opus.kobv.de/ubp/volltexte/2006/1095.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Colomé, Figueras Adrià. "Bimanual robot skills: MP encoding, dimensionality reduction and reinforcement learning." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/586163.

Full text
Abstract:
In our culture, robots have been present in novels and cinema for a long time, but it has been especially in the last two decades that improvements in hardware (better computational power and components) and advances in Artificial Intelligence (AI) have allowed robots to start sharing spaces with humans. Such situations require, ethical considerations aside, that robots be able to move with both compliance and precision, and learn at different levels, such as perception, planning, and motion, the latter being the focus of this work. The first issue addressed in this thesis is inverse kinematics for redundant robot manipulators, i.e., positioning the robot joints so as to reach a certain end-effector pose. We opt for iterative solutions based on the inversion of the kinematic Jacobian of a robot, and propose to filter and limit the gains in the spectral domain, while also unifying this approach within a continuous, multi-priority scheme. This inverse kinematics method is then used to derive the manipulability over the whole workspace of an anthropomorphic arm, and the coordination of two arms is subsequently optimized by finding their best relative positioning. Having solved the kinematic issues, a robot learning within a human environment must move compliantly, exerting limited force so as not to harm any humans or cause any damage, while being as precise as possible. We therefore developed two dynamic models for the same redundant arm we had analysed kinematically: the first based on local models with Gaussian projections, and the second characterizing the most problematic term of the dynamics, namely friction. These models allowed us to implement feed-forward controllers in which we can actively change the weights in the compliance-precision tradeoff. Moreover, we used these models to predict external forces acting on the robot without the use of force sensors. Bimanual robots must additionally coordinate their components (or limbs) and be able to adapt to new situations with ease. Over the last decade, a number of successful applications for learning robot motion tasks have been published. However, due to the complexity of a complete system including all the required elements, most of these applications involve either simple robots with a large number of high-end sensors, or very simple and controlled tasks. Building on our framework for kinematics and control, we relied on two types of movement primitives to encapsulate robot motion. Such movement primitives are well suited to reinforcement learning. In particular, we used direct policy search, which uses the motion parametrization as the policy itself. In order to improve the learning speed in real robot applications, we generalized a policy search algorithm to also give some importance to samples yielding a bad result, and we paid special attention to the dimensionality of the motion parametrization. We reduced this dimensionality with linear methods, using the rewards obtained through motion repetition and execution. We tested this framework on a bimanual task performed by two anthropomorphic arms, the folding of garments, showing how a reduced dimensionality can provide qualitative information about robot couplings and help to speed up the learning of tasks when robot motion executions are costly.
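A minimal sketch of the linear dimensionality reduction step described above: collect movement-primitive weight vectors from past executions, keep their top principal directions, and let policy search explore the reduced coordinates. Function names are illustrative assumptions, not the thesis's API.

```python
import numpy as np

def reduce_policy_dimension(weight_samples, k):
    """weight_samples: (n_rollouts, n_params) movement-primitive weights
    from past executions. Returns maps between the full parameter space
    and a k-dimensional latent space spanned by the top PCA directions."""
    mean = weight_samples.mean(axis=0)
    U, S, Vt = np.linalg.svd(weight_samples - mean, full_matrices=False)
    basis = Vt[:k]                                 # (k, n_params)
    to_latent = lambda w: (w - mean) @ basis.T     # explore here
    to_full = lambda z: mean + z @ basis           # execute this on the robot
    return to_latent, to_full
```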
APA, Harvard, Vancouver, ISO, and other styles
46

Kharal, Rosina. "Semidefinite Embedding for the Dimensionality Reduction of DNA Microarray Data." Thesis, University of Waterloo, 2006. http://hdl.handle.net/10012/2945.

Full text
Abstract:
Harnessing the power of DNA microarray technology requires analysis methods that accurately interpret microarray data. The current literature abounds with algorithms for investigating microarray data, but there is a need for an efficient approach that combines different techniques of microarray data analysis and provides a viable solution to the dimensionality reduction of microarray data. Reducing the high dimensionality of microarray data is one approach to better understanding the information contained within the data. We propose a novel approach for the dimensionality reduction of microarray data that effectively combines different techniques in the study of DNA microarrays. Our method, KAS (kernel alignment with semidefinite embedding), aids the visualization of microarray data in two dimensions and shows improvement over existing dimensionality reduction methods such as PCA, LLE and Isomap.
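The kernel alignment score at the heart of a method like KAS is compact enough to sketch; how KAS couples it with semidefinite embedding is the thesis's contribution, so treat the following only as the scoring piece.

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Empirical alignment A(K1, K2) = <K1, K2>_F / (||K1||_F * ||K2||_F),
    a measure of agreement between two kernel (Gram) matrices;
    it lies in [0, 1] for positive semidefinite kernels."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))
```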
APA, Harvard, Vancouver, ISO, and other styles
47

Hira, Zena Maria. "Dimensionality reduction methods for microarray cancer data using prior knowledge." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/33812.

Full text
Abstract:
Microarray studies are currently a very popular source of biological information. They allow the simultaneous measurement of hundreds of thousands of genes, drastically increasing the amount of data that can be gathered in a small amount of time and also decreasing the cost of producing such results. Large numbers of high-dimensional data sets are currently being generated, and there is an ongoing need to find ways to analyse them to obtain meaningful interpretations. Many microarray experiments are concerned with answering specific biological or medical questions regarding diseases and treatments. Cancer is one of the most popular research areas, and there is a plethora of data available requiring in-depth analysis. Although the analysis of microarray data has been thoroughly researched over the past ten years, new approaches still appear regularly, and may lead to a better understanding of the available information. The size of modern data sets presents considerable difficulties to traditional methodologies based on hypothesis testing, and there is a new move towards the use of machine learning in microarray data analysis. Two new methods of using prior genetic knowledge in machine learning algorithms have been developed and their results are compared with existing methods. The prior knowledge consists of biological pathway data that can be found in online databases, and gene ontology terms. The first method, called "a priori manifold learning", uses the prior knowledge when constructing a manifold for non-linear feature extraction. It was found to perform better than both linear principal components analysis (PCA) and the non-linear Isomap algorithm (without prior knowledge) in both classification accuracy and quality of the clusters. Both pathway and GO terms were used as prior knowledge, and the results showed that using GO terms can make the models over-fit the data. In the cases where the use of GO terms does not over-fit, the results are better than PCA, Isomap and a priori manifold learning using pathways. The second method, called "the feature selection over pathway segmentation algorithm", uses the pathway information to split a big dataset into smaller ones. Then, using AdaBoost, decision trees are constructed for each of the smaller sets, and the sets that achieve higher classification accuracy are identified. The individual genes in these subsets are assessed to determine their role in the classification process. Using data sets concerning chronic myeloid leukaemia (CML), two subsets based on pathways were found to be strongly associated with the response to treatment. Using a different data set of measurements on lower grade glioma (LGG) tumours, four informative gene sets were discovered. Further analysis based on the Gini importance measure identified a set of genes for each cancer type (CML, LGG) that could predict the response to treatment very accurately (> 90%). Moreover, a single gene that can accurately predict the response to CML treatment was identified.
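A toy sketch of the pathway segmentation idea: split the expression matrix by pathway gene sets and score each subset with a cross-validated AdaBoost classifier. The data layout and pathway dictionary are assumptions for illustration, not the thesis's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

def score_pathways(X, y, gene_names, pathways):
    """X: (n_samples, n_genes) expression matrix; y: class labels;
    pathways: dict mapping pathway name to a list of gene names.
    Returns mean cross-validated accuracy per pathway subset."""
    name_to_col = {g: i for i, g in enumerate(gene_names)}
    scores = {}
    for pw, genes in pathways.items():
        cols = [name_to_col[g] for g in genes if g in name_to_col]
        if len(cols) < 2:
            continue   # skip pathways with too few measured genes
        acc = cross_val_score(AdaBoostClassifier(n_estimators=100),
                              X[:, cols], y, cv=5).mean()
        scores[pw] = acc
    return scores
```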
APA, Harvard, Vancouver, ISO, and other styles
48

Gorrell, Genevieve. "Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing." Doctoral thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2006. http://www.bibl.liu.se/liupubl/disp/disp2006/tek1045s.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Boone, Gary Noel. "Extreme dimensionality reduction for text learning : cluster-generated feature spaces." Diss., Georgia Institute of Technology, 2000. http://hdl.handle.net/1853/8139.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Gámez, López Antonio Juan. "Application of nonlinear dimensionality reduction to climate data for prediction." Phd thesis, Universität Potsdam, 2006. http://opus.kobv.de/ubp/volltexte/2006/1095/.

Full text
Abstract:
This thesis was devoted to the study of the coupled system composed of the El Niño/Southern Oscillation (ENSO) and the annual cycle. More precisely, the work focused on two main problems: 1. How to separate the two oscillations within a tractable model for understanding the behaviour of the whole system. 2. How to model the system in order to achieve a better understanding of the interaction, as well as to predict future states of the system. We focused our efforts on the sea surface temperature equations, considering that atmospheric effects were secondary to the ocean dynamics. The results may be summarised as follows: 1. Linear methods are not suitable for characterising the dimensionality of the sea surface temperature in the tropical Pacific Ocean, and therefore do not by themselves help to separate the oscillations. Instead, nonlinear methods of dimensionality reduction prove better at defining a lower limit for the dimensionality of the system, as well as at explaining the statistical results in a more physical way [1]. In particular, Isomap, a nonlinear modification of multidimensional scaling methods, provides a physically appealing method of decomposing the data, as it substitutes an approximation of the geodesic distances on the manifold for the Euclidean distances. We expect that this method could be successfully applied to other oscillatory extended systems and, in particular, to meteorological systems. 2. A three-dimensional dynamical system could be modelled, using a backfitting algorithm, to describe the dynamics of the sea surface temperature in the tropical Pacific Ocean. We observed that, although few data points were available, we could predict the future behaviour of the coupled ENSO-annual cycle system for lead times of up to six months, although the constructed system presented several drawbacks: few data points to feed the backfitting algorithm, an untrained model, a lack of forcing with external data, and the simplification to a closed system. Even so, ensemble prediction techniques showed that the prediction skill of the three-dimensional time series was as good as that of much more complex models. This suggests that the climatological system in the tropics is mainly explained by ocean dynamics, while the atmosphere plays a secondary role in the physics of the process. Relevant predictions for short lead times can be made using a low-dimensional system, despite its simplicity. The analysis of the SST data suggests that the nonlinear interaction between the oscillations is small, and that noise plays a secondary role in the fundamental dynamics of the oscillations [2]. A global view of the work shows a general procedure for modelling climatological systems: first, find a suitable method of linear or nonlinear dimensionality reduction; then, extract low-dimensional time series with that method; finally, fit a low-dimensional model using a backfitting algorithm in order to predict future states of the system.
The aim of this work is to predict the behaviour of the sea temperature in the tropical Pacific Ocean. Two important phenomena take place simultaneously in this region of the world: the annual cycle and El Niño. The annual cycle can be defined as an oscillation of physical variables (e.g. temperature, wind speed, sea level height) with a period of one year; that is, the behaviour of the ocean and the atmosphere is similar every twelve months (all summers are more alike from year to year than the summer and winter of the same year). El Niño is an irregular oscillation because it alternately reaches high and low values, but not at fixed times like the annual cycle. Instead, El Niño may reach high values in one year and then take four, five or even seven years to reappear. Note that two phenomena occurring in the same space influence each other; nevertheless, very little is known about exactly how El Niño influences the annual cycle, and vice versa. The goals of this work were, first, to focus on the sea temperature in order to analyse the whole system, and second, to reduce the temperature time series in the tropical Pacific Ocean to the smallest possible number, simplifying the system without losing essential information. This procedure resembles the analysis of a long spring oscillating gently around its resting position: although the spring is long, we can draw the whole spring approximately if we know its highest points at a given time, so only a few points are needed to characterise its state. The main problem in our case is to find the minimum number of points sufficient to describe both phenomena; this number was found to be three. The next goal was to predict how the temperatures will evolve in time given the current and past temperatures. We observed that an accurate prediction can be made up to six months ahead or less, and that the temperature is not predictable a year ahead. An important result is that the predictions on short time scales are just as good as the predictions other authors have obtained with considerably more complicated methods. My conclusion is therefore that the coupled system of the annual cycle and El Niño can be predicted with simpler methods than those applied today.
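Since the abstract singles out Isomap's substitution of geodesic for Euclidean distances, here is a compact sketch of that core computation (kNN graph, graph shortest paths, classical MDS). Parameter choices are illustrative only, and the sketch assumes the kNN graph is connected.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap_embedding(X, n_neighbors=10, n_components=3):
    """Isomap core: approximate geodesic distances along a kNN graph,
    then embed them with classical multidimensional scaling."""
    G = kneighbors_graph(X, n_neighbors, mode="distance")
    D = shortest_path(G, method="D", directed=False)   # geodesic approximations
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                        # double centering
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))
```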
APA, Harvard, Vancouver, ISO, and other styles