Dissertations / Theses on the topic 'Dimensionality reduction analysis'




Consult the top 50 dissertations / theses for your research on the topic 'Dimensionality reduction analysis.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and the bibliographic reference to the chosen work will be generated automatically in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Khosla, Nitin. "Dimensionality Reduction Using Factor Analysis." Griffith University. School of Engineering, 2006. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20061010.151217.

Full text
Abstract:
In many pattern recognition applications, a large number of features are extracted in order to ensure an accurate classification of unknown classes. One way to solve the problems of high dimensionality is to first reduce the dimensionality of the data to a manageable size, keeping as much of the original information as possible, and then feed the reduced-dimensional data into a pattern recognition system. In this situation, dimensionality reduction becomes the pre-processing stage of the pattern recognition system. In addition, probability density estimation is simpler with fewer variables. Dimensionality reduction is useful in speech recognition, data compression, visualization and exploratory data analysis. Some of the techniques which can be used for dimensionality reduction are Factor Analysis (FA), Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA). Factor Analysis can be considered an extension of Principal Component Analysis. The EM (expectation-maximization) algorithm is ideally suited to problems of this sort, in that it produces maximum-likelihood (ML) estimates of parameters when there is a many-to-one mapping from an underlying distribution to the distribution governing the observations: the expectation step computes expectations conditioned upon the observations, and the maximization step then provides a new estimate of the parameters. This research compares Factor Analysis (based on the EM algorithm), Principal Component Analysis and Linear Discriminant Analysis for dimensionality reduction, and investigates Local Factor Analysis (EM-based) and Local Principal Component Analysis using Vector Quantization.
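As a rough illustration of the three techniques this thesis compares, the sketch below applies scikit-learn's PCA, maximum-likelihood FactorAnalysis and LDA to a small benchmark dataset. The digits data and the choice of nine components are placeholder assumptions, and scikit-learn's generic FactorAnalysis stands in for the author's local, VQ-based variants.

```python
# A minimal sketch (not the thesis code): PCA, ML factor analysis and LDA
# as alternative linear reductions of the same data.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_digits(return_X_y=True)   # 64-dimensional features, 10 classes
k = 9                                 # target dimension (placeholder choice)

X_pca = PCA(n_components=k).fit_transform(X)            # max-variance directions
X_fa = FactorAnalysis(n_components=k).fit_transform(X)  # ML latent factors
X_lda = LinearDiscriminantAnalysis(n_components=k).fit_transform(X, y)  # supervised

for name, Z in (("PCA", X_pca), ("FA", X_fa), ("LDA", X_lda)):
    print(name, Z.shape)              # each maps 64 dimensions down to k
```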
2

Vamulapalli, Harika Rao. "On Dimensionality Reduction of Data." ScholarWorks@UNO, 2010. http://scholarworks.uno.edu/td/1211.

Full text
Abstract:
The random projection method is one of the important tools for the dimensionality reduction of data, and it can be made efficient with strong error guarantees. In this thesis, we focus on linear transforms of high-dimensional data to a low-dimensional space satisfying the Johnson-Lindenstrauss lemma. In addition, we prove some theoretical results relating to the projections that are of interest in practical applications. We show how the technique can be applied to synthetic data with a probabilistic guarantee on the pairwise distances. The connection between dimensionality reduction and compressed sensing is also discussed.
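The Johnson-Lindenstrauss guarantee referred to above can be checked numerically. A minimal sketch with scikit-learn's Gaussian random projection follows; the synthetic data, the distortion bound eps and the sample size are placeholder choices, not values from the thesis.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10_000))    # synthetic high-dimensional points

eps = 0.2
k = johnson_lindenstrauss_min_dim(n_samples=100, eps=eps)  # JL target dimension
Z = GaussianRandomProjection(n_components=k, random_state=0).fit_transform(X)

ratios = pdist(Z) / pdist(X)          # distortion of every pairwise distance
print(k, ratios.min(), ratios.max())  # should fall inside [1-eps, 1+eps] w.h.p.
```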
3

Vasiloglou, Nikolaos. "Isometry and convexity in dimensionality reduction." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/28120.

Full text
Abstract:
Thesis (M. S.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
Committee Chair: David Anderson; Committee Co-Chair: Alexander Gray; Committee Member: Anthony Yezzi; Committee Member: Hongyuan Zha; Committee Member: Justin Romberg; Committee Member: Ronald Schafer.
4

Ross, Ian. "Nonlinear dimensionality reduction methods in climate data analysis." Thesis, University of Bristol, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492479.

Full text
Abstract:
Linear dimensionality reduction techniques, notably principal component analysis, are widely used in climate data analysis as a means to aid in the interpretation of datasets of high dimensionality. These linear methods may not be appropriate for the analysis of data arising from nonlinear processes occurring in the climate system. Numerous techniques for nonlinear dimensionality reduction have been developed recently that may provide a potentially useful tool for the identification of low-dimensional manifolds in climate data sets arising from nonlinear dynamics. In this thesis I apply three such techniques to the study of El Niño/Southern Oscillation variability in tropical Pacific sea surface temperatures and thermocline depth, comparing observational data with simulations from coupled atmosphere-ocean general circulation models from the CMIP3 multi-model ensemble.
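To see why a nonlinear method can recover structure that PCA misses, the sketch below embeds a synthetic curved manifold with both PCA and Isomap; Isomap is one of several possible nonlinear choices and is not necessarily among the three techniques the thesis applies, and climate fields such as SST anomalies would take the place of the toy data.

```python
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

X, t = make_s_curve(n_samples=1000, random_state=0)  # a 2-D manifold bent into 3-D

Z_pca = PCA(n_components=2).fit_transform(X)     # linear projection (cf. EOF analysis)
Z_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)  # geodesic embedding

# Correlation of each first coordinate with the true manifold parameter t:
# the nonlinear embedding should track t far better than the linear one.
print(abs(np.corrcoef(t, Z_pca[:, 0])[0, 1]),
      abs(np.corrcoef(t, Z_iso[:, 0])[0, 1]))
```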
5

Ray, Sujan. "Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin161375080072697.

Full text
6

Di Ciaccio, Lucio. "Feature selection and dimensionality reduction for supervised data analysis." Thesis, Massachusetts Institute of Technology, 2016. https://hdl.handle.net/1721.1/122827.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2016. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 103-106).
7

Coleman, Ashley B. "Feature Extraction using Dimensionality Reduction Techniques: Capturing the Human Perspective." Wright State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=wright1452775165.

Full text
8

Hui, Shirley. "FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/1173.

Full text
Abstract:
A topic of research that is frequently studied in structural biology is the problem of determining the degree of similarity between two protein structures. The most common solution is to perform a three-dimensional structural alignment of the two structures. Rigid structural alignment algorithms have been developed in the past to accomplish this, but they treat the protein molecules as immutable structures. Since protein structures can bend and flex, rigid algorithms do not yield accurate results, and flexible structural alignment algorithms have therefore been developed. The problem with these algorithms is that the protein structures are represented using thousands of atomic coordinate variables, which results in a great computational burden due to the large number of degrees of freedom required to account for the flexibility. Past research in dimensionality reduction has shown that a linear technique called Principal Component Analysis (PCA) is well suited to reducing such high-dimensional data. This thesis introduces a new flexible structural alignment algorithm called FlexSADRA, which uses PCA to perform flexible structural alignments. Test results show that FlexSADRA determines better alignments than rigid structural alignment algorithms. Unlike existing rigid and flexible algorithms, FlexSADRA addresses the problem in a significantly lower-dimensional problem space and assesses not only the structural fit but also the structural feasibility of the final alignment.
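A common way to use PCA for molecular flexibility, in the spirit described above though not FlexSADRA itself, is to diagonalise the covariance of a conformational ensemble and keep a few deformation modes. Everything below (ensemble size, atom count, mode count, random placeholder coordinates) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
confs = rng.normal(size=(50, 3 * 200))   # placeholder: 50 conformations, 200 atoms

mean = confs.mean(axis=0)
_, s, Vt = np.linalg.svd(confs - mean, full_matrices=False)

k = 10
modes = Vt[:k]                           # principal deformation directions
coeffs = (confs - mean) @ modes.T        # each pose as k mode amplitudes

# A flexible alignment can now search over k amplitudes instead of 3n atomic
# coordinates, which is the dimensionality saving the abstract points to.
recon = mean + coeffs @ modes
print(coeffs.shape, float(np.mean((recon - confs) ** 2)))
```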
9

Zhang, Yuyao. "Non-linear dimensionality reduction and sparse representation models for facial analysis." Thesis, Lyon, INSA, 2014. http://www.theses.fr/2014ISAL0019/document.

Full text
Abstract:
Face analysis techniques commonly require a proper representation of images by means of dimensionality reduction, leading to embedded manifolds that aim to capture the relevant characteristics of the signals. In this thesis, we first provide a comprehensive survey of the state of the art in embedded manifold models. Then, we introduce a novel non-linear embedding method, Kernel Similarity Principal Component Analysis (KS-PCA), into Active Appearance Models, in order to model face appearances under variable illumination. The proposed algorithm successfully outperforms the traditional linear PCA transform in capturing the salient features generated by different illuminations, and reconstructs the illuminated faces with high accuracy. We also consider the problem of automatically classifying human face poses from face views with varying illumination, as well as occlusion and noise. Based on sparse representation methods, we propose two dictionary-learning frameworks for this pose classification problem. The first framework is Adaptive Sparse Representation pose Classification (ASRC). It trains the dictionary via a linear model called Incremental Principal Component Analysis (Incremental PCA), tending to decrease the intra-class redundancy, which may affect classification performance, while keeping the inter-class redundancy that is critical for sparse representation. The second is the Dictionary-Learning Sparse Representation model (DLSR), which learns the dictionary so as to coincide with the classification criterion; this training goal is achieved by the K-SVD algorithm. In a series of experiments, we show the performance of the two dictionary-learning methods, which are respectively based on a linear transform and a sparse representation model. In addition, we propose a novel Dictionary Learning framework for Illumination Normalization (DL-IN), based on sparse representation in terms of coupled dictionaries. The dictionary pairs are jointly optimized from normally illuminated and irregularly illuminated face image pairs. We further use a Gaussian Mixture Model (GMM) to enhance the framework's capability of modeling data with complex distributions; the GMM adapts each model to a part of the samples and then fuses them. Experimental results demonstrate the effectiveness of sparsity as a prior for patch-based illumination normalization of face images.
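KS-PCA is the author's own method, but a standard kernel PCA gives the flavour of a nonlinear appearance model. The sketch below is a stand-in using scikit-learn's KernelPCA on placeholder face vectors, with an arbitrary RBF bandwidth.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
faces = rng.random((100, 64 * 64))        # placeholder: vectorised face images

kpca = KernelPCA(n_components=20, kernel="rbf", gamma=1e-4,
                 fit_inverse_transform=True)
codes = kpca.fit_transform(faces)         # nonlinear appearance coefficients
recon = kpca.inverse_transform(codes)     # approximate face reconstruction

print(codes.shape, float(np.mean((recon - faces) ** 2)))
```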
10

Moraes, Lailson Bandeira de. "Two-dimensional extensions of semi-supervised dimensionality reduction methods." Universidade Federal de Pernambuco, 2013. https://repositorio.ufpe.br/handle/123456789/12388.

Full text
Abstract:
An important pre-processing step in machine learning systems is dimensionality reduction, which aims to produce compact representations of high-dimensional patterns. In computer vision applications, these patterns are typically images, which are represented by two-dimensional matrices. However, traditional dimensionality reduction techniques were designed to work only with vectors, which makes them a suboptimal choice for processing two-dimensional data. Another problem with traditional approaches is that they operate in either a fully unsupervised or a fully supervised way, which limits their efficiency in scenarios where supervised information is available only for a subset of the data. These situations are increasingly common, because in many modern applications it is easy to produce raw data but usually difficult to label it. In this study, we propose three dimensionality reduction methods that can overcome these limitations: Two-dimensional Semi-supervised Dimensionality Reduction (2D-SSDR), Two-dimensional Discriminant Principal Component Analysis (2D-DPCA), and Two-dimensional Semi-supervised Local Fisher Discriminant Analysis (2D-SELF). They work directly with two-dimensional data and can also take advantage of supervised information even if it is available only for a small part of the dataset. In addition, a fully supervised method, Two-dimensional Local Fisher Discriminant Analysis (2D-LFDA), is proposed as well. The methods are defined in terms of a two-dimensional framework, also created in this study, which is capable of generally describing scatter-based methods for dimensionality reduction and can be used to derive other two-dimensional methods in the future. Experimental results showed that, as expected, the novel methods are faster and more stable than the existing ones. Furthermore, 2D-SSDR, 2D-SELF, and 2D-LFDA achieved competitive classification accuracies most of the time when compared to the traditional methods. Therefore, these three techniques can be seen as viable alternatives to existing dimensionality reduction methods.
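The two-dimensional methods above work on image matrices directly rather than on flattened vectors. The sketch below shows the basic mechanism using classical 2DPCA, a simpler relative of the proposed methods rather than the thesis algorithms themselves, on placeholder images.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((500, 32, 32))            # 500 images kept as 2-D matrices

mean = A.mean(axis=0)
# Image scatter matrix: average of (A_i - mean)^T (A_i - mean) over the sample
G = np.einsum("nij,nik->jk", A - mean, A - mean) / len(A)

vals, vecs = np.linalg.eigh(G)           # eigh returns ascending eigenvalues
W = vecs[:, ::-1][:, :8]                 # top 8 projection directions

Y = (A - mean) @ W                       # each image becomes a 32x8 feature matrix
print(Y.shape)
```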
11

Bartholomäus, Jenny, Sven Wunderlich, and Zoltán Sasvári. "Identification of Suspicious Semiconductor Devices Using Independent Component Analysis with Dimensionality Reduction." Institute of Electrical and Electronics Engineers (IEEE), 2019. https://tud.qucosa.de/id/qucosa%3A35129.

Full text
Abstract:
In the semiconductor industry the reliability of devices is of paramount importance. Therefore, after removing the defective devices, one wants to detect irregularities in measurement data, because the corresponding devices have a higher risk of failing early in the product lifetime. The paper presents a method to improve the detection of such suspicious devices, where the screening is performed on transformed measurement data; in this way, dependencies between tests can be taken into account, for example. Additionally, a new dimensionality reduction is performed within the transformation, so that the reduced and transformed data comprises only the informative content of the raw data. This reduces the complexity of the subsequent screening steps. The new approach is applied to semiconductor measurement data, and it is shown by means of examples how the screening can be improved.
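One plausible reading of this pipeline, reduce first, transform with ICA, then screen the transformed data, can be sketched with scikit-learn. The component counts, the robust z-score rule and the threshold below are assumptions, not the paper's choices.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
tests = rng.normal(size=(5000, 40))      # placeholder: 5000 devices x 40 test values

Z = PCA(n_components=10, whiten=True).fit_transform(tests)    # reduce dimensionality
S = FastICA(n_components=10, random_state=0).fit_transform(Z) # independent components

med = np.median(S, axis=0)
mad = np.median(np.abs(S - med), axis=0)
z = np.abs(S - med) / (1.4826 * mad)     # robust z-scores per component
suspicious = np.where((z > 6.0).any(axis=1))[0]
print(len(suspicious), "devices flagged for closer screening")
```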
12

Landgraf, Andrew J. "Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1437610558.

Full text
13

Cheriyadat, Anil Meerasa. "Limitations of principal component analysis for dimensionality-reduction for classification of hyperspectral data." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-11072003-133109.

Full text
14

Lin, Huang-De Hennessy. "Parametric projection pursuits for dimensionality reduction of hyperspectral signals in target recognition applications." Master's thesis, Mississippi State : Mississippi State University, 2004. http://library.msstate.edu/etd/show.asp?etd=etd-12162003-202048.

Full text
15

Gorrell, Genevieve. "Generalized Hebbian Algorithm for Dimensionality Reduction in Natural Language Processing." Doctoral thesis, Linköping : Department of Computer and Information Science, Linköpings universitet, 2006. http://www.bibl.liu.se/liupubl/disp/disp2006/tek1045s.pdf.

Full text
16

Rusch, Thomas, Kurt Hornik, and Patrick Mair. "Assessing and quantifying clusteredness: The OPTICS Cordillera." Taylor & Francis, 2018. http://epub.wu.ac.at/5725/1/10618600.2017.pdf.

Full text
Abstract:
This article provides a framework for assessing and quantifying the "clusteredness" of a data representation. Clusteredness is a global univariate property, defined as a layout diverging from equidistance of points to the closest neighboring point set. The OPTICS algorithm encodes the global clusteredness as a pair of clusteredness-representative distances and an algorithmic ordering. We use this to construct an index for the quantification of clusteredness, coined the OPTICS Cordillera, as the norm of subsequent differences over the pair. We provide lower and upper bounds and a normalization for the index. We show that the index simultaneously captures important aspects of clusteredness such as cluster compactness, cluster separation, and the number of clusters. The index can be used as a goodness-of-clusteredness statistic, as a function over a grid, or to compare different representations. For illustration, we apply our suggestion to dimensionality-reduced 2D representations of Californian counties with respect to 48 climate-change-related variables. Online supplementary material is available (including an R package, the data and additional mathematical details).
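A toy version of the idea, the norm of successive differences of reachabilities along the OPTICS ordering, can be written in a few lines. This omits the paper's bounds and normalization and uses arbitrary parameters, so it is an illustration inspired by the description rather than the OPTICS Cordillera itself.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

opt = OPTICS(min_samples=5).fit(X)
r = opt.reachability_[opt.ordering_]     # reachability walked in OPTICS order
r = r[np.isfinite(r)]                    # the first point has infinite reachability

index = np.linalg.norm(np.diff(r))       # unnormalized "clusteredness" score
print(index)                             # larger for compact, well-separated clusters
```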
17

Kliegr, Tomáš. "Clickstream Analysis." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-2065.

Full text
Abstract:
This thesis introduces current research trends in clickstream analysis and proposes a new heuristic that could be used for dimensionality reduction of semantically enriched data in Web Usage Mining (WUM). Click fraud and conversion fraud are identified as key prospective application areas for WUM. The thesis documents a conversion-fraud vulnerability of Google Analytics and proposes a defense: new clickstream acquisition software that collects data in sufficient granularity and structure to allow data mining approaches to fraud detection. Three variants of the K-means clustering algorithm and three association rule data mining systems are evaluated and compared on real-world web usage data.
18

Galbincea, Nicholas D. "Critical Analysis of Dimensionality Reduction Techniques and Statistical Microstructural Descriptors for Mesoscale Variability Quantification." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1500642043518197.

Full text
19

Wang, Xuechuan. "Feature Extraction and Dimensionality Reduction in Pattern Recognition and Their Application in Speech Recognition." Griffith University. School of Microelectronic Engineering, 2003. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20030619.162803.

Full text
Abstract:
Conventional pattern recognition systems have two components: feature analysis and pattern classification. Feature analysis is achieved in two steps: a parameter extraction step and a feature extraction step. In the parameter extraction step, information relevant to pattern classification is extracted from the input data in the form of a parameter vector. In the feature extraction step, the parameter vector is transformed into a feature vector. Feature extraction can be conducted independently or jointly with either parameter extraction or classification. Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are the two popular independent feature extraction algorithms. Both of them extract features by projecting the parameter vectors into a new feature space through a linear transformation matrix, but they optimize the transformation matrix with different intentions: PCA optimizes the transformation matrix by finding the largest variations in the original feature space, while LDA pursues the largest ratio of between-class variation to within-class variation when projecting the original feature space onto a subspace. The drawback of independent feature extraction algorithms is that their optimization criteria are different from the classifier's minimum classification error criterion, which may cause inconsistency between the feature extraction and classification stages of a pattern recognizer and, consequently, degrade the performance of classifiers. A direct way to overcome this problem is to conduct feature extraction and classification jointly with a consistent criterion. The Minimum Classification Error (MCE) training algorithm provides such an integrated framework. The MCE algorithm was first proposed for optimizing classifiers; it is a type of discriminative learning algorithm, and it achieves minimum classification error directly. The flexibility of the MCE framework makes it convenient to conduct feature extraction and classification jointly. Conventional feature extraction and pattern classification algorithms (LDA, PCA, the MCE training algorithm, the minimum distance classifier, the likelihood classifier and the Bayesian classifier) are linear algorithms. The advantage of linear algorithms is their simplicity and their ability to reduce feature dimensionality. However, they have the limitation that the decision boundaries they generate are linear and have little computational flexibility. SVM is a more recently developed integrated pattern classification algorithm with a non-linear formulation. It is based on the idea that classification functions expressed in terms of dot-products can be computed efficiently in higher-dimensional feature spaces, so that classes which are not linearly separable in the original parametric space can be linearly separated in the higher-dimensional feature space. Because of this, SVM has the advantage that it can handle classes with complex nonlinear decision boundaries. However, SVM is a highly integrated and closed pattern classification system, and it is very difficult to adopt feature extraction into its framework; thus SVM is unable to conduct feature extraction tasks. This thesis investigates LDA and PCA for feature extraction and dimensionality reduction, and proposes the application of MCE training algorithms for joint feature extraction and classification tasks. A generalized MCE (GMCE) training algorithm is proposed to mend the shortcomings of the MCE training algorithm in joint feature extraction and classification tasks. SVM, as a non-linear pattern classification system, is also investigated in this thesis, and a reduced-dimensional SVM (RDSVM) is proposed to enable SVM to conduct feature extraction and classification jointly. All of the investigated and proposed algorithms are tested and compared, first on a number of small databases, such as the Deterding Vowels database, Fisher's IRIS database and the German GLASS database, and then in a large-scale speech recognition experiment based on the TIMIT database.
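The "independent feature extraction" pipelines the thesis starts from are easy to reproduce. The sketch below runs PCA and LDA in front of a minimum-distance classifier on a small stand-in dataset (Iris instead of the vowel or TIMIT data); it also makes the criterion mismatch concrete, since neither projection is optimized for the classifier's error, which is what MCE/GMCE training addresses.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid      # a minimum-distance classifier
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

for extractor in (PCA(n_components=2), LinearDiscriminantAnalysis(n_components=2)):
    pipe = make_pipeline(extractor, NearestCentroid())
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(type(extractor).__name__, round(acc, 3))
```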
20

Lewandowski, Michal. "Advanced non linear dimensionality reduction methods for multidimensional time series : applications to human motion analysis." Thesis, Kingston University, 2011. http://eprints.kingston.ac.uk/20313/.

Full text
Abstract:
This dissertation contributes to the state of the art in the field of pattern recognition and machine learning by advancing a family of nonlinear dimensionality reduction methods. We start with the automation of spectral dimensionality reduction approaches, in order to facilitate the usage of these techniques by scientists in various domains wherever there is a need to explore large volumes of multivariate data. Then, we focus on the crucial and open problem of modelling the intrinsic structure of multidimensional time series. Solutions to this outstanding scientific challenge would advance various branches of science, from meteorology, biology and engineering to computer vision, wherever time is a key asset of high-dimensional data. We introduce two different approaches to this complex problem, both derived from the proposed concept of introducing spatio-temporal constraints between time series. The first algorithm allows for an efficient deterministic parameterisation of multidimensional time series spaces, even in the presence of data variations, whereas the second approximates an underlying distribution of such spaces in a generative manner. We evaluate our original contributions in the area of visual human motion analysis, especially in two major computer vision tasks, i.e. human body pose estimation and human action recognition from video. In particular, we propose two variants of temporally constrained human motion descriptors, which become the foundation of view-independent action recognition frameworks, and demonstrate excellent robustness against style, view and speed variability in the recognition of different kinds of motions. Performance analysis confirms the strength and potential of our contributions, which may benefit many domains beyond computer vision.
21

Bird, Gregory David. "Linear and Nonlinear Dimensionality-Reduction-Based Surrogate Models for Real-Time Design Space Exploration of Structural Responses." BYU ScholarsArchive, 2020. https://scholarsarchive.byu.edu/etd/8653.

Full text
Abstract:
Design space exploration (DSE) is a tool used to evaluate and compare designs as part of the design selection process. While evaluating every possible design in a design space is infeasible, understanding design behavior and response throughout the design space may be accomplished by evaluating a subset of designs and interpolating between them using surrogate models. Surrogate modeling is a technique that uses low-cost calculations to approximate the outcome of more computationally expensive calculations or analyses, such as finite element analysis (FEA). While surrogates make quick predictions, accuracy is not guaranteed and must be considered. This research addressed the need to improve the accuracy of surrogate predictions in order to improve DSE of structural responses. This was accomplished by performing comparative analyses of linear and nonlinear dimensionality-reduction-based radial basis function (RBF) surrogate models for emulating various FEA nodal results. A total of four dimensionality reduction methods were investigated, namely principal component analysis (PCA), kernel principal component analysis (KPCA), isometric feature mapping (ISOMAP), and locally linear embedding (LLE). These methods were used in conjunction with surrogate modeling to predict nodal stresses and coordinates of a compressor blade. The research showed that using an ISOMAP-based dual-RBF surrogate model for predicting nodal stresses decreased the estimated mean error of the surrogate by 35.7% compared to PCA. Using nonlinear dimensionality-reduction-based surrogates did not reduce surrogate error for predicting nodal coordinates. A new metric, the manifold distance ratio (MDR), was introduced to measure the nonlinearity of the data manifolds. When applied to the stress and coordinate data, the stress space was found to be more nonlinear than the coordinate space for this application. The upfront training cost of the nonlinear dimensionality-reduction-based surrogates was larger than that of their linear counterparts but small enough to remain feasible. After training, all the dual-RBF surrogates were capable of making real-time predictions. This same process was repeated for a separate application involving the nodal displacements of mode shapes obtained from a FEA modal analysis. The modal assurance criterion (MAC) calculation was used to compare the predicted mode shapes, as well as their corresponding true mode shapes obtained from FEA, to a set of reference modes. The research showed that two nonlinear techniques, namely LLE and KPCA, resulted in lower surrogate error in the more complex design spaces. Using a RBF kernel, KPCA achieved the largest average reduction in error of 13.57%. The results also showed that surrogate error was greatly affected by mode shape reversal. Four different approaches of identifying reversed mode shapes were explored, all of which resulted in varying amounts of surrogate error. Together, the methods explored in this research were shown to decrease surrogate error when performing DSE of a turbomachine compressor blade. As surrogate accuracy increases, so does the ability to correctly make engineering decisions and judgements throughout the design process. Ultimately, this will help engineers design better turbomachines.
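The core construction, reduce the high-dimensional FEA output with a dimensionality reduction method and interpolate the reduced coordinates with RBFs over the design variables, can be sketched as follows. PCA stands in for ISOMAP/KPCA/LLE, the arrays are placeholders, and this single-RBF layout simplifies the dual-RBF models of the thesis.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
params = rng.random((60, 4))             # 60 sampled designs, 4 design variables
fields = rng.random((60, 5000))          # placeholder nodal stress fields from FEA

pca = PCA(n_components=10).fit(fields)   # compress the output space
coeffs = pca.transform(fields)           # 60 x 10 latent coordinates

rbf = RBFInterpolator(params, coeffs)    # map design variables -> latent space

new_design = rng.random((1, 4))
pred_field = pca.inverse_transform(rbf(new_design))  # fast full-field prediction
print(pred_field.shape)
```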
22

Guillemard, Mijail [Verfasser], and Armin [Akademischer Betreuer] Iske. "Some Geometrical and Topological Aspects of Dimensionality Reduction in Signal Analysis / Mijail Guillemard. Betreuer: Armin Iske." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2012. http://d-nb.info/1022196510/34.

Full text
23

Guillemard, Mijail [Verfasser], and Armin [Akademischer Betreuer] Iske. "Some Geometrical and Topological Aspects of Dimensionality Reduction in Signal Analysis / Mijail Guillemard. Betreuer: Armin Iske." Hamburg : Staats- und Universitätsbibliothek Hamburg, 2012. http://nbn-resolving.de/urn:nbn:de:gbv:18-56358.

Full text
24

Shenoy, A. "Computational analysis of facial expressions." Thesis, University of Hertfordshire, 2010. http://hdl.handle.net/2299/4359.

Full text
Abstract:
This PhD work constitutes a series of inter-disciplinary studies that use biologically plausible computational techniques and experiments with human subjects to analyze facial expressions. The performance of the computational models and human subjects is analyzed in terms of accuracy and response time. The computational models process images in three stages: pre-processing, dimensionality reduction and classification. The pre-processing of face expression images includes feature extraction and dimensionality reduction. Gabor filters are used for feature extraction, as they are the most biologically plausible computational method available. Various dimensionality reduction methods are used: Principal Component Analysis (PCA), Curvilinear Component Analysis (CCA) and Fisher Linear Discriminant (FLD), followed by classification with Support Vector Machines (SVM) and Linear Discriminant Analysis (LDA). Six basic prototypical facial expressions that are universally accepted are used for the analysis: angry, happy, fear, sad, surprise and disgust. The performance of the computational models in classifying each expression category is compared with that of the human subjects. The Effect size and Encoding face enable the discrimination of the areas of the face specific to a particular expression. The Effect size in particular emphasizes the areas of the face that are involved during the production of an expression; this concept of using Effect size on faces has not been reported previously in the literature and has shown very interesting results. The detailed PCA analysis showed the significant PCA components specific to each of the six basic prototypical expressions. An important observation from this analysis was that with Gabor filtering followed by non-linear CCA for dimensionality reduction, the dataset vector size may be reduced to a very small number, in most cases just 5 components. The hypothesis that the average response time (RT) for the human subjects in classifying the different expressions is analogous to the distance of the data points from the classification hyper-plane was verified: the harder a facial expression is for human subjects to classify, the closer it lies to the classifier's separating hyper-plane. A bi-variate correlation analysis of the distance measure and the average RT suggested a significant anti-correlation. Signal detection theory (SDT), via d-prime, determined how well the model or the human subjects distinguished an expressive face from a neutral one. On comparison, human subjects are better at classifying surprise, disgust, fear and sad expressions, while the RAW computational model is better able to distinguish angry and happy expressions. To summarize, there seem to be some similarities between the computational models and human subjects in the classification process.
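The processing chain described, Gabor features, then dimensionality reduction, then a classifier, can be outlined as below. scikit-image's gabor filter and a PCA+SVM pipeline are stand-ins (scikit-learn offers no Curvilinear Component Analysis), and the images, labels and filter-bank parameters are placeholders.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def gabor_features(img, freqs=(0.1, 0.2), thetas=(0, np.pi / 4, np.pi / 2)):
    """Mean magnitude response of a small Gabor filter bank."""
    feats = []
    for f in freqs:
        for t in thetas:
            real, imag = gabor(img, frequency=f, theta=t)
            feats.append(np.hypot(real, imag).mean())
    return np.array(feats)

rng = np.random.default_rng(0)
faces = rng.random((40, 48, 48))         # placeholder expression images
labels = rng.integers(0, 6, size=40)     # six prototypical expression classes

X = np.array([gabor_features(img) for img in faces])
model = make_pipeline(PCA(n_components=5), SVC()).fit(X, labels)
print(model.score(X, labels))            # training accuracy only, for illustration
```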
25

Silva, Sérgio Montazzolli. "Redução de dimensionalidade aplicada à diarização de locutor." Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/94745.

Full text
Abstract:
A large amount of multimedia data is generated every day, coming from various sources such as radio or television broadcasts, recordings of lectures and meetings, telephone conversations, and videos and photos captured by mobile phones. Because of this, interest in automatic multimedia data transcription has grown in recent years; in voice processing we can highlight the areas of Speaker Recognition, Speech Recognition, Speaker Diarization and Speaker Tracking. The development of these areas has been driven by NIST, which periodically promotes state-of-the-art evaluations. Since 2000, the task of Speaker Diarization has emerged as one of the main research fields in voice data transcription, having been evaluated by NIST several times in the last decade. The objective of this task is to find the number of speakers in an audio recording and properly label their speech segments, without the use of any training information; in other words, the goal of Speaker Diarization is to answer the question "Who spoke when?". A major problem in this area is obtaining a good model for each speaker in the audio, given the limited amount of information available and the high dimensionality of the data. In this work, besides building a Speaker Diarization system, we address this problem by reducing the dimensionality of the data through statistical analysis, using Principal Component Analysis, Linear Discriminant Analysis and the newly presented Fisher Linear Semi-Discriminant Analysis. The latter uses a static initialization method; here we propose the use of a dynamic method based on the detection of speaker change points. We also investigate the behavior of these analyses under the simultaneous use of multiple short-term parameterizations of the acoustic signal. Our results show that it is possible to preserve, and even improve, the system performance while substantially reducing the number of dimensions. This speeds up the execution of Machine Learning algorithms and reduces the amount of memory needed to store the data.
26

Todorov, Hristo [Verfasser]. "Pattern analysis, dimensionality reduction and hypothesis testing in high-dimensional data from animal studies with small sample sizes / Hristo Todorov." Mainz : Universitätsbibliothek der Johannes Gutenberg-Universität Mainz, 2020. http://d-nb.info/1224895347/34.

Full text
27

Berguin, Steven Henri. "A method for reducing dimensionality in large design problems with computationally expensive analyses." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53504.

Full text
Abstract:
Strides in modern computational fluid dynamics and leaps in high-performance computing have led to unprecedented capabilities for handling large aerodynamic problems. In particular, the emergence of adjoint design methods has been a breakthrough in the field of aerodynamic shape optimization: it enables expensive, high-dimensional optimization problems to be tackled efficiently using gradient-based methods in CFD, a task that was previously inconceivable. However, adjoint design methods are intended for gradient-based optimization; the curse of dimensionality is still very much alive when it comes to design space exploration, where gradient-free methods cannot be avoided. This research describes a novel approach for reducing dimensionality in large, computationally expensive design problems to a point where gradient-free methods become possible. This is done using an innovative application of Principal Component Analysis (PCA), applied here to the gradient distribution of the objective function, something that had not been done before. This yields a linear transformation that maps a high-dimensional problem onto an equivalent low-dimensional subspace. None of the original variables are discarded; they are simply linearly combined into a new, smaller set of variables. The method is tested on a range of analytical functions, a two-dimensional staggered airfoil test problem and a three-dimensional Over-Wing Nacelle (OWN) integration problem. In all cases, the method performed as expected and was found to be cost effective, requiring only a relatively small number of samples to achieve large dimensionality reduction.
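The key step, PCA applied to sampled gradients of the objective rather than to the design variables themselves, is easy to sketch. The toy objective, its hand-written gradient and the sample count below are placeholder assumptions; in the actual setting the gradients would come from an adjoint CFD solver.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, 0.1, 0.05])           # toy objective f(x) = sin(w . x)

def grad_f(x):
    """Stand-in for an adjoint gradient evaluation of the objective."""
    return w * np.cos(x @ w)

X = rng.uniform(-1.0, 1.0, size=(200, 3))        # sampled designs
G = np.array([grad_f(x) for x in X])             # gradient at each sample

C = G.T @ G / len(G)                             # second moment of the gradients
vals, vecs = np.linalg.eigh(C)                   # PCA of the gradient distribution
order = np.argsort(vals)[::-1]
W = vecs[:, order[:1]]                           # dominant direction(s) of variation

Z = X @ W                                        # reduced design variables
print(vals[order].round(4), Z.shape)
```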
28

Turtinen, M. (Markus). "Learning and recognizing texture characteristics using local binary patterns." Doctoral thesis, University of Oulu, 2007. http://urn.fi/urn:isbn:9789514285028.

Full text
Abstract:
Texture plays an important role in numerous computer vision applications, and many methods for describing and analyzing textured surfaces have been proposed. Variations in the appearance of texture caused, for example, by changing illumination and imaging conditions set high requirements on such analysis methods. In addition, real-world applications tend to produce a great deal of complex texture data that must be handled effectively in order to be exploited. The local binary pattern (LBP) operator offers an efficient way of analyzing textures. It has a simple theory and combines properties of structural and statistical texture analysis methods. LBP is invariant to monotonic gray-scale variations and has extensions for rotation-invariant texture analysis. Analysis of real-world texture data is typically very laborious and time consuming. Often there is no ground truth or other prior knowledge of the data available, and important properties of the textures must be learned from the images, which is a very challenging task in texture analysis. In this thesis, methods for learning and recognizing texture categories using local binary pattern features are proposed. Unsupervised clustering and dimensionality reduction methods combined with visualization provide useful tools for analyzing texture data. Uncovering the data structures is done in an unsupervised fashion, based only on texture features; no prior knowledge of the data, for example texture classes, is required. In this thesis, non-linear dimensionality reduction, data clustering and visualization are used for building a labeled training set for a classifier and for studying the performance of the features. The thesis also proposes a multi-class approach to learning and labeling part-based texture appearance models to be used in scene texture recognition with only little human interaction, as well as a semi-automatic approach to learning texture appearance models for view-based texture classification. The goal of texture characterization is often to classify textures into different categories, and two texture classification systems suitable for different applications are proposed: first, a discriminative classifier that combines local and contextual texture information of the image in scene recognition; second, a real-time capable texture classifier with a self-intuitive user interface for industrial texture classification. Two challenging real-world texture analysis applications are used to study the performance and usefulness of the proposed methods: visual paper analysis, which aims to characterize paper quality based on texture properties, and outdoor scene image analysis, where texture information is used to recognize different regions in the scenes.
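For reference, computing the basic LBP descriptor takes only a few lines with scikit-image. The image below is a placeholder, and the (P, R) neighbourhood and "uniform" mapping are common defaults rather than the thesis's exact settings.

```python
import numpy as np
from skimage.feature import local_binary_pattern

rng = np.random.default_rng(0)
img = rng.random((128, 128))             # placeholder grey-scale texture patch

P, R = 8, 1                              # 8 neighbours on a circle of radius 1
lbp = local_binary_pattern(img, P, R, method="uniform")  # rotation-invariant codes

# The texture descriptor is the normalized histogram of pattern codes (P + 2 bins)
hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
print(hist.round(3))
```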
29

Kanneganti, Raghuveer. "CLASSIFICATION OF ONE-DIMENSIONAL AND TWO-DIMENSIONAL SIGNALS." OpenSIUC, 2014. https://opensiuc.lib.siu.edu/dissertations/892.

Full text
Abstract:
This dissertation focuses on the classification of one-dimensional and two-dimensional signals. The one-dimensional signal classification problem involves the classification of brain signals for identifying the emotional responses of human subjects under given drug conditions. A strategy is developed to accurately classify ERPs in order to identify human emotions based on brain reactivity to emotional, neutral, and cigarette-related stimuli in smokers. A multichannel spatio-temporal model is employed to overcome the curse of dimensionality that plagues the design of parametric multivariate classifiers for multi-channel ERPs. The strategy is tested on the ERPs of 156 smokers who participated in a smoking cessation program; one half of the subjects were given nicotine patches and the other half were given placebo patches. ERPs were collected from 29 channels in response to the presentation of pictures with emotional (pleasant and unpleasant), neutral/boring, and cigarette-related content. It is shown that human emotions can be classified accurately, and the results also show that smoking cessation causes a drop in the classification accuracies of emotions in the placebo group, but not in the nicotine patch group. Given that individual brain patterns were compared with group average brain patterns, the findings support the view that individuals tend to have similar brain reactions to different types of emotional stimuli. Overall, this new classification approach to identifying differential brain responses to different emotional types could lead to new knowledge concerning brain mechanisms associated with emotions common to most or all people, and it suggests that smoking cessation without nicotine replacement results in poorer differentiation of brain responses to different emotional stimuli. Future directions in this area would be to use these methods to assess individual differences in responses to emotional stimuli and to different drug treatments. Advantages of this and other brain-based assessments include temporal precision (e.g., 400-800 ms post-stimulus) and the elimination of biases related to self-report measures. The two-dimensional signal classification problems include the detection of graphite in testing documents and the detection of fraudulent bubbles in test sheets. A strategy is developed to detect graphite responses in optical mark recognition (OMR) documents using inexpensive visible light scanners. The main challenge in the formulation of the strategy is that the detection should be invariant to the numerous background colors and artwork in typical optical mark recognition documents. A test document is modeled as a superposition of a graphite response image and a background image; the background image in turn is modeled as a superposition of screening artwork, lines, and machine text components. A sequence of image processing operations and a pattern recognition algorithm are developed to estimate the graphite response image from a test document by systematically removing the components of the background image. The proposed strategy is tested on a wide range of scanned documents, and it is shown that the estimated graphite response images are visually similar to those scanned by the very expensive infra-red scanners currently employed for optical mark recognition. The robustness of the detection strategy is also demonstrated on a large number of simulated test documents.
A procedure is also developed to autonomously determine whether cheating has occurred by detecting the presence of aberrant responses in scanned OMR test books. The challenges introduced by the significant imbalance in the numbers of typical and aberrant bubbles were identified, and the aberrant bubble detection problem is formulated as an outlier detection problem. A feature-based outlier detection procedure in conjunction with a one-class SVM classifier is developed. A multi-criteria rank-of-rank-sum technique is introduced to rank and select a subset of features from a pool of candidate features. Using a data set of 11 individuals, it is shown that a detection accuracy of over 90% is possible. Experiments conducted on three real test books flagged for suspected cheating showed that the proposed strategy has the potential to be deployed in practice.
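The outlier formulation in the second part maps naturally onto a one-class SVM. A minimal sketch follows, with synthetic "typical" and "aberrant" bubble features and an arbitrary nu, standing in for the dissertation's selected feature subset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
typical = rng.normal(0.0, 1.0, size=(2000, 6))   # features of ordinary bubbles
aberrant = rng.normal(4.0, 1.0, size=(20, 6))    # rare aberrant responses

X = StandardScaler().fit_transform(np.vstack([typical, aberrant]))

# nu bounds the fraction of training points the model may treat as outliers
ocsvm = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale").fit(X)
flags = ocsvm.predict(X)                         # -1 = outlier, +1 = inlier
print(int((flags[-20:] == -1).sum()), "of 20 aberrant bubbles flagged")
```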
30

Chao, Roger. "Data analysis for Systematic Literature Reviews." Thesis, Linnéuniversitetet, Institutionen för informatik (IK), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-105122.

Full text
Abstract:
Systematic Literature Reviews (SLR) are a powerful research tool for identifying and selecting literature to answer a certain question. However, an approach for extracting the analytical data inherent in a Systematic Literature Review's multi-dimensional dataset has been lacking, and previous Systematic Literature Review tools do not provide such analytical insight. This thesis therefore aims to provide a useful approach, comprising various algorithms and data treatment techniques, to give the user analytical insight into their data that is not evident in the bare execution of a Systematic Literature Review. To this end, a literature review has been conducted to find the most relevant techniques for extracting data from multi-dimensional data sets, and the approach has been tested, using a web application, on a survey regarding Self-Adaptive Systems (SAS). As a result, we identify the most suitable techniques to incorporate into the proposed approach.
31

Chen, Beichen, and Amy Jinxin Chen. "PCA based dimensionality reduction of MRI images for training support vector machine to aid diagnosis of bipolar disorder." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259621.

Full text
Abstract:
This study aims to investigate how dimensionality reduction of neuroimaging data prior to training support vector machines (SVMs) affects the classification accuracy of bipolar disorder. The study uses principal component analysis (PCA) for dimensionality reduction. An open source data set of 19 bipolar and 31 control structural magnetic resonance imaging (sMRI) samples was used, part of the UCLA Consortium for Neuropsychiatric Phenomics LA5c Study funded by the NIH Roadmap Initiative, which aims to foster breakthroughs in the development of novel treatments for neuropsychiatric disorders. The images underwent smoothing, feature extraction and PCA before they were used as input to train SVMs. 3-fold cross-validation was used to tune a number of hyperparameters for linear, radial, and polynomial kernels. Experiments were done to investigate the performance of SVM models trained using 1 to 29 principal components (PCs). Several PC sets reached 100% accuracy in the final evaluation, the minimal set being the first two principal components. The accumulated variance explained by the PCs used did not correlate with the performance of the model. The choice of kernel and hyperparameters is of utmost importance, as the performance obtained can vary greatly. The results support previous studies showing that SVM can be useful in aiding the diagnosis of bipolar disorder, and that the use of PCA as a dimensionality reduction method in combination with SVM may be appropriate for the classification of neuroimaging data for illnesses not limited to bipolar disorder. Due to the limitation of a small sample size, the results call for future research using larger collaborative data sets to validate the accuracies obtained.
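The PCA-then-SVM pipeline with 3-fold tuning over kernels is straightforward to reproduce. The sketch below uses synthetic data of the same shape as the study's 50-subject sample; the hyperparameter grid is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Placeholder for the 50 sMRI feature vectors (19 bipolar, 31 control)
X, y = make_classification(n_samples=50, n_features=500, n_informative=10,
                           weights=[0.62], random_state=0)

pipe = Pipeline([("pca", PCA()), ("svm", SVC())])
grid = {"pca__n_components": [2, 5, 10, 20],
        "svm__kernel": ["linear", "rbf", "poly"],
        "svm__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, grid, cv=3).fit(X, y)    # 3-fold CV as in the thesis
print(search.best_params_, round(search.best_score_, 3))
```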
APA, Harvard, Vancouver, ISO, and other styles
32

Abdel-Rahman, Tarek. "Mixture of Factor Analyzers (MoFA) Models for the Design and Analysis of SAR Automatic Target Recognition (ATR) Algorithms." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1500625807524146.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Ivan, Jean-Paul. "Principal Component Modelling of Fuel Consumption of Seagoing Vessels and Optimising Fuel Consumption as a Mixed-Integer Problem." Thesis, Mälardalens högskola, Akademin för utbildning, kultur och kommunikation, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-51847.

Full text
Abstract:
The fuel consumption of a seagoing vessel is, through a combination of Box-Cox transforms and principal component analysis, reduced to a univariate function of the primary principal component, with mean model error −3.2% and error standard deviation 10.3%. In the process, a Latin-hypercube-inspired space partitioning sampling technique is developed and successfully used to produce a representative sample used in determining the regression coefficients. Finally, a formal optimisation problem for minimising the fuel use is described. The problem is derived from a parametrised expression for the fuel consumption, and has only 3, or 2 if simplified, free variables at each timestep. Some information has been redacted in order to comply with NDA restrictions. Most redactions are names (of vessels or otherwise), units, and in some cases (especially on figures) quantities.

Presentation was performed remotely using Zoom.
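A minimal sketch of the modelling chain the abstract outlines, Box-Cox transforms, PCA, then a univariate regression on the first principal component, assuming SciPy/scikit-learn and synthetic positive-valued data in place of the redacted vessel measurements:

import numpy as np
from scipy.stats import boxcox
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Synthetic stand-ins: strictly positive operational variables and fuel use.
X = rng.lognormal(mean=0.0, sigma=0.5, size=(200, 5))
fuel = X @ np.array([2.0, 1.0, 0.5, 0.2, 0.1]) + rng.normal(scale=0.5, size=200)

# Box-Cox each column toward normality (requires positive data).
Xt = np.column_stack([boxcox(X[:, j])[0] for j in range(X.shape[1])])

# Project onto the first principal component and regress fuel on it.
pc1 = PCA(n_components=1).fit_transform(Xt)
model = LinearRegression().fit(pc1, fuel)
resid = (model.predict(pc1) - fuel) / fuel
print(f"mean relative error {resid.mean():+.1%}, std {resid.std():.1%}")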

APA, Harvard, Vancouver, ISO, and other styles
34

Gao, Hui. "Extracting key features for analysis and recognition in computer vision." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1141770523.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Wang, Xianwang. "Single View Reconstruction for Human Face and Motion with Priors." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_diss/62.

Full text
Abstract:
Single view reconstruction is fundamentally an under-constrained problem. We aim to develop new approaches to model human face and motion with model priors that restrict the space of possible solutions. First, we develop a novel approach to recover the 3D shape from a single view image under challenging conditions, such as large variations in illumination and pose. The problem is addressed by employing the techniques of non-linear manifold embedding and alignment. Specifically, the local image models for each patch of facial images and the local surface models for each patch of 3D shape are learned using a non-linear dimensionality reduction technique, and the correspondences between these local models are then learned by a manifold alignment method. Local models successfully remove the dependency of large training databases for human face modeling. By combining the local shapes, the global shape of a face can be reconstructed directly from a single linear system of equations via least square. Unfortunately, this learning-based approach cannot be successfully applied to the problem of human motion modeling due to the internal and external variations in single view video-based marker-less motion capture. Therefore, we introduce a new model-based approach for capturing human motion using a stream of depth images from a single depth sensor. While a depth sensor provides metric 3D information, using a single sensor, instead of a camera array, results in a view-dependent and incomplete measurement of object motion. We develop a novel two-stage template fitting algorithm that is invariant to subject size and view-point variations, and robust to occlusions. Starting from a known pose, our algorithm first estimates a body configuration through temporal registration, which is used to search the template motion database for a best match. The best match body configuration as well as its corresponding surface mesh model are deformed to fit the input depth map, filling in the part that is occluded from the input and compensating for differences in pose and body-size between the input image and the template. Our approach does not require any makers, user-interaction, or appearance-based tracking. Experiments show that our approaches can achieve good modeling results for human face and motion, and are capable of dealing with variety of challenges in single view reconstruction, e.g., occlusion.
APA, Harvard, Vancouver, ISO, and other styles
36

Henriksson, William. "High dimensional data clustering; A comparative study on gene expressions : Experiment on clustering algorithms on RNA-sequence from tumors with evaluation on internal validation." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-17492.

Full text
Abstract:
In cancer research, class discovery is the first step in investigating a new dataset to find which hidden groups it contains by similar attributes. However, datasets from gene expressions, whether RNA microarray or RNA-sequence, are high-dimensional, which makes it hard to perform cluster analysis and to obtain clusters that are well separated. Well-separated clusters are desirable because they indicate that objects are most likely not placed in the wrong clusters. This report investigates in an experiment whether K-means and hierarchical clustering are suitable for clustering gene expressions in RNA-sequence data from various tumors. Dimensionality reduction methods are also applied to see whether they help create well-separated clusters. The results show that well-separated clusters are only achieved by using PCA for dimensionality reduction and K-means on correlation. The main contribution of this paper is determining that using K-means or hierarchical clustering on the full natural dimensionality of RNA-sequence data yields an undesirably low average silhouette width, below 0.4.
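A minimal sketch of the kind of comparison the abstract describes, clustering with and without PCA and scoring the result by average silhouette width, using scikit-learn and synthetic data in place of the RNA-sequence samples:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Synthetic stand-in: 100 samples with ~20000 gene-expression features.
X = rng.normal(size=(100, 20000))

for name, data in [("full", X), ("pca", PCA(n_components=10).fit_transform(X))]:
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(data)
    print(name, round(silhouette_score(data, labels), 3))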
APA, Harvard, Vancouver, ISO, and other styles
37

Mappus, Rudolph Louis IV. "Estimating the discriminative power of time varying features for EEG BMI." Diss., Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/31738.

Full text
Abstract:
In this work, we present a set of methods aimed at improving the discriminative power of time-varying features of signals that contain noise. These methods use properties of noise signals as well as information-theoretic techniques to factor out types of noise and support signal inference for electroencephalographic (EEG) based brain-machine interfaces (BMI). EEG data were collected over two studies addressing psychophysiological questions involving symmetry and mental rotation processing. The psychophysiological data gathered in the mental rotation study also tested the feasibility of using dissociations of mental rotation tasks, correlated with rotation angle, in a BMI. We show the feasibility of mental rotation for BMI by demonstrating bitrates and recognition accuracy comparable to state-of-the-art BMIs. The conclusion is that by using the feature selection methods introduced in this work to dissociate mental rotation tasks, we produce bitrates and recognition rates comparable to current BMIs.
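The abstract does not give its exact estimators, but an information-theoretic scoring of time-varying features, as described, might be sketched with scikit-learn's mutual information estimator; the EEG feature matrix and labels here are synthetic stand-ins:

import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 64))       # stand-in for per-trial EEG features
y = rng.integers(0, 2, size=300)     # stand-in task labels (e.g. rotation-angle class)

# Score each feature by its estimated mutual information with the label,
# then keep the most discriminative ones.
scores = mutual_info_classif(X, y, random_state=0)
top = np.argsort(scores)[::-1][:10]
print("top features:", top)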
APA, Harvard, Vancouver, ISO, and other styles
38

Piñal, Moctezuma Juan Fernando. "Characterization of damage evolution on metallic components using ultrasonic non-destructive methods." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/667641.

Full text
Abstract:
When fatigue is considered, structures and machinery are expected to eventually fail. Still, when this damage is unexpected, besides the negative economic impact it produces, people's lives could be at risk. It is therefore imperative that infrastructure managers schedule regular inspection and maintenance for their assets, and that designers and materials manufacturers have access to appropriate diagnostic tools in order to build superior and more reliable materials. In this regard, and for a number of applications, non-destructive evaluation techniques have proven to be an efficient and helpful alternative to traditional destructive assays of materials. Particularly in materials design, researchers have recently exploited the Acoustic Emission (AE) phenomenon as an additional assessment tool with which to characterize the mechanical properties of specimens. Nevertheless, several challenges arise when treating this phenomenon, since its intensity, duration and arrival behavior are essentially stochastic for traditional signal processing means, leading to inaccuracies in the resulting assessment. This dissertation focuses on assisting in the characterization of the mechanical properties of advanced high-strength steels under uniaxial tensile tests. Of particular interest is the ability to detect the nucleation and growth of a crack throughout the test. Therefore, the AE waves generated by the specimen during the test are assessed with the aim of characterizing their evolution. To this end, the introduction gives a brief review of non-destructive methods, emphasizing the AE phenomenon. Next, an exhaustive analysis is presented of the challenges and deficiencies of detecting and segmenting each AE event over a continuous data stream with the traditional threshold detection method and with current state-of-the-art methods. Following this, a novel AE event detection method is proposed with the aim of overcoming the aforementioned limitations. Evidence showed that the proposed method, which is based on short-time features of the waveform of the AE signal, surpasses the detection capabilities of current state-of-the-art methods in onset and end-time precision, as well as in detection quality and computational speed. Finally, a methodology aimed at analyzing the frequency spectrum evolution of the AE phenomenon during the tensile test is proposed. Results indicate that it is feasible to correlate the nucleation and growth of a crack with the frequency content evolution of AE events.
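As a loose illustration of detection based on short-time waveform features (the thesis's actual algorithm is more elaborate), the following sketch frames a signal, computes short-time RMS energy, and marks samples above an adaptive threshold as an AE event; the signal, sampling rate and threshold factor are all invented:

import numpy as np

rng = np.random.default_rng(4)
fs = 1_000_000                              # 1 MHz sampling, plausible for AE
t = np.arange(fs // 100) / fs               # 10 ms of signal
signal = 0.01 * rng.normal(size=t.size)
burst = np.exp(-8000 * t) * np.sin(2 * np.pi * 150e3 * t)
signal[2000:5000] += burst[:3000]           # inject one AE-like burst

frame = 128
rms = np.sqrt(np.convolve(signal**2, np.ones(frame) / frame, mode="same"))
active = rms > 5 * np.median(rms)           # adaptive threshold on short-time energy
onset = np.argmax(active)
endtime = len(active) - np.argmax(active[::-1]) - 1
print(f"event from sample {onset} to {endtime}")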
APA, Harvard, Vancouver, ISO, and other styles
39

Nordqvist, My. "Classify part of day and snow on the load of timber stacks : A comparative study between partitional clustering and competitive learning." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42238.

Full text
Abstract:
In today's society, companies are trying to find ways to utilize all the data they have, which contains valuable information and insights for making better decisions. This includes data used to keep track of the timber that flows between forest and industry. The growth of Artificial Intelligence (AI) and Machine Learning (ML) has enabled the development of ML models to automate the measurement of timber on timber trucks, based on images. However, to improve the results there is a need to extract information from unlabeled images in order to determine weather and lighting conditions. The objective of this study is to perform an extensive experiment on classifying unlabeled images into the categories daylight, darkness, and snow on the load. A comparative study between partitional clustering and competitive learning is conducted to investigate which method gives the best results in terms of different clustering performance metrics. It also examines how dimensionality reduction affects the outcome. The algorithms K-means and Kohonen Self-Organizing Map (SOM) are selected for the clustering. Each model is investigated with respect to the number of clusters, size of dataset, clustering time, clustering performance, and manual samples from each cluster. The results indicate a noticeable clustering performance discrepancy between the algorithms concerning the number of clusters, dataset size, and manual samples. The use of dimensionality reduction led to shorter clustering time but slightly worse clustering performance. The evaluation results further show that the clustering time of Kohonen SOM is significantly higher than that of K-means.
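The abstract contrasts partitional clustering (K-means) with competitive learning (Kohonen SOM). As a toy illustration of the competitive-learning side, here is a minimal SOM update loop in NumPy; a bare sketch with an invented feature matrix and hand-picked grid size, not the study's implementation:

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 16))          # stand-in image feature vectors

grid_w, grid_h, dim = 4, 4, X.shape[1]
weights = rng.normal(size=(grid_w * grid_h, dim))
coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)

for epoch in range(20):
    lr = 0.5 * (1 - epoch / 20)         # decaying learning rate
    radius = 2.0 * (1 - epoch / 20) + 0.5
    for x in X:
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best matching unit
        d = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d / (2 * radius**2))                    # neighbourhood function
        weights += lr * h[:, None] * (x - weights)          # competitive update

# Assign each sample to its BMU; the 16 units act as cluster prototypes.
labels = np.argmin(((X[:, None, :] - weights[None]) ** 2).sum(axis=2), axis=1)
print(np.bincount(labels, minlength=grid_w * grid_h))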
APA, Harvard, Vancouver, ISO, and other styles
40

Krusche, Stefan. "Visualisierung und Analyse multivariater Daten in der gartenbaulichen Beratung -Methodik, Einsatz und Vergleich datenanalytischer Verfahren." Doctoral thesis, Humboldt-Universität zu Berlin, Landwirtschaftlich-Gärtnerische Fakultät, 1999. http://dx.doi.org/10.18452/14463.

Full text
Abstract:
In order to interpret large data sets in the context of consultancy and extension in horticulture, this thesis attempts to find ways to visually explore horticultural multivariate data, in order to obtain a concise description and summary of the information available in the data and, moreover, to develop possibilities for interactively analysing survey data. The thesis is an exercise in exploratory data analysis, which analyses data without making specific model assumptions, is predominantly descriptive, proceeds step by step in a highly interactive setting, and makes full use of all kinds of graphical displays. The methods used comprise various dimensionality reduction techniques (principal components analysis, correspondence analysis, multidimensional scaling), biplots, the multivariate analysis of grouped data (procrustes rotation and groupwise principal components), graphical models, CART, and line diagrams of formal concept analysis. In addition, further graphical methods are used, e.g. trellis displays. Data from an on-site investigation of the production process of Cyclamen in 20 nurseries and from the microeconomic indicators (so-called Kennzahlen) of 297 growers in Germany from the years 1992 to 1994 are used to demonstrate the analytical capabilities of the methods. The data are a perfect example of imperfect data, and therefore represent the majority of the data sets that horticultural consultancy has to work with. It thus becomes clear that, despite the variety of results, which helps to enhance the understanding of the data at hand, both the complexity of the processes observed and the low data quality make it fairly difficult to arrive at clear-cut conclusions. The most helpful tools in the graphical data analysis are biplots, hierarchical line diagrams and trellis displays. Finding an empirical grouping of objects is best solved by classification and regression trees, which provide both the data segmentation and an intuitively appealing visualisation and explanation of the derived groups. To understand multivariate relationships better, discrete graphical models are well suited. The procedures for a number of the methods that cannot be found in general statistics packages are provided in the form of Genstat codes.
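Among the tools the abstract singles out, the biplot is easy to sketch: project the data onto the first two principal components and overlay the variable loadings as arrows. A minimal matplotlib version on synthetic data (arrow scaling and labels are arbitrary choices):

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 4))
X -= X.mean(axis=0)

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)

plt.scatter(scores[:, 0], scores[:, 1], s=10)
for j, (dx, dy) in enumerate(pca.components_.T * 3):   # scaled loadings
    plt.arrow(0, 0, dx, dy, color="red", head_width=0.05)
    plt.text(dx, dy, f"var{j}", color="red")
plt.xlabel("PC1"); plt.ylabel("PC2")
plt.show()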
APA, Harvard, Vancouver, ISO, and other styles
41

Malik, Zeeshan. "Towards on-line domain-independent big data learning : novel theories and applications." Thesis, University of Stirling, 2015. http://hdl.handle.net/1893/22591.

Full text
Abstract:
Feature extraction is an extremely important pre-processing step for pattern recognition and machine learning problems. This thesis highlights how one can best extract features from data in an exhaustively online and purely adaptive manner. The solution to this problem is given for both labeled and unlabeled datasets by presenting a number of novel on-line learning approaches. Specifically, the differential equation method for solving the generalized eigenvalue problem is used to derive a number of novel machine learning and feature extraction algorithms. The incremental eigen-solution method is used to derive a novel incremental extension of linear discriminant analysis (LDA). Further, the proposed incremental version is combined with the extreme learning machine (ELM), in which the ELM is used as a preprocessor before learning. In this first key contribution, the dynamic random expansion characteristic of ELM is combined with the proposed incremental LDA technique, and shown to offer a significant improvement in maximizing the discrimination between points in two different classes, while minimizing the distance within each class, in comparison with other standard state-of-the-art incremental and batch techniques. In the second contribution, the differential equation method for solving the generalized eigenvalue problem is used to derive a novel, purely incremental version of the slow feature analysis (SFA) algorithm, termed the generalized eigenvalue based slow feature analysis (GENEIGSFA) technique. Further, the time series expansions of echo state networks (ESN) and radial basis functions (RBF) are used as a pre-processor before learning. In addition, higher order derivatives are used as a smoothing constraint on the output signal. Finally, an online extension of the generalized eigenvalue problem, derived from James Stone's criterion, is tested, evaluated and compared with the standard batch version of the slow feature analysis technique, to demonstrate its comparative effectiveness. In the third contribution, light-weight extensions of the statistical technique known as canonical correlation analysis (CCA) for both twinned and multiple data streams are derived by using the same method of solving the generalized eigenvalue problem. Further, the proposed method is enhanced by maximizing the covariance between data streams while simultaneously maximizing the rate of change of variances within each data stream. A recurrent set of connections, as used by the ESN, is placed as a pre-processor between the inputs and the canonical projections in order to capture shared temporal information in two or more data streams. A solution to the problem of identifying a low-dimensional manifold in a high-dimensional data space is then presented in an incremental and adaptive manner. Finally, an online, locally optimized extension of Laplacian Eigenmaps is derived, termed the generalized incremental laplacian eigenmaps technique (GENILE). Apart from the benefit of the incremental nature of the proposed manifold-based dimensionality reduction technique, the projections produced by this method are shown, in most cases, to yield better classification accuracy than standard batch versions of these techniques, on both artificial and real datasets.
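Several of the contributions above reduce to the generalized eigenvalue problem S_b w = λ S_w w (for LDA, between- versus within-class scatter). A batch sketch with SciPy on synthetic two-class data; not the thesis's incremental differential-equation solver, just the underlying problem it solves:

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

mu = X.mean(axis=0)
Sw = np.zeros((3, 3)); Sb = np.zeros((3, 3))
for c in (0, 1):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)               # within-class scatter
    Sb += len(Xc) * np.outer(mc - mu, mc - mu)  # between-class scatter

# Generalized symmetric eigenproblem; the last eigenvector maximizes the Fisher ratio.
vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(3))
w = vecs[:, -1]
print("discriminant direction:", w)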
APA, Harvard, Vancouver, ISO, and other styles
42

Bahri, Maroua. "Improving IoT data stream analytics using summarization techniques." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT017.

Full text
Abstract:
With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors, and social networks results in an overwhelming volume of IoT data streams, generated daily from several applications, that can be transformed into valuable information through machine learning tasks. In practice, multiple critical issues arise in order to extract useful knowledge from these evolving data streams, mainly that the stream needs to be efficiently handled and processed. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework. The task is challenging on streams, principally due to the high -- and increasing -- data dimensionality, in addition to the potentially infinite amount of data. The two aspects make the classification task harder. The first part of the thesis surveys the current state of the art of classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent works in this vibrant area. In the second part, we detail our contributions to the field of classification in streams, developing novel approaches based on summarization techniques that aim to reduce the computational resources of existing classifiers with no -- or minor -- loss of classification accuracy. To address high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that consists in reducing the dimensionality of input data incrementally before feeding them to the learning stage. We present several approaches applied to several classification tasks: Naive Bayes enhanced with sketches and the hashing trick, k-NN using compressed sensing and UMAP, and their integration into ensemble methods.
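One of the ingredients named above, reducing each arriving instance's dimensionality with the hashing trick before incrementally training Naive Bayes, can be sketched with scikit-learn's HashingVectorizer and partial_fit; the text stream, bucket count and labels here are invented:

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# Fixed-size, stateless projection: each document hashes into 2**10 buckets.
vec = HashingVectorizer(n_features=2**10, alternate_sign=False)
clf = MultinomialNB()

stream = [("sensor reports high temperature", 1),
          ("routine status message", 0),
          ("overheating alarm triggered", 1),
          ("daily heartbeat ok", 0)]

for i, (doc, label) in enumerate(stream):
    x = vec.transform([doc])                    # reduce dimension on arrival
    if i > 0:
        print(doc, "->", clf.predict(x)[0])     # test-then-train evaluation
    clf.partial_fit(x, [label], classes=[0, 1])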
APA, Harvard, Vancouver, ISO, and other styles
43

Sánchez, Martínez Sergio. "Multi-feature machine learning analysis for an improved characterization of the cardiac mechanics." Doctoral thesis, Universitat Pompeu Fabra, 2018. http://hdl.handle.net/10803/663748.

Full text
Abstract:
This thesis focuses on the development of machine learning tools to better characterize cardiac anatomy and function in the context of heart failure, and in particular their extension to consider multiple parameters that help identify the pathophysiological aspects underlying disease. This advanced and personalized characterization may eventually allow assigning patients to clinically meaningful phenogroups with a uniform treatment response and/or disease prognosis. Specifically, the thesis copes with the technical difficulties that multivariate analyses imply, paying special attention to properly combining different descriptors that might be of different natures (e.g., patterns, continuous, or categorical variables) and to reducing the complexity of large amounts of data to a meaningful representation. To this end, we implemented an unsupervised dimensionality reduction technique (Multiple Kernel Learning), which condenses the main characteristics of complex, high-dimensional data into fewer dimensions. For our computational analysis to be useful for the clinical community, it should remain fully interpretable. We placed special emphasis on allowing the user to see how the input to the learning process shapes the obtained output, through the use of multi-scale kernel regression techniques, among others.
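A bare-bones sketch of the unsupervised multiple-kernel idea: build one kernel per descriptor type, combine them with weights, and embed via the top eigenvectors of the centered combined kernel. The weights here are fixed by hand, whereas Multiple Kernel Learning optimizes them; data and kernel choices are placeholders:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(8)
patterns = rng.normal(size=(80, 50))    # e.g. velocity traces
clinical = rng.normal(size=(80, 6))     # e.g. continuous clinical variables

K = 0.6 * rbf_kernel(patterns) + 0.4 * rbf_kernel(clinical)  # weighted combination

# Center the kernel and take the two leading eigenvectors as the embedding.
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H
vals, vecs = np.linalg.eigh(Kc)
embedding = vecs[:, -2:] * np.sqrt(np.maximum(vals[-2:], 0))
print(embedding.shape)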
APA, Harvard, Vancouver, ISO, and other styles
44

Gertrudes, Jadson Castro. "Emprego de técnicas de análise exploratória de dados utilizados em Química Medicinal." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-14112013-124231/.

Full text
Abstract:
Research in the field of Medicinal Chemistry has focused on the search for methods that accelerate the process of drug discovery. Among the several steps in the discovery of bioactive substances is the analysis of the relationships between the chemical structure and biological activity of compounds. In this process, researchers in medicinal chemistry analyze data sets that are characterized by high dimensionality and a small number of observations. Within this context, this work presents a computational approach that aims to contribute to the analysis of chemical data and, consequently, the discovery of new drugs for the treatment of chronic diseases. The exploratory data analysis approaches employed in this work combine dimensionality reduction and clustering techniques for detecting natural structures that reflect the biological activity of the analyzed compounds. Among the existing techniques for dimensionality reduction, we focus on the Fisher score, principal component analysis, and sparse principal component analysis. For the clustering procedure, this study evaluated k-means, fuzzy c-means, and the enhanced ICA mixture model. In the experiments, four data sets containing information on bioactive substances were used: two related to the treatment of diabetes mellitus and metabolic syndrome, a third related to cardiovascular disease, and a last one containing substances that can be used in cancer treatment. The results obtained suggest the use of dimensionality reduction techniques together with clustering algorithms for the task of clustering chemical data, since in these experiments it was possible to describe different levels of biological activity of the studied compounds. Therefore, we conclude that dimensionality reduction and clustering techniques can be used as guides in the process of discovery and development of new compounds in the field of Medicinal Chemistry.
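Of the reduction techniques named, the Fisher score is simple enough to spell out: a feature scores highly when its class means are far apart relative to its within-class variances. A NumPy sketch, with synthetic descriptors standing in for the chemical data:

import numpy as np

def fisher_score(X, y):
    """F_j = sum_c n_c (mu_cj - mu_j)^2 / sum_c n_c sigma_cj^2 for each feature j."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1]); den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / den

rng = np.random.default_rng(9)
X = rng.normal(size=(40, 200))       # 40 compounds, 200 molecular descriptors
y = rng.integers(0, 2, size=40)      # stand-in activity classes
X[:, 0] += 3 * y                     # make feature 0 informative
scores = fisher_score(X, y)
print("best feature:", int(np.argmax(scores)))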
APA, Harvard, Vancouver, ISO, and other styles
45

Bécavin, Christophe. "Dimensionality reduction and pathway network analysis of transcriptome data : application to T-cell characterization." Paris, Ecole normale supérieure, 2010. http://www.theses.fr/2010ENSUBS02.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Benmoussat, Mohammed Seghir. "Hyperspectral imagery algorithms for the processing of multimodal data : application for metal surface inspection in an industrial context by means of multispectral imagery, infrared thermography and stripe projection techniques." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4347/document.

Full text
Abstract:
The work presented in this thesis deals with the quality control and inspection of industrial metallic surfaces. The purpose is the generalization and application of hyperspectral imagery methods to multimodal data such as multi-channel optical images and multi-temporal thermographic images. In the first application, data cubes are built from multi-component images to detect surface defects within flat metallic parts. The best performances are obtained with multi-wavelength illuminations in the visible and near-infrared ranges, and detection using the spectral angle mapper with the mean spectrum as a reference. The second application concerns the use of thermographic imaging for the inspection of nuclear metal components to detect surface and subsurface defects. A 1D approach is proposed, based on using the kurtosis to select one principal component (PC) from the first PCs obtained after reducing the original data cube with the principal component analysis (PCA) algorithm. The proposed PCA-1PC method gives good performances with non-noisy and homogeneous data, while SVD with anomaly detection algorithms gives the most consistent results and is quite robust to perturbations such as an inhomogeneous background. Finally, an approach based on fringe analysis and structured light techniques for deflectometric recordings is presented for the inspection of free-form metal surfaces. After determining the parameters describing the sinusoidal stripe patterns, the proposed approach consists in projecting a list of phase-shifted patterns and calculating the corresponding phase images. Defect location is based on detecting and analyzing the stripes within the phase images.
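The detector the first application settles on, the spectral angle mapper with the mean spectrum as reference, is a one-line formula: the angle between each pixel's spectrum x and a reference r, arccos(x·r / (‖x‖‖r‖)). A NumPy sketch on a synthetic multi-channel image, with an arbitrary anomaly threshold:

import numpy as np

rng = np.random.default_rng(10)
cube = rng.normal(1.0, 0.05, size=(64, 64, 8))   # 8-channel image of a metal surface
cube[30:34, 30:34, :] *= 1.5                     # synthetic surface defect

pixels = cube.reshape(-1, cube.shape[2])
ref = pixels.mean(axis=0)                        # mean spectrum as reference

cos = (pixels @ ref) / (np.linalg.norm(pixels, axis=1) * np.linalg.norm(ref))
angle = np.arccos(np.clip(cos, -1.0, 1.0)).reshape(64, 64)

detection = angle > angle.mean() + 3 * angle.std()   # simple anomaly threshold
print("flagged pixels:", int(detection.sum()))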
APA, Harvard, Vancouver, ISO, and other styles
47

Gao, Huanhuan. "Categorical structural optimization : methods and applications." Thesis, Compiègne, 2019. http://www.theses.fr/2019COMP2471/document.

Full text
Abstract:
The thesis concentrates on methodological research on categorical structural optimization by means of manifold learning. The main difficulty in handling categorical optimization problems lies in the description of the categorical variables: they are presented as categories and do not have any order. Thus the treatment of the design space is a key issue. In this thesis, the non-ordinal categorical variables are treated as multi-dimensional discrete variables, so the dimensionality of the corresponding design space becomes high. In order to reduce the dimensionality, manifold learning techniques are introduced to find the intrinsic dimensionality and map the original design space to a reduced-order space. The mechanisms of both linear and non-linear manifold learning techniques are first studied. Then numerical examples are tested to compare the performance of the manifold learning techniques mentioned above. It is found that PCA and MDS can only deal with linear or globally approximately linear cases. Isomap preserves the geodesic distances for a non-linear manifold; however, it is the most time-consuming. LLE preserves the neighbour weights and can yield good results in a short time. KPCA works like a non-linear classifier, and we prove why it cannot preserve distances or angles in some cases. Based on the reduced-order representation obtained by Isomap, graph-based evolutionary crossover and mutation operators are proposed to deal with categorical structural optimization problems, including the design of dome, six-story rigid frame and dame-like structures. The results show that the proposed graph-based evolutionary approach constructed on the reduced-order space performs more efficiently than traditional methods, including the simplex approach or an evolutionary approach without the reduced-order space. In chapter 5, LLE is applied to reduce the data dimensionality, and a polynomial interpolation helps to construct the response surface from the lower-dimensional representation to the original data. Then the continuous search method of moving asymptotes is executed and yields a competitively good but inadmissible solution within only a few iterations. In the second stage, a discrete search strategy is proposed to find better solutions based on a neighbourhood search. The ten-bar truss and dome structural design problems are tested to show the validity of the method. In the end, this method is compared to the Simulated Annealing algorithm and the Covariance Matrix Adaptation Evolution Strategy, showing its better optimization efficiency. In chapter 6, in order to deal with the case in which the categorical design instances are distributed on several manifolds, we propose a k-manifolds learning method based on weighted principal component analysis. The obtained manifolds are integrated in the lower-dimensional design space. Then the method introduced in chapter 4 is applied to solve the ten-bar truss, dome and dame-like structural design problems.
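A small sketch of the reduced-order representation step the abstract builds on: one-hot-encode the categorical design instances (non-ordinal categories as multi-dimensional binary vectors) and let Isomap recover a low-dimensional embedding. It uses scikit-learn; the catalogue of section choices and the neighbour count are invented:

import numpy as np
from sklearn.manifold import Isomap
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(11)
# Stand-in catalogue: 100 designs, 6 members, each assigned one of 8 section types.
designs = rng.integers(0, 8, size=(100, 6))

# Non-ordinal categories -> multi-dimensional binary vectors.
X = OneHotEncoder(sparse_output=False).fit_transform(designs)

embedding = Isomap(n_neighbors=8, n_components=2).fit_transform(X)
print(embedding.shape)   # (100, 2) reduced-order design space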
APA, Harvard, Vancouver, ISO, and other styles
48

Sellami, Akrem. "Interprétation sémantique d'images hyperspectrales basée sur la réduction adaptative de dimensionnalité." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0037/document.

Full text
Abstract:
Hyperspectral imagery allows the acquisition of rich spectral information about a scene in several hundred or even thousands of narrow and contiguous spectral bands. However, with the high number of spectral bands, the strong inter-band spectral correlation and the redundancy of spectro-spatial information, the interpretation of these massive hyperspectral data is one of the major challenges for the remote sensing scientific community. In this context, the major challenge is to reduce the number of unnecessary spectral bands, that is, to reduce the redundancy and high correlation of spectral bands while preserving the relevant information. Projection approaches therefore aim to transform the hyperspectral data into a reduced subspace by combining all original spectral bands, while band selection approaches attempt to find a subset of relevant spectral bands. In this thesis, we first focus on hyperspectral image classification, attempting to integrate the spectro-spatial information into dimensionality reduction in order to improve classification performance and to overcome the loss of spatial information in projection approaches. We therefore propose a hybrid model that preserves the spectro-spatial information by exploiting the tensor model in the locality preserving projection approach (TLPP), and use constraint band selection (CBS) as an unsupervised approach to select the discriminant spectral bands. To model the uncertainty and imperfection of these reduction approaches and classifiers, we propose an evidential approach based on the Dempster-Shafer Theory (DST). In a second step, we extend the hybrid model by exploiting semantic knowledge, extracted through the features obtained by the previously proposed TLPP approach, to enrich the CBS technique. Indeed, the proposed approach makes it possible to select relevant spectral bands which are at once informative, discriminant, distinctive and minimally redundant. This approach selects the discriminant and distinctive spectral bands using the CBS technique, injecting the rules extracted with knowledge extraction techniques in order to automatically and adaptively select the optimal subset of relevant spectral bands. The performance of our approach is evaluated using several real hyperspectral data sets.
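The projection half of the hybrid model rests on locality preserving projection, which solves the generalized eigenproblem X L X^T a = lambda X D X^T a, with L the graph Laplacian of a neighbourhood graph and D its degree matrix. A plain (non-tensor) LPP sketch with NumPy/SciPy on synthetic pixel spectra, with arbitrary neighbour count and output dimension:

import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(13)
X = rng.normal(size=(200, 100))         # 200 pixels, 100 spectral bands

# Symmetric k-NN adjacency, degree matrix and graph Laplacian.
W = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
W = np.maximum(W, W.T)
D = np.diag(W.sum(axis=1))
L = D - W

# The smallest generalized eigenvectors give the locality preserving directions.
A = X.T @ L @ X
B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])
vals, vecs = eigh(A, B)
projection = X @ vecs[:, :10]           # 10-dimensional embedding
print(projection.shape)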
APA, Harvard, Vancouver, ISO, and other styles
49

"Multi-Label Dimensionality Reduction." Doctoral diss., 2011. http://hdl.handle.net/2286/R.I.9454.

Full text
Abstract:
Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels in multi-label learning. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first approach is a direct least squares approach which allows the use of different regularization penalties, but is applicable under a certain assumption; the second one is a two-stage approach which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed when the data comes sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully in the Drosophila gene expression pattern image annotation. The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms.
Dissertation/Thesis
Ph.D. Computer Science 2011
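As a small illustration of one algorithm family analyzed in this abstract, canonical correlation analysis between the feature matrix and the multi-label indicator matrix, here is a scikit-learn sketch with synthetic multi-label data; dimensions and label count are arbitrary:

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(12)
X = rng.normal(size=(200, 30))                  # features
W = rng.normal(size=(30, 4))
Y = ((X @ W + rng.normal(size=(200, 4))) > 0).astype(float)  # 4 correlated labels

# Project features into a 3-dimensional space maximally correlated with the labels.
cca = CCA(n_components=3).fit(X, Y)
X_reduced = cca.transform(X)
print(X_reduced.shape)   # (200, 3) label-aware reduced representation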
APA, Harvard, Vancouver, ISO, and other styles
50

Kim, Min-Young. "Discriminative models and dimensionality reduction for regression." 2008. http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.17339.

Full text
APA, Harvard, Vancouver, ISO, and other styles